In Python, a string is an immutable sequence of characters. String processing is fundamental to textual analysis. This post is extracted from NLTK - Strings: Text Processing at the Lowest Level.
(1) String Literal
A string literal (a sequence of zero or more characters enclosed within certain markers) can be written in a variety of ways:
- Single quotes: 'allows embedded "double" quotes'
- Double quotes: "allows embedded 'single' quotes"
- Triple quoted: '''Three single quotes''', """Three double quotes"""
quote1 = "You don't have to be happy to \"smile\""
quote2 = 'You don\'t have to be happy to "smile"'
quote3 = '''You don't have to be happy to "smile"'''
print(quote1)
print(quote2)
print(quote3)
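As a quick check, the quoting style only changes which characters need escaping, not the resulting value. The sketch below compares the three forms and also shows that a triple-quoted literal can span several lines; the variable names (`double`, `single`, `triple`, `multiline`) are chosen purely for illustration.

```python
# Three ways to write the same string value
double = "You don't have to be happy to \"smile\""
single = 'You don\'t have to be happy to "smile"'
triple = '''You don't have to be happy to "smile"'''

# The quoting style is not part of the value, so all three compare equal
print(double == single == triple)  # True

# A triple-quoted literal may span lines; the line break is kept as "\n"
multiline = """first line
second line"""
print(multiline.splitlines())  # ['first line', 'second line']
```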
(2) Accessing Sub-String
Individual characters of a string can be accessed by indexing, which accepts non-negative numbers (counting forward from the left, starting at 0) and negative numbers (counting backward from the right, starting at -1). An index outside the valid range raises an IndexError, so be careful to stay within it.
keyword = "chillax"
# print all
print(keyword[:])
# print from index 0 to before 3
print(keyword[:3])
# print from index 4 to end
print(keyword[4:])
# print from index 0 to before last item
print(keyword[:-1])
# print in reverse direction
print(keyword[::-1])
In the above example, selecting parts of the whole string is called slicing.
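The example above only slices, so here is a minimal sketch of single-character indexing with positive and negative numbers. It also illustrates one difference worth knowing: an out-of-range index raises IndexError, while an out-of-range slice bound is quietly clipped. The names `first`, `last`, and `clipped` are illustrative only.

```python
keyword = "chillax"

first = keyword[0]    # 'c', the first character
last = keyword[-1]    # 'x', the last character
print(first, last)

# Indexing past the end raises IndexError
try:
    keyword[10]
except IndexError as err:
    print("IndexError:", err)

# Slicing is more forgiving: out-of-range bounds are clipped
clipped = keyword[4:100]
print(clipped)  # lax
```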
(3) Formatting string output
Strings in Python can be formatted with the format() method, which replaces each {} placeholder in a string with the corresponding argument.
keyword = "Chillax"
print("Hello {}".format(keyword))
Python 3.6 introduced f-string syntax, which simplifies the print statement as shown below.
keyword = "Chillax"
print(f"Hello {keyword}")
Both statements above produce the same output.
Hello Chillax
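Both approaches accept more than a bare {}. The following sketch, using made-up values (`count`, `ratio`), shows named placeholders with format() and a format specifier inside an f-string.

```python
keyword = "Chillax"
count = 3
ratio = 0.4567

# Named placeholders with format()
named = "Hello {name}, seen {n} times".format(name=keyword, n=count)
print(named)  # Hello Chillax, seen 3 times

# Format specifier inside an f-string: two decimal places
rounded = f"ratio = {ratio:.2f}"
print(rounded)  # ratio = 0.46
```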
(4) String elements are Immutable
String elements are immutable, i.e. they cannot be changed once the string has been created. To get around this, assign a new string to the variable.
keyword = "chillax"
# valid statement
keyword = "chillux"
# invalid statement
# TypeError: 'str' object does not support item assignment
keyword[5] = "u"
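When only one character needs to change, retyping the whole string is not necessary. One possible approach, sketched here, is to build a new string from slices around the position; `updated` is an illustrative name.

```python
keyword = "chillax"

# Build a new string from slices around index 5 instead of assigning to it
updated = keyword[:5] + "u" + keyword[6:]
print(updated)  # chillux
print(keyword)  # chillax (the original string is unchanged)
```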
(5) Replacing string characters/substring
Python provides the replace() method, which returns a new string with the given characters or substring replaced. The original string is left unchanged.
textBefore = "Chillax"
textAfter = textBefore.replace("a", "u")
print(textBefore)
print(textAfter)
Chillax
Chillux
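By default replace() changes every occurrence. A small sketch with an illustrative word shows the optional third argument, which caps the number of replacements.

```python
text = "banana"

all_replaced = text.replace("a", "o")    # every occurrence
first_only = text.replace("a", "o", 1)   # at most one occurrence

print(all_replaced)  # bonono
print(first_only)    # bonana
```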
(6) Manipulating string using Regular Expression (Regex) pattern
Python's re module provides a more powerful sub() function, which replaces every substring that matches a specified pattern. Regex is especially useful during text preprocessing.
import re
oldWord = 'Chillax'
pattern = r'ax'
replacement = r'er'
newWord = re.sub(pattern, replacement, oldWord)
print(newWord)
Chiller
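The pattern above is a literal substring, which replace() could also handle. To hint at where regex earns its place in preprocessing, here is a rough sketch that normalizes whitespace and strips punctuation; the sample text and the choice to keep apostrophes are assumptions made for illustration.

```python
import re

raw = "  Chillax,   it's   2024!!  "

# Collapse runs of whitespace to a single space, then trim the ends
collapsed = re.sub(r"\s+", " ", raw).strip()
print(collapsed)  # Chillax, it's 2024!!

# Drop every character that is not a word character
# (letter, digit, underscore), whitespace, or apostrophe
cleaned = re.sub(r"[^\w\s']", "", collapsed)
print(cleaned)  # Chillax it's 2024
```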