[what] String Manipulation For Textual Analysis In Python

  



In Python, a string is an object that holds an immutable sequence of characters. String processing is fundamental to textual analysis. This post is extracted from NLTK - Strings: Text Processing at the Lowest Level.


(1) String Literal

A string literal (a sequence of zero or more characters enclosed within quotation marks) can be written in a variety of ways:

  • Single quotes: 'allows embedded "double" quotes'
  • Double quotes: "allows embedded 'single' quotes"
  • Triple quoted: '''Three single quotes''', """Three double quotes"""
quote1 = "You don't have to be happy to \"smile\""	
quote2 = 'You don\'t have to be happy to "smile"'
quote3 = '''You
don't 
have to
be happy
to "smile"'''
print(quote1)
print(quote2)
print(quote3)
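
As a quick check, the sketch below suggests that the quote style only affects how the literal is written, not the resulting string, and that a triple-quoted literal keeps its line breaks as newline characters. The variable names a, b and c are illustrative and not part of the original example.

```python
# Same text written with two different quote styles (illustrative names).
a = "You don't have to be happy to \"smile\""
b = 'You don\'t have to be happy to "smile"'
print(a == b)  # the two literals produce equal strings

# A triple-quoted literal keeps its line breaks as newline characters.
c = '''be happy
to "smile"'''
print(repr(c))
print(c.splitlines())
```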

(2) Accessing Sub-String

Individual characters of a string can be accessed by indexing, which accepts positive numbers (counting from the left, starting at 0) and negative numbers (counting from the right, starting at -1). An index outside the valid range raises an IndexError, so be careful to stay within the length of the string.
 

keyword="chillax"
# print all
print(keyword[:])
# print from index 0 to before 3
print(keyword[:3])
# print from index 4 to end
print(keyword[4:])
# print from index 0 to before last item
print(keyword[:-1])
# print in reverse direction
print(keyword[::-1])

In the above example, selecting parts of the whole string is called slicing.
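
To complement the slices above, here is a minimal sketch of single-character indexing and of what happens when an index falls outside the valid range. The names first, last and tail are illustrative.

```python
keyword = "chillax"

# Positive indexes count from the left, starting at 0.
first = keyword[0]
# Negative indexes count from the right, starting at -1.
last = keyword[-1]
print(first, last)

# An index outside the valid range raises IndexError...
try:
    keyword[10]
except IndexError as err:
    print("IndexError:", err)

# ...whereas a slice that runs past the end is simply truncated.
tail = keyword[4:100]
print(tail)
```

Slices are more forgiving than single indexes, which is worth keeping in mind when processing text of unknown length.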


(3) Formatting string output

Strings in Python can be formatted with the format() method, which replaces each {} placeholder in a string with the corresponding argument.
keyword="Chillax"
print("Hello {}".format(keyword))

Python 3.6 introduced f-string syntax, which simplifies the print statement as shown below.

keyword="Chillax"
print(f"Hello {keyword}")

Both statements above produce the same output.


Hello Chillax
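
Beyond the bare {} placeholder, format() also accepts positional and named fields, and f-strings accept a format specification after a colon. The following is a small sketch; the values of count and ratio are made up for illustration.

```python
keyword = "Chillax"
count = 3      # hypothetical value, for illustration only
ratio = 0.25   # hypothetical value, for illustration only

# Positional and named placeholders with str.format()
by_position = "{0} appears {1} times".format(keyword, count)
by_name = "{word} appears {n} times".format(word=keyword, n=count)

# f-string (Python 3.6+) with a format specification: one decimal, as a percentage
with_spec = f"{keyword} share: {ratio:.1%}"

print(by_position)
print(by_name)
print(with_spec)
```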

 

(4) String elements are Immutable

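The characters of a string are immutable, i.e. they cannot be changed once the string has been created. To get around this, assign a new string to the same name.
keyword = "chillax"
# valid: rebind the name to a new string
keyword = "chillux"
# invalid: strings do not support item assignment
try:
    keyword[5] = "u"
except TypeError as err:
    # 'str' object does not support item assignment
    print(err)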
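
When only one character needs to change, one common workaround is to build a new string from slices of the old one. This is a minimal sketch; the name changed is illustrative.

```python
keyword = "chillax"

# Build a new string from the slices around index 5; the original is untouched.
changed = keyword[:5] + "u" + keyword[6:]

print(keyword)
print(changed)
```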

 

(5) Replacing string characters/substring

Python provides the replace() method, which returns a new string with every occurrence of a character or substring replaced. The original string is left unchanged.
textBefore = "Chillax"
textAfter = textBefore.replace("a", "u")

print(textBefore)
print(textAfter)

Chillax
Chillux
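
Two details of replace() are easy to miss: it accepts an optional third argument that limits the number of replacements, and it is case-sensitive. The sketch below uses a made-up sample string to show both.

```python
text = "chillax, relax"  # sample text, for illustration only

# Without a count, every occurrence is replaced.
all_replaced = text.replace("ax", "ux")
# With count=1, only the first occurrence is replaced.
first_only = text.replace("ax", "ux", 1)

# Matching is case-sensitive: "Chillax" contains no lowercase "c".
unchanged = "Chillax".replace("c", "k")

print(all_replaced)
print(first_only)
print(unchanged)
```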

(6) Manipulating string using Regular Expression (Regex) pattern

For more powerful replacements, Python's re module provides the sub() function, which replaces every substring that matches a specified regular expression pattern. Regex is especially useful during text preprocessing.
import re
oldWord = 'Chillax'
pattern = r'ax'
replacement = r'er'
newWord = re.sub(pattern, replacement, oldWord)
print(newWord)


Chiller
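
As an illustration of regex in preprocessing, the sketch below strips punctuation and collapses repeated whitespace before lowercasing. The sample sentence and the particular patterns are assumptions chosen for the example, not a prescribed pipeline.

```python
import re

raw = "Chillax!!   Just   relax, OK?"  # sample text, for illustration only

# Remove every character that is neither a word character nor whitespace.
no_punct = re.sub(r"[^\w\s]", "", raw)
# Collapse runs of whitespace into a single space and trim the ends.
normalized = re.sub(r"\s+", " ", no_punct).strip()

print(normalized.lower())
```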


(7) Further Reading




