Python Fundamentals – String Manipulation


WHAT IS A STRING?

A string is a sequence of characters. Strings are basically just a bunch of words. Strings are the form of data used in programming for storing and manipulating text, such as words, names and sentences. You can write them in Python using single quotes, double quotes, or triple quotes.

Single Quote
You can specify strings using single quotes, such as 'Quote me on this'. All white space, i.e., spaces and tabs, is preserved as it is.

Double Quotes
Strings in double quotes work exactly the same way as strings in single quotes.
An example is "What's your name?".

Triple Quotes
You can specify multi-line strings using triple quotes (""" or '''). You can use single quotes and double quotes freely within the triple quotes. An example is:
'''This is a multi-line string. This is the first line.
This is the second line.
"What's your name?," I asked.
He said "Ram, Raja Ram."
'''

Docstrings
Python has a feature called documentation strings, usually referred to by the shorter name docstrings. Docstrings are an important tool that you should make use of, since they help to document the program better and make it easier to understand.
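As a minimal sketch of a docstring (the function name greet and its wording are illustrative assumptions, not taken from the text): a string literal written as the first statement of a function becomes its documentation and can be read back through the __doc__ attribute.

```python
def greet(name):
    """Return a short greeting for the given name."""
    return "Hello, " + name

# The docstring is stored on the function object itself.
print(greet.__doc__)
print(greet("Ram"))
```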

LOGICAL AND PHYSICAL LINE

A physical line is what you see when you write the program. A logical line is what Python sees as a single statement. Python implicitly assumes that each physical line corresponds to a logical line. An example of a logical line is a statement like print("Hello World"). If this is on a line by itself (as you see it in an editor), then it also corresponds to a physical line.
Implicitly, Python encourages the use of a single statement per line which makes code more readable.
If you have a long line of code, you can break it into multiple physical lines by ending each line except the last with a backslash (\). This is referred to as explicit line joining.
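The original example is not reproduced here, so the following is only a sketch of explicit line joining; the names total and s are assumptions chosen for illustration.

```python
# A backslash at the end of a physical line joins it to the next one.
total = 1 + 2 + \
        3 + 4
print(total)

# Inside a string literal the backslash-newline pair is dropped,
# so the line break does not become part of the string.
s = 'This is a string. \
This continues the string.'
print(s)
```

The second print shows the text on a single line, because both physical lines form one logical line.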

WORKING WITH STRINGS

A string is an ordered sequence of letters/characters. Strings are enclosed in single quotes (' ') or double quotes (" "). The quotes are not part of the string. They only tell the computer where the string constant begins and ends. A string can contain any character or sign, including spaces.
'Hello, World!' is a string, so-called because it contains a "string" of letters. You (and the interpreter) can identify strings because they are enclosed in quotation marks. If you are not sure what type a value has, the interpreter can tell you.
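One way to ask the interpreter is the built-in type() function. A small sketch (the variable name value is illustrative):

```python
value = 'Hello, World!'
print(type(value))    # <class 'str'>
print(type('17'))     # <class 'str'>, the quotes make it a string
print(type(17))       # <class 'int'>
```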

It is possible to change a value or variable from one type to another. This is known as type conversion or type casting. The conversion can be done explicitly (the programmer specifies the conversion) or implicitly (the interpreter converts the data type automatically).
An empty string contains no characters and has length 0. It is written as two quotation marks with nothing between them.
In general, you can’t perform mathematical operations on strings, even if the strings look like numbers, so the following are illegal:
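The original examples are missing, so this is a hedged sketch of the idea: arithmetic on strings that merely look like numbers raises a TypeError, and explicit conversion with int() is one way around it. The values used are assumptions.

```python
# Subtracting strings is not allowed, even if they look like numbers.
try:
    result = '2' - '1'
except TypeError as err:
    result = None
    print("TypeError:", err)

# Explicit conversion turns the text into numbers first.
difference = int('2') - int('1')
print(difference)      # 1

# The empty string has length 0.
empty = ''
print(len(empty))      # 0
```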

The + operator works with strings, but it might not do what you expect; it performs concatenation, which means joining the strings by linking them end-to-end. For example:

The output of this program is Lakhanlal.
The * operator also works on strings; it performs repetition. For example, 'Save'*3 is 'SaveSaveSave'. If one of the operands is a string, the other has to be an integer.
This use of + and * makes sense by analogy with addition and multiplication. Just as 4*3 is equivalent to 4+4+4, we expect 'Save'*3 to be the same as 'Save'+'Save'+'Save'. The plus ( + ) sign is the string concatenation operator, and the asterisk ( * ) is the repetition operator.
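A short sketch of both operators. The operands 'Lakhan' and 'lal' are an assumption, chosen only because they produce the output quoted above.

```python
first = 'Lakhan'     # assumed operands
second = 'lal'
print(first + second)                            # Lakhanlal

print('Save' * 3)                                # SaveSaveSave
print('Save' * 3 == 'Save' + 'Save' + 'Save')    # True
```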
A string is a sequence of characters. You can access the characters one at a time with the bracket operator. An individual character in a string is accessed using a subscript (index). The subscript should always be an integer (positive or negative). Subscripts start from 0. To access the first character of the string:

The second statement selects character number 0 from str and prints the character P. The expression in brackets is called an index. The index indicates which character in the sequence you want.
Index is an offset from the beginning of the string, and the offset of the first letter is zero. To access the second character of the string:

You can use any expression, including variables and operators, as an index, but the value of the index has to be an integer. Otherwise you get an error.

TypeError: string indices must be integers
Alternatively, you can use negative indices, which count backward from the end of the string. The expression str[-1] yields the last letter, str[-2] yields the second to last, and so on.
To access the last character of the string:

To access the third last character of the string:

Strings in Python are stored by storing each character separately in contiguous locations as shown in Figure 15.1.

str              P    Y    T    H    O    N
Positive index   0    1    2    3    4    5
Negative index  -6   -5   -4   -3   -2   -1

Figure 15.1 Structure of Python str

The index, also called the subscript, is the numbered position of a letter in the string. In Python, indices run from 0 onwards in the forward direction (positive index) and from -1, -2, -3 and so on in the backward direction (negative index). A positive subscript accesses the string from the beginning; a negative subscript accesses it from the end.
You can access any character as <stringname>[<index>]. Index 0 or -n (where n is the length of the string) gives the first element.
For example, str[0] or str[-6] will display the first element 'P'.
Subscript 1 or -(n-1) gives the second element. For example, str[1] or str[-5] will display the second element 'Y'.
The built-in function len() returns the length of a string:

Since the length of a string can be determined using len(<string>), we can say that:

  • The first character of the string is at index 0 or -length. In the example given in Figure 15.1, the character stored at index 0 or -6 is P.
  • The second character of the string is at index 1 or -(length-1). In the example given in Figure 15.1, the character stored at index 1 or -(6-1) = -5 is Y.
  • The second last character of the string is at index (length-2) or -2. In the example given in Figure 15.1, the character stored at index (6-2) = 4 or -2 is O.
  • The last character of the string is at index (length-1) or -1. In the example given in Figure 15.1, the character stored at index (6-1) = 5 or -1 is N.
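The four relationships listed above can be checked directly. This is a sketch using the string from Figure 15.1; the names word and length are assumptions.

```python
word = "PYTHON"
length = len(word)
print(length)                              # 6

# Each position can be reached with a positive or a negative index.
print(word[0], word[-length])              # P P
print(word[1], word[-(length - 1)])        # Y Y
print(word[length - 2], word[-2])          # O O
print(word[length - 1], word[-1])          # N N
```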

Strings are Immutable
Saying that strings are immutable means that the contents of a string cannot be changed after it is created. Let us understand the concept of immutability with the help of an example.

TypeError: object does not support item assignment
The "object" in this case is the string and the "item" is the character you tried to assign. Python does not allow the programmer to change a character in a string. In the example, str has the value 'Hello, world!'. An attempt to replace 'H' in the string with 'J' produces a TypeError.
The reason for the error is that strings are immutable, which means you can't change an existing string. The best you can do is create a new string that is a variation on the original:

This example concatenates a new first letter onto a slice (a part of a string specified by a range of indices) of greeting. It has no effect on the original string. We will study string slices later in this chapter. Python strings cannot be changed. Assigning to an indexed position in the string results in an error.
You can, however, assign another string, or an expression that returns a string, to a string variable using assignment.
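A minimal sketch of immutability. The original code is not shown; the names greeting and new_greeting are assumptions that follow the wording of the text.

```python
greeting = 'Hello, world!'

# Item assignment is not allowed on a string.
try:
    greeting[0] = 'J'
except TypeError as err:
    print("TypeError:", err)

# Build a new string instead; the original is left untouched.
new_greeting = 'J' + greeting[1:]
print(new_greeting)    # Jello, world!
print(greeting)        # Hello, world!

# Rebinding the name to a different string is fine.
greeting = new_greeting
```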

Traversing a String
Traversing a string means accessing all the elements of the string one after the other by using the subscript or index. A string can be traversed character by character using a for loop or a while loop. Example 1 demonstrates string traversal using a for loop. In this program, each character of the string "PYTHON" is accessed one by one.

Example 1

Figure 15.2 Program for string traversal using for loop

Each time through the for loop, the next character in the string is assigned to the variable str. The loop continues until no characters are left.
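The program in Figure 15.2 is not reproduced, so this is a sketch of for-loop traversal. The loop variable ch and the name word are assumptions.

```python
word = "PYTHON"
for ch in word:     # ch takes each character in turn
    print(ch)
```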

Example 2

Figure 15.3 Program for string traversal using while loop

The while loop traverses the string and displays each letter on a line by itself. The loop condition is i < len(str), so when i is equal to the length of the string, the condition is false, and the body of the loop is not executed.
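A sketch of the while-loop version described above; Figure 15.3 itself is not reproduced, and word is an assumed name used in place of str.

```python
word = "PYTHON"
i = 0
while i < len(word):    # false once i equals the length of the string
    print(word[i])
    i = i + 1
```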

Example 3

Figure 15.4 Program for finding length of string

In this program, we repeatedly take the user's input and print the length of each input. We provide a special condition to stop the program by checking whether the user input is 'quit'. We stop the program by breaking out of the loop and reaching the end of the program. The length of the input string can be found using the built-in len() function.
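Figure 15.4 is not reproduced, so the following is only a sketch of the behaviour described. To keep it runnable without a keyboard, the hypothetical helper report_lengths reads its entries from a list; in an interactive program each entry would come from input().

```python
def report_lengths(entries):
    """Print the length of each entry until 'quit' is seen."""
    lengths = []
    for text in entries:      # interactively: text = input('Enter something: ')
        if text == 'quit':
            break             # leave the loop and end the program
        print('Length of the string is', len(text))
        lengths.append(len(text))
    print('Done')
    return lengths

report_lengths(['Python', 'strings are fun', 'quit', 'ignored'])
```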

Example 4

Figure 15.5 Program to check length of a string

In this program, we accept input from the user, but we process the input string only if it is at least 3 characters long. So, we use the built-in len function to get the length, and if the length is less than 3, we skip the rest of the statements in the block by using the continue statement. Otherwise, the rest of the statements in the loop are executed, doing any kind of processing we want to do here.
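A sketch of the continue logic described above; Figure 15.5 is not reproduced. The helper process_entries, the 'quit' stop word and the messages are assumptions, and the entries come from a list rather than input() so the sketch runs unattended.

```python
def process_entries(entries):
    """Collect entries of at least 3 characters; stop at 'quit'."""
    accepted = []
    for text in entries:
        if text == 'quit':
            break
        if len(text) < 3:
            print('Too small')
            continue              # skip the rest of this iteration
        print('Input is of sufficient length')
        accepted.append(text)     # any further processing would go here
    return accepted

process_entries(['a', 'ab', 'abc', 'quit'])
```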

STRING SPECIAL OPERATORS

Different string special operators are discussed in the following sections:

Concatenation (+) operator
The + operator joins the text on both sides of the operator. For example,

To put a space between the two words, insert a space before the closing quote of the first string.

Note that the interpreter gives back the string with single quotes whether you create strings with single quotes or double quotes.
The + operator works with numbers and with strings separately, for addition and concatenation respectively. However, you cannot combine a number and a string as operands of the + operator. It will produce an error.
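The cases above can be sketched like this. The strings and the name label are assumptions; using str() to convert the number is one possible fix, not something the text prescribes.

```python
print('Save' + 'Earth')      # SaveEarth
print('Save ' + 'Earth')     # Save Earth, note the space before the closing quote

# Mixing a number and a string with + raises TypeError.
try:
    label = 'Class ' + 12
except TypeError as err:
    print("TypeError:", err)

# Converting the number with str() makes the operands compatible.
label = 'Class ' + str(12)
print(label)                 # Class 12
```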

Replication (*) Operator
The * operator gives the product of two numbers, or returns a string repeated a given number of times. For example,

When the * (multiplication) operator works with numbers, it returns the product of the two numbers. When used for replication, it produces multiple copies of the same string. For replication, if one of the operands is a string, the other has to be an integer, i.e., "string" * number or number * "string". It cannot work with two string operands; that displays an error.
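A brief sketch of the replication rules; the name word and its value are assumptions.

```python
word = 'Hi'
print(3 * 4)        # 12, ordinary multiplication
print(word * 3)     # HiHiHi
print(3 * word)     # HiHiHi, the order of the operands does not matter

# Two strings cannot be multiplied together.
try:
    word * word
except TypeError as err:
    print("TypeError:", err)
```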

Membership Operator (in)
The membership operator in returns True if a character (or substring) exists in the given string, and False otherwise. For example,

Membership Operator (not in)
The not in membership operator returns True if a character (or substring) does not exist in the given string, and False otherwise. For example,
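A sketch of both membership operators; the string 'Save Earth' is an assumed example value.

```python
word = 'Save Earth'
print('S' in word)         # True
print('s' in word)         # False, the test is case-sensitive
print('Earth' in word)     # True, substrings work as well
print('z' not in word)     # True
print('S' not in word)     # False
```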

Comparison Operators
All the comparison operators (<, <=, >, >=, !=, ==) can also be applied to strings. (The <> form of "not equal" exists only in Python 2; Python 3 uses !=.) Strings are compared character by character using the ASCII/Unicode values of the characters. The ASCII values of the digits 0 to 9 are 48 to 57, uppercase letters (A to Z) are 65 to 90, and lowercase letters (a to z) are 97 to 122. Python has two boolean values, True and False. Comparison operators return True or False, as shown in Figure 15.6.

Comparison        Result   Reason
"abc" == "abc"    True     The letters and their case are the same.
"a" == "A"        False    The letter case is different.
"a" != "A"        True     The letter case is different.
"abc" > "ABC"     True     Uppercase letters have lower ASCII values than lowercase letters.
'123' > '456'     False    The ASCII value of '1' (49) is lower than that of '4' (52).

Figure 15.6 Working of comparison operators with strings

Python provides a couple of built-in functions that allow us to switch back and forth between characters and the numeric values used to represent them in strings. The ord function returns the numeric ("ordinal") code of a single-character string, while chr goes in the other direction. Here are some interactive examples:
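The original interactive session is missing; the following sketch shows ord and chr and ties them to the string comparisons discussed in this section.

```python
print(ord('A'))              # 65
print(ord('a'))              # 97
print(chr(65))               # A
print(chr(ord('a') + 1))     # b

# Comparisons follow these codes, character by character.
print('abc' > 'ABC')         # True, because 97 > 65
print('123' > '456')         # False, because ord('1') is 49 and ord('4') is 52
```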

STRING SLICES

You can retrieve a slice (substring) of a string with a slice operation. The slicing operation is used by specifying the name of the sequence followed by an optional pair of numbers separated by a colon within square brackets. Note that this is very similar to the indexing operation you have been using till now. Remember that the numbers are optional but the colon isn't.
Characters are accessed by their position in the string, which is enclosed in brackets following the string. The position number can be positive or negative, which means counting from the beginning or from the end of the string. Substrings are extracted by providing the start and end positions separated by a colon. Both positions are optional: omitting the start means starting at the beginning of the string, and omitting the end means extracting up to the end. When accessing a single character, it is forbidden to access a position that does not exist (Python raises an IndexError), whereas during substring extraction the longest possible string is extracted. Figure 15.7 shows the index positions used for slicing the string 'SWAROOP'.

str              S    W    A    R    O    O    P
Positive index   0    1    2    3    4    5    6
Negative index  -7   -6   -5   -4   -3   -2   -1

Figure 15.7 Structure of SWAROOP str

Example 5

Figure 15.8 Program for Slicing on a string

Remember the following points while accessing elements in the strings using subscripts:

  • The first number (before the colon) in the slicing operation refers to the position from where the slice starts and the second number (after the colon) indicates where the slice will stop.
  • If the first number is not specified, Python will start at the beginning of the sequence.
  • If the second number is left out, Python will stop at the end of the sequence.
  • Note that the slice returned starts at the start position and ends just before the end position, i.e., the start position is included but the end position is excluded from the slice. Thus, name[1:3] returns a slice of the sequence starting at position 1, includes position 2 but stops at position 3, and therefore a slice of two items is returned.
  • Omitting both indices directs the Python interpreter to extract the entire string from index 0 to the last index, i.e., name[:] returns a copy of the whole sequence.
  • A positive subscript accesses the string from the beginning. You can also do slicing with negative positions. Negative numbers are used for positions counted backward from the end of the string. For example, name[:-1] will return a slice of the sequence which excludes the last item of the sequence but contains everything else.
  • If the first index is greater than or equal to the second, the result is an empty string, represented by two quotation marks.
  • Note that -0 is really the same as 0, so it does not count from the right.
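The points above can be tried on the string from Figure 15.7. The program in Figure 15.8 is not reproduced, so the particular slices shown here are assumptions.

```python
name = 'SWAROOP'

print(name[1:3])      # WA, position 1 up to but not including 3
print(name[2:])       # AROOP, from position 2 to the end
print(name[:3])       # SWA, from the start up to position 3
print(name[:])        # SWAROOP, a copy of the whole string
print(name[1:-1])     # WAROO, everything except the first and last character
print(name[:-1])      # SWAROO, everything except the last character
print(name[3:3])      # empty string, the start is not less than the end
print(name[2:100])    # AROOP, an out-of-range end is clipped
```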

STRING FUNCTIONS AND METHODS

Python provides built-in functions and methods that allow us to do string manipulation. Every string object that you create in Python is actually an instance of the str class. The string manipulation methods discussed below use the following syntax: <stringname>.<methodname>(<arguments>)

len()
len() is a built-in function that returns the number of characters in a string.

An empty string contains no characters and has length 0, but other than that, it is the same as any other string. A string with length 1 represents a character in Python.

capitalize()
The method capitalize() returns a copy of the string with its first character capitalized and the remaining characters in lowercase.

find()
The find() method is used to locate the position of the given substring within the string; find returns -1 if it is unsuccessful in finding the substring.

It returns -1 because the substring is not found in the given string. If the start parameter is omitted, the search begins at the very beginning of the string.
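A sketch of capitalize() and find(); the string 'save earth' is an assumed example value.

```python
text = 'save earth'
print(text.capitalize())      # Save earth

print(text.find('earth'))     # 5, index where the substring starts
print(text.find('moon'))      # -1, not found
print(text.find('a', 2))      # 6, the search starts at index 2
```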

isalnum()
Returns True if the string contains only letters and digits, i.e., all characters are alphanumeric, and False if the string contains any special character such as _, @, #, * or a space.

A string that contains a space gives False, because the space counts as a special character.

isalpha()
Returns True if the string contains only letters, otherwise returns False.

isdigit()
Returns true if string contains only digits and false otherwise.
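The three tests above can be compared side by side. This is a sketch with assumed sample strings.

```python
print('Class12'.isalnum())      # True
print('Class 12'.isalnum())     # False, the space is not alphanumeric
print('Class'.isalpha())        # True
print('Class12'.isalpha())      # False
print('2024'.isdigit())         # True
print('20.24'.isdigit())        # False, the dot is not a digit
```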

lower()
Converts all uppercase letters in string to lowercase.

isupper()
Returns True if all the letters in the string are uppercase (and there is at least one letter).

upper()
Returns a copy of the string with all letters in uppercase.

lstrip()
Removes all leading (left side) whitespace from the string.

If a string is passed as an argument to lstrip(), it removes those characters from the left of the string.

rstrip()
Removes all trailing (right side) whitespace from the string.

If a string is passed as an argument to rstrip(), it removes those characters from the right of the string.
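A sketch of the case and strip methods described above; the sample strings and the name padded are assumptions. repr() is used only to make the remaining spaces visible.

```python
print('Save EARTH'.lower())             # save earth
print('Save EARTH'.upper())             # SAVE EARTH
print('SAVE'.isupper())                 # True
print('Save'.isupper())                 # False

padded = '   Save Earth   '
print(repr(padded.lstrip()))            # 'Save Earth   '
print(repr(padded.rstrip()))            # '   Save Earth'

# With an argument, the listed characters are stripped instead of whitespace.
print('xxSave Earthxx'.lstrip('x'))     # Save Earthxx
print('xxSave Earthxx'.rstrip('x'))     # xxSave Earth
```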

isspace()
Returns True if the string contains only whitespace characters, and False otherwise.

It will return False if the string contains even one non-whitespace character.

istitle()
Checks whether the string is title-cased, meaning that every word starts with an uppercase letter followed by lowercase letters. Returns True if the string is properly title-cased and False otherwise.

replace(old,new)
Replaces all occurrences of the old substring in a string with the new substring.

The replace() method works like the search-and-replace feature of a word processor.
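A sketch of isspace(), istitle() and replace(); the sample strings and the name sentence are assumptions.

```python
print('   '.isspace())               # True
print('  a '.isspace())              # False
print(''.isspace())                  # False, there must be at least one character

print('Save Earth'.istitle())        # True
print('Save earth'.istitle())        # False

sentence = 'I like tea, tea is good'
print(sentence.replace('tea', 'coffee'))    # I like coffee, coffee is good
print(sentence)                             # the original string is unchanged
```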

split()
Splits the string at the given delimiter (whitespace if none is provided) and returns a list of substrings, i.e., it splits a string into a list of "words".

join()
The join() method concatenates strings from a list of strings to form a single string. Elements can be joined using any separator.

swapcase()
Returns a copy of the string with lowercase letters turned into uppercase and vice versa.

partition()
This function partitions the strings at the first occurrence of separator, and returns the strings partition in three parts, i.e., before the separator (head), the separator itself, and the part after the separator (tail). If the separator is not found, it returns the string itself, followed by two empty strings.
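A sketch of split(), join(), swapcase() and partition(); all sample strings, including the address used with partition(), are assumptions.

```python
sentence = 'save the earth'
words = sentence.split()
print(words)                                # ['save', 'the', 'earth']
print('a,b,c'.split(','))                   # ['a', 'b', 'c']

print('-'.join(words))                      # save-the-earth
print('Save Earth'.swapcase())              # sAVE eARTH

print('user@example.com'.partition('@'))    # ('user', '@', 'example.com')
print('no separator'.partition('@'))        # ('no separator', '', '')
```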

SOLVED EXERCISE

Question 1.
What are strings?

Answer:
A string is a sequence of characters. Strings are basically just a bunch of words. Strings are the form of data used in programming for storing and manipulating text, such as words, names and sentences. You can write them in Python using single quotes, double quotes, or triple quotes.

Question 2.
A palindrome is a word that is spelled the same backward and forward. Write a program that checks whether a word is a palindrome or not.

Answer:

Figure 1 Program for Q2
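The original program is not reproduced. One possible solution, using only the indexing and while loop covered in this chapter, is sketched below; the function name is_palindrome and the test words are assumptions.

```python
def is_palindrome(word):
    """Return True if word reads the same forward and backward."""
    i = 0
    while i < len(word) // 2:
        # Compare the i-th character from the left with the i-th from the right.
        if word[i] != word[-(i + 1)]:
            return False
        i = i + 1
    return True

for candidate in ['madam', 'python']:
    if is_palindrome(candidate):
        print(candidate, 'is a palindrome')
    else:
        print(candidate, 'is not a palindrome')
```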

Question 3.
Write a program that finds first character, last character and middle characters from entered string.

Answer:

Figure 2 Program for Q3
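The original program is not reproduced. The sketch below is one possible reading of the question: it assumes that "middle characters" means everything between the first and the last character, and it uses a fixed word instead of user input. The function name describe is hypothetical.

```python
def describe(word):
    """Return the first, last and middle characters of a non-empty word."""
    first = word[0]
    last = word[-1]
    middle = word[1:-1]    # assumed meaning: everything between first and last
    return first, last, middle

first, last, middle = describe('PYTHON')
print('First character:', first)       # P
print('Last character:', last)         # N
print('Middle characters:', middle)    # YTHO
```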

Question 4.
Find the output of the following program segments:

Answer:
(i) 14
(ii) 3
(iii) BANANA

Question 5.
What is the significance of triple quotes?

Answer:
You can specify multi-line strings using triple quotes (""" or '''). You can use single quotes and double quotes freely within the triple quotes. An example is:
'''This is a multi-line string. This is the first line.
This is the second line.
"What's your name?," I asked.
He said "Ram, Raja Ram."
'''

Question 6.
Differentiate between physical line and a logical line.

Answer:
A physical line is what you see when you write the program. A logical line is what Python sees as a single statement. Python implicitly assumes that each physical line corresponds to a logical line.

Question 7.
What is the difference between lstrip() and rstrip()?

Answer:
lstrip() removes all leading (left side) whitespace in string.
rstrip() removes all trailing (right side) whitespace of string.

Question 8.
Explain membership operator in with example.

Answer:
The membership operator in returns True if a character exists in the given string, and False otherwise. For example, 'y' in 'Python' evaluates to True, while 'z' in 'Python' evaluates to False.

Question 9.
What do you mean by ‘strings are immutable’?

Answer:
Saying that strings are immutable means that the contents of a string cannot be changed after it is created.

Question 10.
What is the function of slicing operation?

Answer:
You can retrieve a slice (substring) of a string with a slice operation. The slicing operation is used by specifying the name of the sequence followed by an optional pair of numbers separated by a colon, within square brackets.

Question 11.
What is the use of Docstrings?

Answer:
Python has a feature called documentation strings, usually referred to by the shorter name docstrings. Docstrings are an important tool that you should make use of, since they help to document the program better and make it easier to understand.

Question 12.
The quotes are not part of string. Comment.

Answer:
The quotes are not part of the string. They only tell the computer where the string constant begins and ends. A string can contain any character or sign, including spaces.

Question 13.
How are empty strings represented in Python?

Answer:
An empty string contains no characters and has length 0. It is represented in Python by two quotation marks with nothing between them.

Question 14.
What is the difference between (+) and (*) operators?

Answer:
The plus (+) sign is the string concatenation operator, and the asterisk (*) is the repetition operator. You can concatenate strings with the + operator and create multiple concatenated copies of a string with the * operator.
The use of + and * makes sense by analogy with addition and multiplication. Just as 4*3 is equivalent to 4+4+4, we expect 'Save'*3 to be the same as 'Save'+'Save'+'Save'.

Question 15.
Differentiate between ord() and chr() functions?

Answer:
The ord() function returns the ASCII (Unicode) value of a character; it accepts a single character only.
The chr() function takes an ASCII (Unicode) value in integer form and returns the character corresponding to that value.

Question 16.
Write short note on subscript.

Answer:
A string is a sequence of characters. You can access the characters one at a time with the bracket operator. An individual character in a string is accessed using a subscript (index). The subscript should always be an integer (positive or negative). A subscript starts from 0.

Question 17.
What do you mean by positive and negative indices?

Answer:
In Python, indices begin 0 onwards in the forward direction also called positive index, and -1, -2, -3 in the backward direction also called negative index. Positive subscript helps in accessing the string from the beginning. Negative subscript helps in accessing the string from the end.

Question 18.
‘String indices must be an integer’. Comment.

Answer:
You can use any expression, including variables and operators, as an index, but the value of the index has to be an integer. Otherwise you get an error. You can use negative indices, which count backward from the end of the string.

Question 19.
What do you mean by traversing a string?

Answer:
Traversing a string means accessing all the elements of the string one after the other by using the subscript or index.

Multiple choice questions

Question 1.
Converts all uppercase letters in string to lowercase.

  • islower()
  • swapcase()
  • lower()
  • None of the above

Answer:
lower()

Question 2.
The method ____ returns a copy of the string with only its first character capitalized.

  • upper()
  • capitalize()
  • swapcase()
  • None of the above

Answer:
capitalize()

Question 3.
Returns true if string contains only letters and digit.

  • isalnum()
  • isdigit()
  • isalpha()
  • None of the above

Answer:
isalnum()

Question 4.
Replication (*) means:

  • It gives the multiplication of the two numbers
  • It returns the string repeated that many times.
  • Both (a) and (b)
  • None of the above

Answer:
Both (a) and (b)

Question 5.
The find() method returns -1 if it is:

  • Unsuccessful in finding the substring.
  • Successful in finding the substring.
  • Both (a) and (b)
  • None of the above

Answer:
Unsuccessful in finding the substring.

About the author

James Palmer
