Strings
Chapter 6
Python for Everybody
www.py4e.com
String Data Type
• A string is a sequence of characters
• A string literal uses quotes: 'Hello' or "Hello"
• For strings, + means “concatenate”
• When a string contains numbers, it is still a string
• We can convert numbers in a string into a number using int()

>>> str1 = "Hello"
>>> str2 = 'there'
>>> bob = str1 + str2
>>> print(bob)
Hellothere
>>> str3 = '123'
>>> str3 = str3 + 1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can only concatenate str (not "int") to str
>>> x = int(str3) + 1
>>> print(x)
124
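As a small sketch of the points on this slide, the snippet below goes in both directions: int() turns a string of digits into a number, and str() turns a number back into text so it can be concatenated. The variable names are made up for illustration.

```python
digits = '123'

# str + int is not allowed, so Python raises a TypeError
try:
    digits + 1
    mixing_worked = True
except TypeError:
    mixing_worked = False

# Convert the string to a number to do arithmetic
as_number = int(digits) + 1

# Convert the number to a string to concatenate
as_text = digits + str(1)

print(mixing_worked)   # False
print(as_number)       # 124
print(as_text)         # 1231
```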
Reading and Converting
• We prefer to read data in using strings and then parse and convert the data as we need
• This gives us more control over error situations and/or bad user input
• Input numbers must be converted from strings

>>> name = input('Enter:')
Enter:Chuck
>>> print(name)
Chuck
>>> apple = input('Enter:')
Enter:100
>>> x = apple - 10
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for -: 'str' and 'int'
>>> x = int(apple) - 10
>>> print(x)
90
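One way to get the "more control over error situations" mentioned above is to wrap the conversion in try/except. The helper below is only a sketch: to_int is a hypothetical name, and the literal strings stand in for whatever input() would return.

```python
def to_int(text, default=None):
    """Return text converted to an int, or default if it is not a valid integer."""
    try:
        return int(text)
    except ValueError:
        # int() raises ValueError for text such as 'ten' or ''
        return default

# These strings stand in for what input() might return
good = to_int('100')
bad = to_int('ten')
padded = to_int(' 42 ')    # int() ignores surrounding whitespace

print(good - 10)   # 90
print(bad)         # None
print(padded)      # 42
```

Because the raw text is kept as a string until it is converted, the program can decide what to do with bad input instead of stopping with a traceback.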
Looking Inside Strings
• We can get at any single character in a string using an index specified in square brackets
• The index value must be an integer and starts at zero
• The index value can be an expression that is computed

b a n a n a
0 1 2 3 4 5

>>> fruit = 'banana'
>>> letter = fruit[1]
>>> print(letter)
a
>>> x = 3
>>> w = fruit[x - 1]
>>> print(w)
n
A Character Too Far
• You will get a Python error if you attempt to index beyond the end of a string
• So be careful when constructing index values and slices
• Do not confuse this with slicing: a slice that reaches beyond the end of the string is not an error and simply gives back a shorter (possibly empty) string

>>> zot = 'abc'
>>> print(zot[5])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: string index out of range
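The difference between indexing and slicing past the end of a string can be seen side by side. This is a minimal sketch using the same zot string; the other variable names are made up for illustration.

```python
zot = 'abc'

# Indexing past the end raises IndexError
try:
    zot[5]
    index_failed = False
except IndexError:
    index_failed = True

# Slicing past the end does not fail; it returns whatever is in range
tail = zot[1:10]
empty = zot[5:10]

# The last valid index is always len(zot) - 1
last = zot[len(zot) - 1]

print(index_failed)   # True
print(tail)           # bc
print(repr(empty))    # ''
print(last)           # c
```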
Strings Have Length
The built-in function len gives us the length of a string

b a n a n a
0 1 2 3 4 5

>>> fruit = 'banana'
>>> print(len(fruit))
6
len Function
A function is some stored code that we use. A function takes some input and produces an output.

>>> fruit = 'banana'
>>> x = len(fruit)
>>> print(x)
6

'banana' (a string) -> len() function -> 6 (a number)
len Function
A function is some stored code that we use. A function takes some input and produces an output.

>>> fruit = 'banana'
>>> x = len(fruit)
>>> print(x)
6

'banana' (a string) -> stored code inside len -> 6 (a number)

Inside the function (schematic placeholder, not runnable code):

def len(inp):
    blah
    blah
    for x in y:
        blah
        blah
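To make the "stored code" picture concrete, here is a hypothetical my_len that counts characters with a loop. It is only an illustration of a function taking an input and producing an output; the real built-in len is not written this way.

```python
def my_len(inp):
    """Count the characters in inp one at a time (illustrative stand-in for len)."""
    count = 0
    for ch in inp:
        count = count + 1
    return count

print(my_len('banana'))   # 6
print(my_len(''))         # 0
```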
Looping Through Strings
Using a while statement, an iteration variable, and the len function, we can construct a loop to look at each of the letters in a string individually

fruit = 'banana'
index = 0
while index < len(fruit):
    letter = fruit[index]
    print(index, letter)
    index = index + 1

Output:
0 b
1 a
2 n
3 a
4 n
5 a
Looping Through Strings
• A definite loop using a for statement is much more elegant
• The iteration variable is completely taken care of by the for loop

fruit = 'banana'
for letter in fruit:
    print(letter)

Output:
b
a
n
a
n
a
Looping Through Strings
• A definite loop using a for statement is much more elegant
• The iteration variable is completely taken care of by the for loop

fruit = 'banana'
for letter in fruit :
    print(letter)

index = 0
while index < len(fruit) :
    letter = fruit[index]
    print(letter)
    index = index + 1

Both loops print:
b
a
n
a
n
a
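As a quick check that the two loop styles visit the same letters in the same order, the sketch below collects the letters into a string instead of printing them. The names seen_by_while and seen_by_for are made up for this example.

```python
fruit = 'banana'

# while loop: we create, test, and advance the index ourselves
seen_by_while = ''
index = 0
while index < len(fruit):
    seen_by_while = seen_by_while + fruit[index]
    index = index + 1

# for loop: Python hands us each letter in turn
seen_by_for = ''
for letter in fruit:
    seen_by_for = seen_by_for + letter

print(seen_by_while)                  # banana
print(seen_by_while == seen_by_for)   # True
```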
Looping and Counting
This is a simple loop that loops through each letter in a string and counts the number of times the loop encounters the 'a' character

word = 'banana'
count = 0
for letter in word :
    if letter == 'a' :
        count = count + 1
print(count)
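The counting loop can be wrapped in a function so it works for any word and any letter. count_letter is a hypothetical name used for illustration; the built-in string method count() gives the same answer for a single character.

```python
def count_letter(word, target):
    """Count how many times the character target appears in word."""
    count = 0
    for letter in word:
        if letter == target:
            count = count + 1
    return count

print(count_letter('banana', 'a'))   # 3
print(count_letter('banana', 'z'))   # 0
print('banana'.count('a'))           # 3
```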
Looking Deeper into in
• The iteration variable “iterates” through the sequence (ordered set)
• The block (body) of code is executed once for each value in the sequence
• The iteration variable moves through all of the values in the sequence

for letter in 'banana' :
    print(letter)

Here letter is the iteration variable and 'banana' is the six-character string.

[Flowchart: Done? If No, advance letter to the next of b a n a n a, run print(letter), and check again. If Yes, the loop ends.]

The iteration variable “iterates” through the string and the block (body) of code is executed once for each value in the sequence
More String Operations
Slicing Strings
• We can also look at any continuous section of a string using a colon operator
• The second number is one beyond the end of the slice - “up to but not including”
• If the second number is beyond the end of the string, it stops at the end

 M  o  n  t  y     P  y  t  h  o  n
 0  1  2  3  4  5  6  7  8  9 10 11

>>> s = 'Monty Python'
>>> print(s[0:4])
Mont
>>> print(s[6:7])
P
>>> print(s[6:20])
Python
Slicing Strings
If we leave off the first number or the last number of the slice, it is assumed to be the beginning or end of the string respectively

 M  o  n  t  y     P  y  t  h  o  n
 0  1  2  3  4  5  6  7  8  9 10 11

>>> s = 'Monty Python'
>>> print(s[:2])
Mo
>>> print(s[8:])
thon
>>> print(s[:])
Monty Python
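A few more slices of the same string, as a sketch of the "up to but not including" rule. The variable names are made up for illustration.

```python
s = 'Monty Python'

first = s[:5]     # positions 0 through 4
second = s[6:]    # position 6 to the end
empty = s[4:2]    # start is past the stop, so nothing is selected

# Splitting at any position n and gluing the halves back gives the original
n = 3
rejoined = s[:n] + s[n:]

print(first)           # Monty
print(second)          # Python
print(repr(empty))     # ''
print(rejoined == s)   # True
```

Because the stop position is excluded, s[:n] and s[n:] never overlap and never leave a gap.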
String Concatenation
When the + operator is applied to strings, it means “concatenation”

>>> a = 'Hello'
>>> b = a + 'There'
>>> print(b)
HelloThere
>>> c = a + ' ' + 'There'
>>> print(c)
Hello There
Using in as a Logical Operator
• The in keyword can also be used to check to see if one string is “in” another string
• The in expression is a logical expression that returns True or False and can be used in an if statement

>>> fruit = 'banana'
>>> 'n' in fruit
True
>>> 'm' in fruit
False
>>> 'nan' in fruit
True
>>> if 'a' in fruit :
...     print('Found it!')
...
Found it!
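Since in produces True or False, it fits naturally inside a loop with an if statement. The sketch below counts vowels; count_vowels is a hypothetical helper, and it only looks for lowercase vowels because in is case sensitive.

```python
def count_vowels(word):
    """Count the lowercase vowels in word using the in operator."""
    count = 0
    for letter in word:
        if letter in 'aeiou':
            count = count + 1
    return count

print(count_vowels('banana'))   # 3
print('nan' in 'banana')        # True
print('Nan' in 'banana')        # False (the capital N does not match)
```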
String Comparison

if word == 'banana':
    print('All right, bananas.')

if word < 'banana':
    print('Your word, ' + word + ', comes before banana.')
elif word > 'banana':
    print('Your word, ' + word + ', comes after banana.')
else:
    print('All right, bananas.')
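The comparison can be tried on a few words by putting it in a function. compare_to_banana is a hypothetical name used for illustration. Note that Python orders all uppercase letters before all lowercase letters, so converting to lower case first is a common way to get an alphabetical comparison.

```python
def compare_to_banana(word):
    """Report where word sorts relative to 'banana'."""
    if word < 'banana':
        return 'before'
    elif word > 'banana':
        return 'after'
    else:
        return 'same'

print(compare_to_banana('apple'))               # before
print(compare_to_banana('cherry'))              # after
print(compare_to_banana('banana'))              # same
print(compare_to_banana('Pineapple'))           # before (capital P sorts ahead of b)
print(compare_to_banana('Pineapple'.lower()))   # after
```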
String Library
• Python has a number of string functions which are in the string library
• These functions are already built into every string - we invoke them by appending the function to the string variable
• These functions do not modify the original string, instead they return a new string that has been altered

>>> greet = 'Hello Bob'
>>> zap = greet.lower()
>>> print(zap)
hello bob
>>> print(greet)
Hello Bob
>>> print('Hi There'.lower())
hi there
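The point that string functions return a new string rather than changing the original can be checked directly. This is a minimal sketch; the variable names other than greet are made up for illustration.

```python
greet = 'Hello Bob'

shout = greet.upper()      # a new string
print(shout)               # HELLO BOB
print(greet)               # Hello Bob (unchanged)

# Strings cannot be changed in place
try:
    greet[0] = 'J'
    changed_in_place = True
except TypeError:
    changed_in_place = False
print(changed_in_place)    # False

# To "update" a variable, assign the new string back to it
greet = greet.replace('Bob', 'Jane')
print(greet)               # Hello Jane
```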
>>> stuff = 'Hello world'
>>> type(stuff)
<class 'str'>
>>> dir(stuff)
['capitalize', 'casefold', 'center', 'count', 'encode', 'endswith', 'expandtabs', 'find', 'format',
'format_map', 'index', 'isalnum', 'isalpha', 'isdecimal', 'isdigit', 'isidentifier', 'islower', 'isnumeric',
'isprintable', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition',
'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip',
'swapcase', 'title', 'translate', 'upper', 'zfill']
https://docs.python.org/3/library/stdtypes.html#string-methods
String Library
str.capitalize()
str.center(width[, fillchar])
str.endswith(suffix[, start[, end]])
str.find(sub[, start[, end]])
str.lstrip([chars])
str.replace(old, new[, count])
str.lower()
str.rstrip([chars])
str.strip([chars])
str.upper()
Searching a String
• We use the find() function to search for a substring within another string
• find() finds the first occurrence of the substring
• If the substring is not found, find() returns -1
• Remember that string position starts at zero

b a n a n a
0 1 2 3 4 5

>>> fruit = 'banana'
>>> pos = fruit.find('na')
>>> print(pos)
2
>>> aa = fruit.find('z')
>>> print(aa)
-1
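Since find() only reports the first occurrence, its optional start argument can be used to keep searching from just past the previous match. The find_all helper below is a hypothetical sketch, not part of the string library.

```python
def find_all(text, sub):
    """Return a list of every position where sub occurs in text."""
    positions = []
    pos = text.find(sub)
    while pos != -1:
        positions.append(pos)
        # Resume the search one character after the last match
        pos = text.find(sub, pos + 1)
    return positions

print(find_all('banana', 'na'))   # [2, 4]
print(find_all('banana', 'a'))    # [1, 3, 5]
print(find_all('banana', 'z'))    # []
```

The loop stops as soon as find() returns -1, which is why checking for -1 matters before using the result as an index.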
Making everything UPPER CASE
• You can make a copy of a string in lower case or upper case
• Often when we are searching for a string using find() we first convert the string to lower case so we can search a string regardless of case

>>> greet = 'Hello Bob'
>>> nnn = greet.upper()
>>> print(nnn)
HELLO BOB
>>> www = greet.lower()
>>> print(www)
hello bob
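The "convert to lower case, then search" idea can be sketched as a small helper. find_ignoring_case is a hypothetical name; it lowers both the text and the search string before calling find().

```python
def find_ignoring_case(text, sub):
    """Return the position of sub in text, ignoring upper/lower case, or -1."""
    return text.lower().find(sub.lower())

greet = 'Hello Bob'
exact = greet.find('bob')                    # case-sensitive search
relaxed = find_ignoring_case(greet, 'bob')   # case-insensitive search

print(exact)     # -1
print(relaxed)   # 6
```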
Search and Replace
• The replace() function is like a “search and replace” operation in a word processor
• It replaces all occurrences of the search string with the replacement string

>>> greet = 'Hello Bob'
>>> nstr = greet.replace('Bob','Jane')
>>> print(nstr)
Hello Jane
>>> nstr = greet.replace('o','X')
>>> print(nstr)
HellX BXb
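replace() also accepts an optional third argument, count, that limits how many occurrences are replaced (see the signature str.replace(old, new[, count])). A brief sketch, with made-up variable names:

```python
greet = 'Hello Bob'

every = greet.replace('o', 'X')        # all occurrences
first_only = greet.replace('o', 'X', 1)  # at most one occurrence
no_match = greet.replace('z', 'X')     # nothing to replace

print(every)        # HellX BXb
print(first_only)   # HellX Bob
print(no_match)     # Hello Bob
print(greet)        # Hello Bob (the original is untouched)
```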
Stripping Whitespace
• Sometimes we want to take a string and remove whitespace at the beginning and/or end
• lstrip() and rstrip() remove whitespace at the left or right
• strip() removes both beginning and ending whitespace

>>> greet = ' Hello Bob '
>>> greet.lstrip()
'Hello Bob '
>>> greet.rstrip()
' Hello Bob'
>>> greet.strip()
'Hello Bob'
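Whitespace includes tabs and newlines as well as spaces, and only the ends of the string are affected. The sketch below uses made-up sample strings to show this, plus the optional chars argument from str.strip([chars]).

```python
line = '   Hello Bob \n'

both = line.strip()        # spaces and the newline are removed from both ends
right = line.rstrip()      # only the right side is cleaned
inner = '  a  b  '.strip() # spaces between words are kept
custom = 'xxHixx'.strip('x')  # strip a chosen character instead of whitespace

print(repr(both))     # 'Hello Bob'
print(repr(right))    # '   Hello Bob'
print(repr(inner))    # 'a  b'
print(custom)         # Hi
```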
Prefixes
>>> line = 'Please have a nice day'
>>> line.startswith('Please')
True
>>> line.startswith('p')
False
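startswith() is case sensitive, which is why 'p' does not match above. A short sketch of two related checks, using the same line: lowering the string first, and testing the other end with endswith(). The result variable names are made up for illustration.

```python
line = 'Please have a nice day'

exact = line.startswith('p')             # capital P does not match lowercase p
relaxed = line.lower().startswith('p')   # compare in lower case instead
suffix = line.endswith('day')            # endswith() checks the other end

print(exact)     # False
print(relaxed)   # True
print(suffix)    # True
```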
Parsing and Extracting
From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008
(position 21 is the @; position 31 is the first space after it)
>>> data = 'From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008'
>>> atpos = data.find('@')
>>> print(atpos)
21
>>> sppos = data.find(' ',atpos)
>>> print(sppos)
31
>>> host = data[atpos+1 : sppos]
>>> print(host)
uct.ac.za
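The same find-then-slice steps can be packaged as a function that also copes with lines where find() returns -1. extract_host is a hypothetical helper written for this example, not part of any library.

```python
def extract_host(line):
    """Return the text between the first '@' and the next space, or None."""
    atpos = line.find('@')
    if atpos == -1:
        return None              # no '@' in the line
    sppos = line.find(' ', atpos)
    if sppos == -1:
        sppos = len(line)        # no space after '@': take the rest of the line
    return line[atpos + 1:sppos]

data = 'From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008'
print(extract_host(data))            # uct.ac.za
print(extract_host('x@y.org'))       # y.org
print(extract_host('no at sign'))    # None
```

Checking for -1 matters here: used directly as a slice bound, -1 means "one before the end", so the slice would silently drop the last character instead of failing.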
Two Kinds of Strings

Python 2.7.10
>>> x = '이광춘'
>>> type(x)
<type 'str'>
>>> x = u'이광춘'
>>> type(x)
<type 'unicode'>

Python 3.5.1
>>> x = '이광춘'
>>> type(x)
<class 'str'>
>>> x = u'이광춘'
>>> type(x)
<class 'str'>

In Python 3, all strings are Unicode
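In Python 3 the practical consequence is that len() and indexing work on characters, not bytes. The sketch below, which assumes Python 3, uses the encode() method from the string library to show the difference; the variable names are made up for illustration.

```python
name = '이광춘'

# A Python 3 str counts characters
char_count = len(name)

# encode() produces the underlying bytes, which is a different type
raw = name.encode('utf-8')
byte_count = len(raw)

print(type(name))                   # <class 'str'>
print(type(raw))                    # <class 'bytes'>
print(byte_count > char_count)      # True: each of these characters needs several bytes
print(raw.decode('utf-8') == name)  # True: decoding gets the text back
print(u'이광춘' == name)             # True: the u prefix makes no difference in Python 3
```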
Summary
• String type
• Read/Convert
• Indexing strings []
• Slicing strings [2:4]
• Looping through strings with for and while
• Concatenating strings with +
• String operations
• String library
• String comparisons
• Searching in strings
• Replacing text
• Stripping white space
Acknowledgements / Contributions
These slides are Copyright 2010- Charles R. Severance ( ...
www.dr-chuck.com) of the University of Michigan School of
Information and open.umich.edu and made available under a
Creative Commons Attribution 4.0 License. Please maintain this
last slide in all copies of the document to comply with the
attribution requirements of the license. If you make a change,
feel free to add your name and organization to the list of
contributors on this page as you republish the materials.
Initial Development: Charles Severance, University of Michigan
School of Information
… Insert new Contributors and Translators here