Strings
Chapter 6
Python for Informatics: Exploring Information
www.py4inf.com
Unless otherwise noted, the content of this course material is licensed under a Creative
Commons Attribution 3.0 License.
http://creativecommons.org/licenses/by/3.0/.
Copyright 2010, 2011 Charles Severance
String Data Type
• A string is a sequence of characters
• A string literal uses quotes: 'Hello' or "Hello"
• For strings, + means "concatenate"
• When a string contains numbers, it is still a string
• We can convert numbers in a string into a number using int()
>>> str1 = "Hello"
>>> str2 = 'there'
>>> bob = str1 + str2
>>> print bob
Hellothere
>>> str3 = '123'
>>> str3 = str3 + 1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: cannot concatenate 'str' and 'int' objects
>>> x = int(str3) + 1
>>> print x
124
>>>
Reading and Converting
• We prefer to read data in using strings and then parse and convert the data as we need
• This gives us more control over error situations and/or bad user input
• Raw input numbers must be converted from strings
>>> name = raw_input('Enter:')
Enter:Chuck
>>> print name
Chuck
>>> apple = raw_input('Enter:')
Enter:100
>>> x = apple - 10
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for -: 'str' and 'int'
>>> x = int(apple) - 10
>>> print x
90
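The point about keeping control over bad input can be sketched as follows. This is a minimal illustration rather than code from the slides: the helper name parse_int and its default argument are hypothetical, and the text is assigned directly instead of being read with raw_input so the sketch runs unattended. It uses the print() call form, which also runs under Python 3.

```python
def parse_int(text, default=None):
    """Convert text to an int, returning default if the text is not a number."""
    try:
        return int(text)
    except ValueError:
        # int() raises ValueError for strings such as 'Chuck' or ''
        return default

apple = '100'            # stands in for what the user typed
x = parse_int(apple)
if x is None:
    print('Please enter a whole number')
else:
    print(x - 10)        # 90
```

Because the data arrives as a string, the program decides what to do when the conversion fails instead of stopping with a traceback.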
Looking Inside Strings
b a n a n a
0 1 2 3 4 5
• We can get at any single character in a string using an index specified in square brackets
• The index value must be an integer and starts at zero
• The index value can be an expression that is computed
>>> fruit = 'banana'
>>> letter = fruit[1]
>>> print letter
a
>>> n = 3
>>> w = fruit[n - 1]
>>> print w
n
A Character Too Far
• You will get a Python error if you attempt to index beyond the end of a string.
• So be careful when constructing index values and slices
>>> zot = 'abc'
>>> print zot[5]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: string index out of range
>>>
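One way to be careful is to compare the index with len() before using it. The helper below is a hypothetical sketch (the name char_at and its default argument are not from the slides); it treats any position outside 0 through len(text) - 1 as out of range. It uses the print() call form, which also runs under Python 3.

```python
def char_at(text, index, default=''):
    """Return text[index], or default when index is outside 0..len(text)-1."""
    if 0 <= index < len(text):
        return text[index]
    return default

zot = 'abc'
print(char_at(zot, 1))    # b
print(char_at(zot, 5))    # prints an empty line instead of raising IndexError
```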
Strings Have Length
b a n a n a
0 1 2 3 4 5
• There is a built-in function len that gives us the length of a string
>>> fruit = 'banana'
>>> print len(fruit)
6
Len Function
• A function is some stored code that we use. A function takes some input and produces an output.
>>> fruit = 'banana'
>>> x = len(fruit)
>>> print x
6
'banana' (a string) -> len() function -> 6 (a number)
• Guido wrote this code:
def len(inp):
    blah
    blah
    for x in y:
        blah
        blah
Looping Through Strings
• Using a while statement, an iteration variable, and the len function, we can construct a loop to look at each of the letters in a string individually
fruit = 'banana'
index = 0
while index < len(fruit) :
    letter = fruit[index]
    print index, letter
    index = index + 1
Output:
0 b
1 a
2 n
3 a
4 n
5 a
Looping Through Strings
• A definite loop using a for statement is much more elegant
• The iteration variable is completely taken care of by the for loop
for letter in fruit :
    print letter
Output:
b
a
n
a
n
a
Looping Through Strings
• A definite loop using a for statement is much more elegant
• The iteration variable is completely taken care of by the for loop
fruit = 'banana'
for letter in fruit :
    print letter

index = 0
while index < len(fruit) :
    letter = fruit[index]
    print letter
    index = index + 1
Either loop prints:
b
a
n
a
n
a
Looping and Counting
• This is a simple loop that loops through each letter in a string and counts the number of times the loop encounters the 'a' character.
word = 'banana'
count = 0
for letter in word :
    if letter == 'a' :
        count = count + 1
print count
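The counting loop above can be wrapped in a function so it works for any word and any letter. This is a sketch only; the name count_letter is hypothetical and not from the slides. The last line compares the result with the count method that appears in the dir() listing for strings. It uses the print() call form, which also runs under Python 3.

```python
def count_letter(word, target):
    """Count how many times the single character target appears in word."""
    count = 0
    for letter in word:
        if letter == target:
            count = count + 1
    return count

print(count_letter('banana', 'a'))   # 3
print('banana'.count('a'))           # 3, the built-in string method agrees
```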
Looking deeper into in
• The iteration variable "iterates" through the sequence (ordered set)
• The block (body) of code is executed once for each value in the sequence
• The iteration variable moves through all of the values in the sequence
for letter in 'banana' :
    print letter
(letter is the iteration variable; 'banana' is a six-character string)
[Figure: flowchart of the for loop with the steps "Done?" (Yes exits the loop), "Advance letter", and "print letter"; the variable letter moves across b a n a n a]
for letter in 'banana' :
    print letter
The iteration variable "iterates" through the string and the block (body) of code is executed once for each value in the sequence
Slicing Strings
M  o  n  t  y     P  y  t  h  o  n
0  1  2  3  4  5  6  7  8  9  10 11
• We can also look at any continuous section of a string using a colon operator
• The second number is one beyond the end of the slice: "up to but not including"
• If the second number is beyond the end of the string, it stops at the end
>>> s = 'Monty Python'
>>> print s[0:4]
Mont
>>> print s[6:7]
P
>>> print s[6:20]
Python
Slicing Strings
M  o  n  t  y     P  y  t  h  o  n
0  1  2  3  4  5  6  7  8  9  10 11
• If we leave off the first number or the last number of the slice, it is assumed to be the beginning or end of the string respectively
>>> s = 'Monty Python'
>>> print s[:2]
Mo
>>> print s[8:]
thon
>>> print s[:]
Monty Python
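The "up to but not including" rule has two handy consequences, sketched below with the same 'Monty Python' string. The variable names first and second are illustrative only. It uses the print() call form, which also runs under Python 3.

```python
s = 'Monty Python'

first = s[0:5]      # characters at positions 0..4
second = s[6:]      # position 6 through the end
print(first)        # Monty
print(second)       # Python

# A slice s[a:b] that lies inside the string has b - a characters
print(len(s[2:7]))  # 5

# Splitting at any position and concatenating gives back the original
print(s[:5] + s[5:] == s)   # True
```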
String Concatenation
• When the + operator is applied to strings, it means "concatenation"
>>> a = 'Hello'
>>> b = a + 'There'
>>> print b
HelloThere
>>> c = a + ' ' + 'There'
>>> print c
Hello There
>>>
Using in as an Operator
• The in keyword can also be used to check to see if one string is "in" another string
• The in expression is a logical expression that returns True or False and can be used in an if statement
>>> fruit = 'banana'
>>> 'n' in fruit
True
>>> 'm' in fruit
False
>>> 'nan' in fruit
True
>>> if 'a' in fruit :
...     print 'Found it!'
...
Found it!
>>>
String Comparison
if word == 'banana':
    print 'All right, bananas.'

if word < 'banana':
    print 'Your word, ' + word + ', comes before banana.'
elif word > 'banana':
    print 'Your word, ' + word + ', comes after banana.'
else:
    print 'All right, bananas.'
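The comparison above can be tried on a few words with a small function. This is a sketch: the name describe and its reference parameter are hypothetical. It also shows one thing to watch for, namely that Python compares strings character by character and uppercase letters sort before lowercase ones, so converting with lower() first can change the answer. It uses the print() call form, which also runs under Python 3.

```python
def describe(word, reference='banana'):
    """Say where word falls relative to reference in string ordering."""
    if word < reference:
        return word + ' comes before ' + reference
    elif word > reference:
        return word + ' comes after ' + reference
    else:
        return word + ' is the same as ' + reference

print(describe('apple'))              # apple comes before banana
print(describe('cherry'))             # cherry comes after banana
print(describe('Pineapple'))          # Pineapple comes before banana ('P' sorts before 'b')
print(describe('Pineapple'.lower()))  # pineapple comes after banana
```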
String Library
• Python has a number of string functions which are in the string library
• These functions are already built into every string; we call them by appending the function to the string variable
• These functions do not modify the original string; instead they return a new string that has been altered
>>> greet = 'Hello Bob'
>>> zap = greet.lower()
>>> print zap
hello bob
>>> print greet
Hello Bob
>>> print 'Hi There'.lower()
hi there
>>>
>>> stuff = 'Hello world'
>>> type(stuff)
<type 'str'>
>>> dir(stuff)
['capitalize', 'center', 'count', 'decode', 'encode',
'endswith', 'expandtabs', 'find', 'format', 'index',
'isalnum', 'isalpha', 'isdigit', 'islower', 'isspace',
'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip',
'partition', 'replace', 'rfind', 'rindex', 'rjust',
'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines',
'startswith', 'strip', 'swapcase', 'title', 'translate',
'upper', 'zfill']
http://docs.python.org/lib/string-methods.html
String Library
str.capitalize()
str.center(width[, fillchar])
str.endswith(suffix[, start[, end]])
str.find(sub[, start[, end]])
str.lstrip([chars])
str.replace(old, new[, count])
str.lower()
str.rstrip([chars])
str.strip([chars])
str.upper()
http://docs.python.org/lib/string-methods.html
Searching a String
b a n a n a
0 1 2 3 4 5
• We use the find() function to search for a substring within another string
• find() finds the first occurrence of the substring
• If the substring is not found, find() returns -1
• Remember that string position starts at zero
>>> fruit = 'banana'
>>> pos = fruit.find('na')
>>> print pos
2
>>> aa = fruit.find('z')
>>> print aa
-1
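Since find() reports only the first occurrence, its optional start argument (listed in the str.find(sub[, start[, end]]) signature) can be used to keep searching. The function below is a sketch; the name find_all is hypothetical, and it collects the positions in a list. It uses the print() call form, which also runs under Python 3.

```python
def find_all(text, sub):
    """Return the starting positions of every occurrence of sub in text."""
    positions = []
    pos = text.find(sub)
    while pos != -1:
        positions.append(pos)
        # Resume one character past the previous match
        pos = text.find(sub, pos + 1)
    return positions

print(find_all('banana', 'na'))   # [2, 4]
print(find_all('banana', 'z'))    # []
```

The loop ends as soon as find() returns -1, which is the same "not found" signal shown on the slide.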
Making everything UPPER CASE
• You can make a copy of a string in lower case or upper case
• Often when we are searching for a string using find(), we first convert the string to lower case so we can search a string regardless of case
>>> greet = 'Hello Bob'
>>> nnn = greet.upper()
>>> print nnn
HELLO BOB
>>> www = greet.lower()
>>> print www
hello bob
>>>
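The idea of lowering the case before searching can be sketched as a small helper. The name find_ignoring_case is hypothetical; it lowers both the string being searched and the search text, then calls find() as usual. It uses the print() call form, which also runs under Python 3.

```python
def find_ignoring_case(text, sub):
    """Like text.find(sub), but treats upper and lower case as equal."""
    return text.lower().find(sub.lower())

greet = 'Hello Bob'
print(greet.find('bob'))                   # -1, because 'B' and 'b' differ
print(find_ignoring_case(greet, 'bob'))    # 6
print(find_ignoring_case(greet, 'HELLO'))  # 0
```

Lowering the case does not change the length of these strings, so the position returned also applies to the original string.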
Search and Replace
• The replace() function is like a "search and replace" operation in a word processor
• It replaces all occurrences of the search string with the replacement string
>>> greet = 'Hello Bob'
>>> nstr = greet.replace('Bob','Jane')
>>> print nstr
Hello Jane
>>> greet = 'Hello Bob'
>>> nstr = greet.replace('o','X')
>>> print nstr
HellX BXb
>>>
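The str.replace(old, new[, count]) signature also accepts an optional count that limits how many occurrences are replaced. A short sketch, with illustrative variable names, using the print() call form so it also runs under Python 3:

```python
greet = 'Hello Bob'
all_replaced = greet.replace('o', 'X')       # every 'o'
first_only = greet.replace('o', 'X', 1)      # only the first 'o'
print(all_replaced)   # HellX BXb
print(first_only)     # HellX Bob
print(greet)          # Hello Bob, the original is unchanged
```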
Stripping Whitespace
• Sometimes we want to take a string and remove whitespace at the beginning and/or end
• lstrip() and rstrip() remove whitespace from the left and right only
• strip() removes both beginning and ending whitespace
>>> greet = ' Hello Bob '
>>> greet.lstrip()
'Hello Bob '
>>> greet.rstrip()
' Hello Bob'
>>> greet.strip()
'Hello Bob'
>>>
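A typical reason to strip whitespace is so that text which was typed or read from a file compares equal to what we expect. The sketch below uses a made-up value for raw, and also shows the optional chars argument from the str.strip([chars]) signature. It uses the print() call form, which also runs under Python 3.

```python
raw = '  yes \n'                  # made-up input with stray spaces and a newline
print(raw == 'yes')               # False
print(raw.strip() == 'yes')       # True

# With an argument, the strip methods remove those characters instead of whitespace
print('***Hello***'.strip('*'))   # Hello
print('***Hello***'.lstrip('*'))  # Hello***
```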
Prefixes
>>> line = 'Please have a nice day'
>>> line.startswith('Please')
True
>>> line.startswith('p')
False
>>> data = 'From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008'
>>> atpos = data.find('@')
>>> print atpos
21
>>> sppos = data.find(' ',atpos)
>>> print sppos
31
>>> host = data[atpos+1 : sppos]
>>> print host
uct.ac.za
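The find-then-slice steps above can be collected into a function that also copes with lines where find() returns -1. This is a sketch under assumptions: the name extract_host is hypothetical, and the sample lines use a made-up example.com address. It uses the print() call form, which also runs under Python 3.

```python
def extract_host(line):
    """Return the text between the first '@' and the next space, or None if there is no '@'."""
    atpos = line.find('@')
    if atpos == -1:
        return None
    sppos = line.find(' ', atpos)
    if sppos == -1:
        # No space after the host: take everything to the end
        sppos = len(line)
    return line[atpos + 1 : sppos]

print(extract_host('From user@example.com Sat Jan 5'))   # example.com
print(extract_host('user@example.com'))                  # example.com
print(extract_host('no address here'))                   # None
```

Without the two checks for -1, a missing '@' or a missing space would silently produce the wrong slice rather than an error.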
Summary
• String type
• Read/Convert
• Indexing strings []
• Slicing strings [2:4]
• Looping through strings with for and while
• Concatenating strings with +
• in as an operator
• String comparison
• String library
• Searching in strings
• Replacing text
• Stripping white space
• Pulling strings apart with slice