COMPUTER SCIENCE
S E D G E W I C K / W A Y N E
SEDGEWICK
ntist WAYNE
Computer Science
ary
A N I N T E R D I S C I P L I N A RY A P P R O A C H
or
d
gs,
of
omputer
C cience Abstract Data Types
hem
h.
S
nce
d
wick
ysis
orics;
ver
An Interdisciplinary Approach
ROBERT SEDGEWICK
ANADA
K E V I N WAY N E
COMPUTER SCIENCE
S E D G E W I C K / W A Y N E
8. Abstract Data Types
• Overview
• String processing
Abstract data types
A data type is a set of values and a set of operations on those values.
Primitive types • integers
• values immediately map to
• floating point numbers
machine representations
• characters
• operations immediately map to
machine instructions. • and so on
We want to write programs that process other types of data.
• Colors, pictures, strings,
• Complex numbers, vectors, matrices,
• Stacks, queues, …
An abstract data type is a data type whose representation is hidden from the client.
3
Object-oriented programming (OOP)
Object-oriented programming (OOP).
• Create your own data types.
An object holds a data type value.
• Use them in your programs (manipulate objects). Variable names refer to objects.
Examples
data type set of values examples of operations
Color three 8-bit integers get red component, brighten
Picture 2D array of colors get/set color of pixel (i, j)
String sequence of characters length, substring, compare C A T A G C G C
Best practice: Use abstract data types (representation is hidden from the client).
Impact: Clients can use ADTs without knowing implementation details.
• This lecture: how to find genes.
• Next lecture: how to implement your own ADTs
4
Complex numbers
You have used ADTs!
A Complex number is of the form a + b⋅i, a and b reals
The Complex ADT allows us to write Python programs that manipulate complex numbers.
The exact representation is hidden (it could change and our programs would still work).
class Complex
x = Complex(a,b) create complex number a + b i
x.re() Real part of x
x.im() Imaginary part of x
Operations (API) x + y; i.e., __add__(self, other) Sum of x and y
x * y; i.e., __mul__(self, other) Product of x and y
abs(x); i.e., __abs(self) Magnitude of x
str(x); i.e., __str__(self) “a + b i” string representation of x
5
Using a data type: constructors and methods
To use a data type, you need to know:
• Its name (capitalized, in Python).
• How to construct new objects.
• How to apply operations to a given object.
new Building()
To construct a new object:
• Initialize a variable to “className(arguments)” x = Complex(7,2)
• Use data type name to specify type of object.
y = Complex(4,4)
To apply an operation (invoke a method): w = x.re()
• Use object name to specify which object. z = x.__add__(y)
• Use the dot operator to indicate that an operation
z = x + y #alternative
is to be applied.
• Use a method name to specify which operation. Image: http://upload.wikimedia.org/wikipedia/commons/6/6a/
Construction_Site_for_The_Oaks_High_School_Retford_-_geograph.org.uk_-_89555.jpg 6
Pop quiz on ADTs
Q. What is a data type?
A. A set of values and a set of operations on those values.
Q. What is an abstract data type?
A. A data type whose representation is hidden from the client.
7
COMPUTER SCIENCE
S E D G E W I C K / W A Y N E
Abstract Data Types
• Overview
• String processing
String ADT
A String is a sequence of Unicode characters.
class str
Python’s ADT allows us to s = “…” create a string with value “…”
write Python programs len() string length
that manipulate strings. s[i] character at position i
s[i:j] Characters form i to (j-1)
sub in s does string s contain sub?
s.startswith(pre) does string s start with pre?
Operations (API)
s.endswith(post) does string s end with post?
s.find(p) index of first occurrence of p in s
s.find(p, i) index of first occurrence of p from i
s+t string s with t appended
s.replace(a, b, times) Returns a str with a changed to b # times
s.split(delim) strings between occurrences of str delim
x == t string x and t reference the same value?
9
Programming with strings: typical examples
Is the string a palindrome?
def palindrome(text):
if len(text) <= 1: return True
return (text[0] == text[-1]) and palindrome(text[1:-1])
recursion!
python3 palindrome.py
noon: True
wasitaratisaw: True
neveroddoreven: True
nope: False
10
Programming with strings: typical examples
Find lines containing a specified string
def find_query(lines, query):
for line in lines:
if line.find(query) != -1: print(line)
python3 findquery.py
Original lines of text:
There once was a puppy called Dougal,
who was sometimes good,
but not very often.
Because, like all puppies,
he liked to chew, bite, bark and sleep.
And once he'd done enough of those things to satisfy his needs,
he found that there wasn't much of the day left for anything else.
Search results: Found 'was' on these lines:
There once was a puppy called Dougal,
who was sometimes good,
he found that there wasn't much of the day left for anything else.
11
String client example: gene finding
Pre-genomics era: Sequence a human genome.
Post-genomics era: Analyze the data and understand structure.
Genomics: Represent genome as a string over A C T G alphabet.
Gene: A non-empty substring of genome that represents a functional unit.
• Made of codons (three A C T G nucleotides).
• Preceded by ATG (start codon).
• Succeeded by TAG, TAA, or TGA (stop codon).
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
A T A G A T G C A T A G C G C A T A G C T A G A T G T G C T A G C
start gene stop start gene stop
Goal. Write a Python program to find genes in a given genome.
12
String client warmup: Identifying a potential gene
Goal. Write a Python function to determine 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
A T G C A T A C T G C A T A G
whether a given string is a potential gene. start gene stop
def gene(dna):
# input string length must be > 6 and len(dna) multiple of 3
if len(dna) % 3 != 0 or len(dna) <= 6: return False
if not dna.startswith("ATG"): return False
codon = [dna[i:i+3] for i in range(0,len(dna)-3,3)]
if "TAA" in codon or "TAG" in codon or "TGA" in codon: return False
if dna.endswith("TAA") or dna.endswith("TAG") or dna.endswith(“TGA"): return True
return False
>>> gene(“ATGCATACTGCATAG”)
true
>>> gene(“ATGCGCTGCGTCTGTACTAG”)
false
>>> gene(“ATGCCGTGACGTCTGTACTAG”)
false 13
String client exercise: Gene finding
Goal. Write a Python program to find genes in a given genome.
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
A T A G A T G C A T A G C G C A T A G C T A G A T G T G C T A G C
start gene stop start gene stop
Algorithm. Scan left-to-right through dna.
• If start codon ATG found, set beg to index i.
• If stop codon found and substring length is a multiple of 3, print gene and reset beg to -1.
codon
i
start stop
beg output remainder of input string
0 -1 ATA G AT G C ATA G C G C ATA G C TA G AT G T G C TA G C
1 TA G -1 TA G AT G C ATA G C G C ATA G C TA G AT G T G C TA G C
4 AT G 4 AT G C ATA G C G C ATA G C TA G AT G T G C TA G C
9 TA G 4 TA G C G C ATA G C TA G AT G T G C TA G C
16 TA G 4 C ATA G C G C A TA G C TA G AT G T G C TA G C
20 TA G -1 TA G AT G T G C TA G C
23 AT G 23 AT G T G C TA G C
29 TA G 23 TGC TA G C
Implementation. Entertaining programming exercise!
14
OOP context for strings
Possible memory representation: genome = "aacaagtttacaagc"
s = genome[1:5]
t = genome[9:13]
x genome t s
a a c a a g t t t a c a a g c x 15 x+9 4 x+1 4
memory length
address
>>> genome = "aacaagtttacaagc"
>>> s = genome[1:5]
>>> s
'acaa'
>>> t = genome[9:13] Implications
>>> t • s and t are different strings that share the same value "acaa".
'acaa'
>>> s is t • s is t returns False, because it compares addresses.
False • s == t returns True, because it compares character sequences.
>>> s == t 15
• Python’s String interface is more complicated than the API.
True
Object-oriented programming: summary
Object-oriented programming.
• Create your own data types (sets of values and ops on them).
An object holds a data type value.
• Use them in your programs (manipulate objects). Variable names refer to objects.
In Python, programs manipulate references to objects.
• String, Picture, Color, arrays/lists, (and everything else) are reference types.
• OOP purist: Languages should not have separate primitive types.
• Practical programmer: Primitive types provide needed efficiency.
T A G A T G T G C T A G C 7 + 2i
16
Reading and References
These Slides
• principally based on slides by Robert Sedgewick and Kevin Wayne
Recommended Reading
• online booksite (Python version) chapter 3
Wider reading:
• read more here about magic methods and operator overloading
Activity:
• make sure that you’re comfortable using data types and their methods
• so practice using objects such as strings
• maybe go look up Python’s built in complex number class
• or even play around with some graphics objects, as seen in the code for Flood Fill from the
previous topic
17