Data structures
Chris Rowlands
Stuff-I-got-interested-in-this-week group
Data Structures
Learning outcomes
• Understand what a data structure is
• Be able to select a suitable data structure based on some simple
requirements
• Be familiar with arrays, linked lists and hash tables
• Be able to use all of Python’s built-in data structures (list, tuple, dict, set,
frozenset)
Data Structures
Outline
• Introduction to data structures
– Arrays
– Linked lists
• Python data structures
– Lists
– Tuples
– Dicts (hash tables)
– Sets
Data Structures
What is a data structure?
• How do we organize large amounts of data in memory, to achieve our design
goals?
– Easy to resize if needed (both elements and the whole structure)
– Fast to read, write (elements, subsets / slices, and the whole structure)
– Easy to perform operations on the structure (max, mean, sum etc.)
– Works with different element data types, including indeterminate sizes
– Memory-efficient: doesn’t take up much space, not much overhead.
• Which goals are important depends on what problem you’re solving
Data Structures
Examples of data structures
• Books on a bookshelf
– 1D array
• Family tree
– Tree
• Treasure hunt
– Linked list
Data Structures
Arrays
• How would you store a monochrome 2D image
in memory efficiently?
– Each pixel has the same bit depth
– Store each pixel in row 1, then row 2, etc.
• Just need to know where the first element is,
plus the height, width and bit depth!
– Plus data type in a general case…
• Can work for 3D, 4D etc. data too.
Data Structures
Arrays
• Pros:
– Memory efficient (little overhead, ~100% use of space)
– Fast to read / write (easy to calculate where each pixels is in memory)
• Cons:
– Can’t resize the array quickly (or add a new element in the middle easily)
– Can’t have elements of different data types, or lengths
– Generally requires large, contiguous blocks of memory
• Python doesn’t natively support arrays, strangely.
Data Structures
Linked lists
• Each node has the address of the next
• Nodes can be anywhere in memory
– No need for large contiguous blocks
– Nodes can be different sizes if needed
– Resizing the structure is easy
• BUT:
– Traversing the structure is slow
– There is substantial memory overhead
Data Structures
Python data structures - lists
Data [1]
• Actually a dynamic array of indices, pointing Address
to arbitrary objects 1
• Define a list using square brackets Data [2]
503
– Mixing list element types is OK
• 0118999 Data [3]
Slice the list using square brackets too
– myList[1] returns the second element 3
in the list because we index from zero Data [4]
Data Structures
Python data structures - lists
• You can slice multiple elements at once using a colon
– myList[start:end] Note that the slice doesn’t include end
• You don’t have to specify the start or end – they will default appropriately
– myList[:] is commonly used to make a copy of the list
• It is possible to take slices with different steps too
– myList[start:end:step]
• Mix and match start, end and step defaults as desired
Data Structures
Python data structures - lists
• Negative indices count from the end of the list!
– -1 is the final element of the list, -2 is the penultimate element etc.
• Works for step as well – counts backwards from the end
– Don’t forget to reverse the start and end values or you’ll get an empty list
• Nested lists can be accessed using more sets of square brackets
Data Structures
Python data structures - tuples
• Essentially immutable lists
– Fast lookup
– Memory-efficient
– Changing requires redefining the whole tuple
• Created similarly to lists, but with () as opposed to []
• You can still index and slice using myTuple[]
Data Structures
Quiz
Starting with this list:
myList = [0,1,2,3,4,5,6,7,8,9]
Which slice parameters will produce the following output?
[2,3,4,5]
[7,5,3,1]
[9,6,3] (no positive numbers allowed)
Data Structures
Hash tables
• How quickly can you look up
someone’s number in a phone book?
– log(N) operations
• Determine the address of the data
using a hash function
– Only one operation needed!
Data Structures
What is a hash function?
• Function to ‘map’ an arbitrary-length key to a fixed-length digest
– e.g. MD5, SHA-1
• Output is uniform – all outputs are equally likely
– Minimal likelihood of finding two keys with the same digest: a collision.
• Example: folded hash 0
1 1
0 0
1 0
1 0
1 1
0 0
1 1
0 BA
26
55
– hash(8F8F 26 55 BA
BA) = 46
0
1 1
0 0
1 0
1 0
1 1
0 1
0 0
1 46
FC
A9
8F
Data Structures
Python data structures - dicts
• Built-in implementation of a hash table
• Enclose in curly brackets
• Separate keys and values with colons, and separate items with commas
• No duplicate keys allowed!
phonenumbers = {
"Emergency services": '0118 999 88199 9119725
3',
"Annoying directory enquiries": '118 118',
"Jenny": '867-5309',
"Dr Turk": '916-225-5887'
}
print(phonenumbers["Jenny"])
867-5309
Data Structures
Python data structures – sets and frozensets
• Very similar to formal mathematical sets
• Elements can be any hashable datatype
– No duplicates
• Unordered (you can’t get the nth element)
• Created with curly brackets: fruit = {‘apple’, ‘banana’, ‘pear’}
• Frozensets can be made using sets: smoothie = frozenset(fruit)
Data Structures
Set operators
• Union (everything in set A or set B): A|B
– No duplicate elements
• Intersection (only elements which are in both sets): A&B
• Difference (elements in A that aren’t in B): A-B
• Symmetric difference (elements in A and B but not in both): A^B
Data Structures
Summary
• Learned what a data structure is, and some classic examples
– Array, linked list, hash table
• Learned how to use Python’s built-in data structures
– Lists and tuples, including slicing
– Dicts
– Sets and frozensets, including set operators