Lecture 4: The Array Data Structure
CIS-275
Data Structures
● A data structure is a collection that a computer program uses to hold groupings of
data.
○ For example: lists, dictionaries, and sets are all data structures.
● In general, each type of data structure:
○ Stores its data in a different way.
○ Allows the client to retrieve its data in a different way.
○ Allows the client to update its data in a different way.
Chapter 4 Arrays and Linked Structures 2
Data Structures
● Each data structure we create in this course will have its own class.
○ i.e. Our Stack data structure will have a class named Stack.
● Complex data structures (such as stacks, queues, trees, and graphs) often actually
store their data in a simpler data structure.
○ This is referred to as a data structure’s internal structure.
○ We will see how this works in later lectures.
● There are two types of simple data structures that usually internally hold the data of
more complex data structures: Arrays and Linked Lists.
○ For example, suppose we create a stack class:
○ One of the class attributes might be self._array, an array object which holds
the stack’s data.
Chapter 4 Arrays and Linked Structures 3
The Array Data Structure
● An array is like a Python list but with less functionality.
○ Several of the data structures we implement in the future will use the array class
to internally store their data.
● Note: We could use a list to implement these data structures, but we are creating an
array class to use for the following reasons:
○ Many other programming languages only have arrays.
■ It is good practice to learn how to work with them.
○ The restrictions placed on an array will actually make the implementation of
complex data structures more simple. Important for Data Analysis
○ When working with large datasets for data analysis and machine learning, the
NumPy library allows for the creation of arrays that are much more efficient than
Python’s lists.
■ Our Array class will have similarities to this.
Chapter 4 Arrays and Linked Structures 4
The Array Data Structure
Some basic info about the Array data structure:
● When creating an Array object, the client passes its size to the constructor.
○ An Array cannot be resized after it is created.
● The client can perform the following operations on an Array object:
○ Retrieve an item at a given index.
○ Replace an item at a given index.
○ Determine the array’s length.
○ Retrieve a string representation of the array.
○ Iterate over all of an array’s indexes with a for loop.
Chapter 4 Arrays and Linked Structures 5
The Array Data Structure
● Suppose we create an array with a size of 5:
a = Array(5)
0 1 2 3 4
a
None None None None None
● There is no append method.
● The only way to add an item is to use the subscript operator with a valid index.
● For example, to add the value 5 to the first index in a:
a[0] = 5
Chapter 4 Arrays and Linked Structures 6
Overloaded Methods
● We will overload the following six methods in the Array class.
● The method in the right column is called by Python when its corresponding action on
the left occurs. For each, assume a references an Array object:
Client’s Array Operation Method in the Array Class
a = Array(10) __init__(capacity)
creates a len() function for the class
len(a) __len__()
str(a) __str__()
for item in a: __iter__() creates a for loop to iterate each index in the array
a[index] __getitem__(index) get an index in a
update an index value
a[index] = newitem __setitem__(index, new_item)
Chapter 4 Arrays and Linked Structures 7
The Array Class
● When designing a class, it’s best to build the easy methods first (after the constructor,
of course).
● We’ll design our Array class in the following order:
○ Constructor.
○ __len__ and __str__
○ __iter__ (because the remaining methods rely on this one) Allows us to iterate one index
at a time
○ __getitem__ and __setitem__
Chapter 4 Arrays and Linked Structures 8
The Array Class __init__ method
class Array:
def __init__ (self, capacity, initial_value= None):
""" Capacity is the static size of the array. Each index is initialized to None """
self._items = []
for count in range(capacity):
self._items.append(initial_value)
● In the constructor, we create a list attribute to store the array’s data internally.
○ Note: As with other data structures, we are going to limit how the client can
interact with the data held in this list.
○ This is our first example of using one data structure (a list) to hold the internal
data of a different type of data structure (an array).
Chapter 4 Arrays and Linked Structures 9
The Array Class __init__ method
class Array:
def __init__ (self, capacity, initial_value= None):
""" Capacity is the static size of the array. Each index is initialized to None """
self._items = []
for count in range(capacity):
self._items.append(initial_value)
● Notice that the constructor has two parameters.
● If the client passes a second argument, the array is filled with that value.
● Otherwise, the array is filled with the default None:
a = Array(5, 3) # Creates an array with 5 3s in it
a2 = Array(5) # Creates an array with 5 Nones in it
Chapter 4 Arrays and Linked Structures 10
The Array Class __len__ method
def __len__(self):
""" Returns the capacity of the array. """
???
● How should we write the __len__ method?
Chapter 4 Arrays and Linked Structures 11
The Array Class __len__ method
def __len__(self):
""" Returns the capacity of the array. """
return len(self._items)
● How should we write the __len__ method?
● Because Python’s list already overloads __len__, we can simply call len on the
internal list.
● The result returned to us is then returned to the client.
Chapter 4 Arrays and Linked Structures 12
The Array Class __str__ method
def __str__(self):
""" Returns string representation of the array """
???
● How should we write the __str__ method?
Chapter 4 Arrays and Linked Structures 13
The Array Class __str__ method
def __str__(self):
""" Returns string representation of the array """
return str(self._items)
● How should we write the __str__ method?
● Again, because Python’s list already overloads __str__, we can simply call str on
the internal list.
● The result returned to us is then returned to the client.
Chapter 4 Arrays and Linked Structures 14
The __iter__ Method
● When Python sees the for and in keywords, it calls the __iter__ method on in’s
right operand.
● For example:
my_array = Array( 3, 5)
for num in my_array:
● The second statement calls the __iter__ method on my_array.
● This method works a little differently from the others we have seen.
○ It is automatically called multiple times (one for each item in the array).
○ Each time it is called, the next item in the array is returned and assigned to num.
Chapter 4 Arrays and Linked Structures 15
The __iter__ Method
● For an Array, we can simply call the iter function and pass the internal list to it.
This will return an iterator over self._items.
def __iter__ (self):
""" Supports traversal with a for loop """
return iter(self._items)
● Don’t worry if this method seems confusing! We will discuss how iterators work in
more detail soon.
● For now, all you need to know is that this method allows the client to use a for loop
while working with an Array.
Chapter 4 Arrays and Linked Structures 16
The Array Class
● What will this code print?
from arrays import Array
def main():
a = Array( 5)
for item in a:
print(item, end=" ")
main()
Chapter 4 Arrays and Linked Structures 17
The Array Class
● What will this code print?
from arrays import Array
def main():
a = Array( 5)
for item in a:
print(item, end=" ")
main()
None None None None None
● The array we created has five indexes, but by default each has “None” in it.
Chapter 4 Arrays and Linked Structures 18
The Array Class
a = Array( 5)
● Unlike a list, our Array class requires a size to be passed to the constructor.
○ Recall that the value None is placed in each index of the array.
○ In this case the Array has five indexes but no valid data in it (i.e. it is considered
empty).
● When this statement executes, the array a references appears like this in memory:
0 1 2 3 4
None None None None None
Chapter 4 Arrays and Linked Structures 19
The Array Class
● When working with an array, there are two size-related concepts to be aware of
○ Physical size: The total number of indexes (or memory locations) the array
contains in memory.
■ In this example, the array’s physical size is 5.
○ Logical size: The number of valid items currently in the array.
■ A default value placed in an array upon creation is considered invalid.
■ In this example, the array’s logical size is 0.
a = Array( 5)
0 1 2 3 4
None None None None None
Chapter 4 Arrays and Linked Structures 20
The Array Class
● Consider the following code:
a = Array( 5)
a[1] = 7
● What values does the array contain?
● What is its logical size?
Chapter 4 Arrays and Linked Structures 21
The Array Class
● Consider the following code:
a = Array( 5)
a[1] = 7 0 1 2 3 4
None 7 None None None
● What values does the array contain?
● What is its logical size? Invalid!
● In an array, we never want valid data to be placed in indexes after empty ones (we’ll
see why in a minute).
● To avoid this, we need to track how many valid items are currently in the array at all
times.
Chapter 4 Arrays and Linked Structures 22
The Array Class
● A variable (which we’ll call numitems in this example) can be used to track how
many valid data items are in an array.
● Suppose we want to add the value ‘7’ to my_array:
my_array = Array( 5)
numitems = 0
● What steps should we take?
Chapter 4 Arrays and Linked Structures 23
The Array Class
● A variable (which we’ll call numitems in this example) can be used to track how
many valid data items are in an array.
● Suppose we want to add the value ‘7’ to my_array:
my_array = Array( 5)
numitems = 0
my_array[numitems] = 7 # Add 7 to first empty index
numitems += 1 # update numitems to indicate an item has been added to my_array
● The number of valid items in an Array always conveniently equals the first empty
index in my_array.
○ i.e. if there are 0 items in an array, the first empty index is 0, etc…
○ We can use numitems to index into the array when adding an item. After
incrementing it, it still equals the first empty index.
Chapter 4 Arrays and Linked Structures 24
The __getitem__ Method
● The client should be able to use the subscript operator [ ] to retrieve an element from
an Array object.
● For example, consider these two lines of code:
second parameter indexes the value
my_array = Array( 3, 5)
print(my_array[ 2]) # Should print 5
● The first statement creates the following array: 0 1 2
5 5 5
● The second statement should retrieve the element at index 2 and then pass it to the
print function.
Chapter 4 Arrays and Linked Structures 25
The __getitem__ Method
● When Python sees the [ ] operator, it calls __getitem__ on the calling object.
○ The value inside the brackets is passed as the argument to the method.
● i.e. this statement calls __getitem__ on my_array and passes 2:
my_array[ 2]
● In our Array class, how should we override this method?
def __getitem__ (self, index):
???
Chapter 4 Arrays and Linked Structures 26
The __getitem__ Method
● When Python sees the [ ] operator, it calls __getitem__ on the calling object.
○ The value inside the brackets is passed as the argument to the method.
● i.e. this statement calls __getitem__ on my_array and passes 2:
my_array[ 2]
● In our Array class, how should we override this method?
def __getitem__ (self, index):
""" Returns the item at index 'index' """
return self._items[index]
● We simply return the item at the given index in the internal list.
Chapter 4 Arrays and Linked Structures 27
The __setitem__ Method
● When Python sees the [ ] operator, and it appears on the left side of an assignment
statement, the __setitem__ method is called on the calling object
● The value in brackets is passed as the first argument and the value to the right of the
assignment operator is passed as the second argument.
● i.e. this statement calls __setitem__ on my_array and passes 1 and 7 as arguments:
my_array[ 1] = 7 # Should set index 1 to contain '7'
● Let’s override this method in our Array class:
def __setitem__ (self, index, new_item):
???
Chapter 4 Arrays and Linked Structures 28
The __setitem__ Method
● When Python sees the [ ] operator, and it appears on the left side of an assignment
statement, the __setitem__ method is called on the calling object
● The value in brackets is passed as the first argument and the value to the right of the
assignment operator is passed as the second argument.
● i.e. this statement calls __setitem__ on my_array and passes 1 and 7 as arguments:
my_array[ 1] = 7 # Should set index 1 to contain '7'
● Let’s override this method in our Array class:
def __setitem__ (self, index, new_item):
self._items[index] = new_item
● We simply assign new_item to the given index in the Array’s internal list.
Chapter 4 Arrays and Linked Structures 29
The Array Class
● Consider the following code. What will print?
my_array = Array( 5)
my_array[0] = 7
my_array[1] = 10
my_array[2] = 15
for item in my_array:
print(item, end=" ")
Chapter 4 Arrays and Linked Structures 30
The Array Class
● Consider the following code. What will print? 7 10 15 None None
my_array = Array( 5)
my_array[0] = 7
my_array[1] = 10
my_array[2] = 15 0 1 2 3 4
for item in my_array: 7 10 15 None None
print(item, end=" ")
● Recall that upon creation, each index in our array is logically empty.
○ The logical size of my_array is 3...we don’t want to print indexes 3 and 4 here.
● How can we iterate through my_array without printing invalid items?
Chapter 4 Arrays and Linked Structures 31
The Array Class
● This code will now print: 7 10 15
my_array = Array( 5)
numitems = 0
my_array[numitems] = 7 0 1 2 3 4
numitems += 1
my_array[numitems] = 10
7 10 15 None None
numitems += 1
my_array[numitems] = 15
numitems += 1
for index in range(numitems):
print(my_array[index], end=" ")
● The variable numitems keeps track of the array’s logical size.
● The for loop now iterates from 0 to 2.
○ Note: Since we now iterate over a Range object, we have to index into the array.
Chapter 4 Arrays and Linked Structures 32
The Array Class
● Suppose we want to write a function print_array.
● Its should print every item in the parameter the_array.
def print_array(the_array):
???
● What should its body look like?
Chapter 4 Arrays and Linked Structures 33
The Array Class
● Suppose we want to write a function print_array.
● Its should print every item in the parameter the_array.
def print_array(the_array, size):
for index in range(size):
print(the_array[index], end=" ")
● Because an array’s logical size might differ from its physical size, we need a
parameter that indicates the number of valid indexes.
● If we simply said for index in the_array, invalid data might be printed.
Chapter 4 Arrays and Linked Structures 34
The Array Class
a = Array( 5) # Creates an array with 5 3s in it
numitems = 0
a[0] = 5
numitems += 1
a[1] = 3
numitems += 1
a[0] = 12
for i in range(numitems):
print(a[i])
print(len(a))
● The client can now create an Array, add items to it, replace an existing item, retrieve
items, and retrieve its physical size.
● Our Array data structure is done!
● As can be seen, an Array is a lot simpler than a list.
Chapter 4 Arrays and Linked Structures 35
Operations on Arrays
● While working with Arrays, the client can write code to add a little more functionality
to how they work.
○ Note: Many of the techniques we discuss here will appear in future data structures
that we design.
● For the following slides, assume the following three lines of code are written at the top
of our code:
DEFAULT_CAPACITY = 5
logical_size = 0
a = Array(DEFAULT_CAPACITY)
Chapter 4 Arrays and Linked Structures 36
Operations on Arrays
● Consider the following code. Can anything go wrong?
while True:
num = input("Enter a number, or 'q' to quit: ")
if num == 'q':
break
a[logical_size] = num
logical_size += 1
Chapter 4 Arrays and Linked Structures 37
Operations on Arrays
● Consider the following code. Can anything go wrong? Yes!
while True:
num = input("Enter a number, or 'q' to quit: ")
if num == 'q':
break
a[logical_size] = num
logical_size += 1
● Suppose the user enters the numbers 1, 2, 3, 4, 5, and 6.
● The first five numbers will be placed in a. The sixth number will cause an
out-of-bounds error!
a
0 1 2 3 4 5
1 2 3 4 5 ???
Chapter 4 Arrays and Linked Structures 38
Increasing the Size of an Array
● Recall that an Array object cannot be resized.
while True:
num = input("Enter a number, or 'q' to quit: ")
if num == 'q':
break
a[logical_size] = num
logical_size += 1
● However, when logical_size equals the Array’s length, the Array is full.
● In this loop, we need to increase the size of the Array at this point.
● As the client, how can we simulate the increasing of an Array’s size?
Chapter 4 Arrays and Linked Structures 39
Increasing the Size of an Array
● Recall that an Array object cannot be resized.
while True:
num = input("Enter a number, or 'q' to quit: ")
if num == 'q':
break
a[logical_size] = num
logical_size += 1
● However, when logical_size equals the Array’s length, the Array is full.
● In this loop, we need to increase the size of the Array at this point.
● As the client, how can we simulate the increasing of an Array’s size?
1. Create a new Array object named temp that is larger.
2. Place the values from a into it.
3. Assign the new Array object to a.
Chapter 4 Arrays and Linked Structures 40
Increasing the Size of an Array
1. Create a new Array object named temp that is larger than a.
2. Place the values from a into temp.
3. Assign the new Array object to a. The old Array will be garbage collected.
● Suppose we have an Array of 3 indexes referenced by a.
● When it is full, we want to increase its size.
0 1 2
a
5 7 2
Chapter 4 Arrays and Linked Structures 41
Increasing the Size of an Array
1. Create a new Array object named temp that is larger than a.
● In the first step, we create a new Array with 4 indexes.
○ i.e. one more index than the Array that a references.
● The variable temp references it.
0 1 2
a
5 7 2
0 1 2 3
temp
None None None None
Chapter 4 Arrays and Linked Structures 42
Increasing the Size of an Array
2. Place the values from a into temp.
● In the second step, we iterate through a and copy its items to the first three indexes of
temp.
● Note that temp still has one empty index.
0 1 2
a
5 7 2
0 1 2 3
temp
5 7 2 None
Chapter 4 Arrays and Linked Structures 43
Increasing the Size of an Array
3. Assign the new Array object to a. The old Array will be garbage collected.
● In the third step, we assign the new Array to a.
● It is as if we increased the size of a!
0 1 2
a
5 7 2
0 1 2 3
temp
5 7 2 None
Chapter 4 Arrays and Linked Structures 44
Increasing the Size of an Array
Let’s perform these same steps in Python code.
1. Create a new Array object named temp that is larger than a.
2. Place the values from a into temp.
3. Assign the new Array object to a. The old Array will be garbage collected.
if len(a) == logical_size:
???
Chapter 4 Arrays and Linked Structures 45
Increasing the Size of an Array
Let’s perform these same steps in Python code.
1. Create a new Array object named temp that is larger than a.
2. Place the values from a into temp.
3. Assign the new Array object to a. The old Array will be garbage collected.
if len(a) == logical_size:
temp = Array( len(a) + 1)
Chapter 4 Arrays and Linked Structures 46
Increasing the Size of an Array
Let’s perform these same steps in Python code.
1. Create a new Array object named temp that is larger than a.
2. Place the values from a into temp.
3. Assign the new Array object to a. The old Array will be garbage collected.
if len(a) == logical_size:
temp = Array( len(a) + 1)
???
Chapter 4 Arrays and Linked Structures 47
Increasing the Size of an Array
Let’s perform these same steps in Python code.
1. Create a new Array object named temp that is larger than a.
2. Place the values from a into temp.
3. Assign the new Array object to a. The old Array will be garbage collected.
if len(a) == logical_size:
temp = Array( len(a) + 1)
for i in range(logical_size):
temp[i] = a[i]
Chapter 4 Arrays and Linked Structures 48
Increasing the Size of an Array
Let’s perform these same steps in Python code.
1. Create a new Array object named temp that is larger than a.
2. Place the values from a into temp.
3. Assign the new Array object to a. The old Array will be garbage collected.
if len(a) == logical_size:
temp = Array( len(a) + 1)
for i in range(logical_size):
temp[i] = a[i]
???
Chapter 4 Arrays and Linked Structures 49
Increasing the Size of an Array
Let’s perform these same steps in Python code.
1. Create a new Array object named temp that is larger than a.
2. Place the values from a into temp.
3. Assign the new Array object to a. The old Array will be garbage collected.
if len(a) == logical_size:
temp = Array( len(a) + 1)
for i in range(logical_size):
temp[i] = a[i]
a = temp
Chapter 4 Arrays and Linked Structures 50
Increasing the Size of an Array
while True:
num = input("Enter a number, or 'q' to quit: ")
if num == 'q':
break
a[logical_size] = num
logical_size += 1
# If array is full, resize it
if len(a) == logical_size:
temp = Array( len(a) + 1)
for i in range(logical_size):
temp[i] = a[i]
a = temp
● This loop now allows the user to enter as many numbers as they want.
● When the array is full, we create a new array that is one bigger to replace it.
● Is there anything about this process that might be inefficient?
Chapter 4 Arrays and Linked Structures 51
Increasing the Size of an Array
while True:
num = input("Enter a number, or 'q' to quit: ")
if num == 'q':
break
a[logical_size] = num
logical_size += 1
# If array is full, resize it
if len(a) == logical_size:
temp = Array( len(a) + 1)
for i in range(logical_size):
temp[i] = a[i]
a = temp
● Is there anything about this process that might be inefficient? Yes!
● If we resize the Array by 1 when it’s full, it’ll have to resize every time a new item is
added.
● By convention, we usually double the size of an array when resizing.
Chapter 4 Arrays and Linked Structures 52
Increasing the Size of an Array
while True:
num = input("Enter a number, or 'q' to quit: ")
if num == 'q':
break
a[logical_size] = num
logical_size += 1
# If array is full, resize it
if len(a) == logical_size:
temp = Array( len(a) * 2)
for i in range(logical_size):
temp[i] = a[i]
a = temp
● This code now doubles the size of the array when needed, so we create new arrays
and copy values over much less frequently.
Chapter 4 Arrays and Linked Structures 53
Increasing the Size of an Array
while True:
num = input("Enter a number, or 'q' to quit: ")
if num == 'q':
break
add_item(a, logical_size, num)
● We can simplify our code even more!
● Let’s write a function named add_item that takes an array, a logical size, and an item
to add to that array.
● We can then simply call this function in each iteration of the loop.
Chapter 4 Arrays and Linked Structures 54
Increasing the Size of an Array
def add_item(array, logical_size, item):
""" add 'item' to 'array'. If array is full, double its size"""
if len(array) == logical_size:
print("Resizing")
temp = Array( len(array) * 2)
for i in range(logical_size):
temp[i] = array[i]
array = temp
array[logical_size] = item
logical_size += 1
● Is there anything that won’t work with this code?
Chapter 4 Arrays and Linked Structures 55
Increasing the Size of an Array
def add_item(array, logical_size, item):
""" add 'item' to 'array'. If array is full, double its size """
if len(array) == logical_size:
print("Resizing")
temp = Array( len(array) * 2)
for i in range(logical_size):
temp[i] = array[i]
array = temp
array[logical_size] = item
logical_size += 1
● Is there anything that won’t work with this code? Yes!
● Python is pass-by-value! This means that although we change array and
logical_size in add_item, their corresponding arguments are not changed.
● How can we fix this?
Chapter 4 Arrays and Linked Structures 56
Increasing the Size of an Array
def add_item(array, logical_size, item):
""" add 'item' to 'array'. If array is full, double its size"""
if len(array) == logical_size:
print("Resizing")
temp = Array( len(array) * 2)
for i in range(logical_size):
temp[i] = array[i]
array = temp
array[logical_size] = item
logical_size += 1
return logical_size, array
● Conveniently, we can return two values from a function in Python.
● When calling add_item, we can assign the return values to the argument variables:
while True:
num = input("Enter a number, or 'q' to quit: ")
if num == 'q':
break
logical_size, a = add_item(a, logical_size, num)
Chapter 4 Arrays and Linked Structures 57
The Array Data Structure
● If an Array gets really full and then really empty, have we introduced any problems?
Chapter 4 Arrays and Linked Structures 58
The Array Data Structure
● If an Array gets really full and then really empty, have we introduced any problems?
● Yes! The Array’s physical size has significantly increased, but its logical size is now
small. We are wasting memory!
● Let’s write a function called remove_item which takes an array and a variable with
its logical size.
● The function will remove the final item in the array.
● If the logical size is ever half of the physical size, we’ll decrease the size of the array
by half.
Chapter 4 Arrays and Linked Structures 59
The Array Data Structure
def remove_item(array, logical_size):
""" Removes the final item in the array """
if logical_size == 0:
raise IndexError ("Array is empty")
array[logical_size - 1] = None
logical_size -= 1
if logical_size <= len(array) // 2:
# Resize the array (cut it by half)
temp = Array( len(array) // 2)
for i in range(logical_size):
temp[i] = array[i]
array = temp
return logical_size, array
● Recall that logical_size is the index of the first logically empty index.
● Therefore, logical_size - 1 is the index of the last logically full index.
Chapter 4 Arrays and Linked Structures 60