ECE 20875
Python for Data Science
Qiang Qiu, Yi Ding and Aristides Carrillo
(Adapted from material developed by Profs. Milind Kulkarni, Stanley Chan, Chris
Brinton, David Inouye, and Qiang Qiu)
python basics
coding in python
• Standard Integrated Development Environments (IDEs)
• IDLE: Python’s own, basic IDE
• PyCharm: Code completion, unit tests, integration with git,
many advanced development features
(https://www.jetbrains.com/pycharm/)
• Spyder: Fewer plugins than PyCharm (not always a bad thing)
• Many more!
• Jupyter Notebook (https://jupyter.org/)
• Contains both computer code and rich text elements
(paragraphs, figures, …)
• Supports several dozen programming languages
• Very useful for data science development!
• You can download the notebook app or use JupyterHub,
available on RCAC
(https://www.rcac.purdue.edu/compute/scholar)
• Anaconda: Python distribution with the conda package manager (https://www.anaconda.com/)
basic variables
• No "declaration" command as in other programming languages
• Variable is created when a value is assigned to it
• Variables can change type after they have been set
• Few rules on naming: Can make them very descriptive!
• Must start with a letter or underscore
• Case-sensitive (purdue & Purdue are different)
• The + operator works on many types, but its meaning depends on the type
"xyz " + "abc" => "xyz abc" (string concatenation)
3.2 + 1 => 4.2 (numeric addition)
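A minimal sketch of the variable rules above; the names x, purdue, Purdue, greeting, and total are illustrative only. It also shows that + is not defined for every pair of types: adding a str and an int raises a TypeError.

```python
# Variables are created on assignment; no declaration needed
x = 4
print(type(x))      # <class 'int'>
x = "Sally"         # the same name now refers to a str
print(type(x))      # <class 'str'>

# Names are case-sensitive: these are two different variables
purdue = 1
Purdue = 2

# + concatenates strings and adds numbers...
greeting = "xyz " + "abc"
total = 3.2 + 1

# ...but mixing incompatible types raises TypeError
try:
    "abc" + 1
except TypeError as err:
    print("TypeError:", err)
```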
operators and control statements
• Comparison operators:
  a == b, a != b, a < b, a <= b, a > b, a >= b
• Arithmetic operators:
  a + b, a - b, a * b, a / b, a % b, a ** b, a // b
• Assignment operators:
  a = b, a += b, a -= b, a *= b, a /= b, a **= b
• Logical operators:
  (a and b), (a or b), not(a), not(a or b)
• If statement:
  if r < 3:
      print("x")
• If, elif, else (multiline blocks):
  if b > a:
      print("b is greater than a")
  elif a == b:
      print("a and b are equal")
  else:
      print("a is greater than b")
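A short sketch of the operator list, using illustrative values a = 7 and b = 2. The detail most worth seeing is the difference between / (true division), // (floor division), and % (remainder), including how // rounds a negative result.

```python
a, b = 7, 2  # illustrative values

# Arithmetic: / is true division, // is floor division, % is the remainder
quotient = a / b        # 3.5
floor_quot = a // b     # 3
remainder = a % b       # 1
power = a ** b          # 49
neg_floor = -a // b     # -4: floor division rounds toward negative infinity

# Augmented assignment combines an operation with assignment
c = 10
c += 5                  # c is now 15
c *= 2                  # c is now 30

# Comparisons produce bools; logical operators combine them
in_range = (a > 0) and (a < 10)    # True
neither = not (a == b or a < b)    # True

if b > a:
    print("b is greater than a")
elif a == b:
    print("a and b are equal")
else:
    print("a is greater than b")   # this branch runs for a=7, b=2
```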
lists
• One of the four collection data types (the others are tuples, sets, and dictionaries)
• Lists are ordered, changeable, and allow duplicate members
  thislist = ["apple", "banana", "apple", "cherry"]
• Access items with an integer index, or a range of indexes (a slice)
  thislist[0] => "apple"
  thislist[-1] => "cherry"
  thislist[1:3] => ["banana", "apple"]
• Length using the built-in len() function
  print(len(thislist))
• Adding items to a list
  thislist.append("orange")
  thislist.insert(1, "orange")
• Removing items from a list
  thislist.remove("banana")  # removes the first matching value
  thislist.pop(1)  # removes the item at index 1
• Defining lists with shorthand
  new_list = 5 * [0]
  new_list = list(range(5))
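Tracing the list operations above in order makes the index shifts visible. This is a sketch; the names first, last, middle, popped, zeros, and counting are illustrative additions.

```python
thislist = ["apple", "banana", "apple", "cherry"]

# Indexing and slicing (the slice end index is excluded)
first = thislist[0]           # "apple"
last = thislist[-1]           # "cherry"
middle = thislist[1:3]        # ["banana", "apple"]

thislist.append("orange")     # add to the end
thislist.insert(1, "orange")  # add at index 1, shifting later items right
thislist.remove("banana")     # remove the FIRST matching value
popped = thislist.pop(1)      # remove and return the item at index 1
print(thislist)               # ['apple', 'apple', 'cherry', 'orange']

zeros = 5 * [0]               # [0, 0, 0, 0, 0]
counting = list(range(5))     # [0, 1, 2, 3, 4]
```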
loops (more control statements)
• while loop: Execute while condition is true
  i = 1
  while i < 6:
      print(i)
      i += 1
• for loop: Iterate over a sequence
  for x in "banana":
      print(x)
• The range() function can be a useful loop iterator:
  for x in range(5, 10):
      y = x % 2
      print(y)
• break: Stop a loop where it is and exit
• continue: Move to next iteration of loop
  for val in "sammy_the_dog":
      if val == "h":
          break
      print(val)
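The slide's loop example only uses break, so here is a sketch contrasting it with continue on the same string, plus the while loop from the slide. Results are collected into lists (names kept, before_h, count_up are illustrative) instead of printed one by one.

```python
# continue skips the rest of the current iteration
kept = []
for val in "sammy_the_dog":
    if val == "_":
        continue            # skip underscores, keep looping
    kept.append(val)
print("".join(kept))        # sammythedog

# break exits the loop entirely
before_h = []
for val in "sammy_the_dog":
    if val == "h":
        break
    before_h.append(val)
print("".join(before_h))    # sammy_t

# The while loop from the slide visits the same values as range(1, 6)
i = 1
count_up = []
while i < 6:
    count_up.append(i)
    i += 1
```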
lists in for loops
• In many other languages (e.g., C), for loop variables are integer counters
• In Python, can use any 'iterable' object
  fruits = ["apple", "banana", "cherry"]
  for x in fruits:
      if x == "banana":
          continue
      print(x)
• Nested loops can be used too
  adj = ["red", "big", "tasty"]
  fruits = ["apple", "banana", "cherry"]
  for x in adj:
      for y in fruits:
          print(x, y)
• Can also iterate through a list of lists
  data_list = [[1, 2], [2, 6], [5, 7]]
  for point in data_list:
      x, y = point
      z = x ** 2
      print(x, y, z)
• Can use the range function to iterate through integers
  for x in range(2, 30, 3):
      print(x)
• Can use the elements of one list as indexes into another list
  ind = [1, 3, 5, 7]
  values = [0] * 8
  for i in ind:
      values[i] = i / 2
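A sketch of the loop patterns above with their results collected so they can be checked; the names rows, pairs, and steps are illustrative. It unpacks each pair directly in the for statement, which is equivalent to the separate x, y = point line.

```python
data_list = [[1, 2], [2, 6], [5, 7]]
rows = []
for x, y in data_list:        # unpack each [x, y] pair in the loop header
    rows.append((x, y, x ** 2))

adj = ["red", "big", "tasty"]
fruits = ["apple", "banana", "cherry"]
pairs = []
for a in adj:                 # outer loop runs 3 times...
    for f in fruits:          # ...inner loop runs 3 times per outer pass
        pairs.append(a + " " + f)

steps = list(range(2, 30, 3)) # the stop value 30 is excluded

ind = [1, 3, 5, 7]
values = [0] * 8
for i in ind:                 # each element of ind is used as a position in values
    values[i] = i / 2
print(values)                 # [0, 0.5, 0, 1.5, 0, 2.5, 0, 3.5]
```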
functions
• Block of code which runs when called
• Defined using the def keyword
  def my_function():
      print("Hello from a function")
• Call a function using its name
  my_function()
• Parameters can be passed as input to functions
  def my_function(country):
      print("I am from " + country)
• To return a value, use the return statement
  def my_function(x):
      return 5 * x
  print(my_function(3))
  print(my_function(5))
• For multiple arguments, can pass them by keyword, in any order
  def arithmetic(x, y, z):
      return (x + y) / z
  print(arithmetic(z=3, x=2, y=4))
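A sketch showing that positional and keyword calls bind the same parameters, and what a call evaluates to when the function has no return statement. The function name greet is a hypothetical rename of the slide's no-argument my_function, used so it does not clash with the other definitions.

```python
def arithmetic(x, y, z):
    return (x + y) / z

by_position = arithmetic(2, 4, 3)        # x=2, y=4, z=3 by position
by_keyword = arithmetic(z=3, x=2, y=4)   # same binding, given by name
mixed = arithmetic(2, z=3, y=4)          # positional arguments must come first
print(by_position, by_keyword, mixed)    # 2.0 2.0 2.0

def greet():
    print("Hello from a function")

result = greet()       # prints the greeting; there is no return statement...
print(result)          # ...so the call evaluates to None
```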
tuples
• Another of the four collection data types
• Tuples are ordered, unchangeable, and allow duplicate members
  thistuple = ("apple", "banana", "apple", "cherry")
• Indexed the same way as lists
  thistuple[0] => "apple"
  thistuple[-1] => "cherry"
  thistuple[1:3] => ("banana", "apple")
• Once a tuple is created, items cannot be added or changed
• Workaround: Convert to a list, modify it, then convert back to a tuple
• Check if item exists
  if "apple" in thistuple:
      print("Yes, 'apple' is in the fruits tuple")
• A tuple with one item needs a trailing comma
  single = ("apple",)  # Tuple
  not_a_tuple = ("apple")  # Not a tuple, just the string "apple"
• Built-in methods
  thistuple.count("apple") => 2
  thistuple.index("apple") => 0
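A sketch of the list-and-back workaround. Note that it builds a new tuple and rebinds the name thistuple; the original tuple object itself is never modified. The intermediate name temp is illustrative.

```python
thistuple = ("apple", "banana", "cherry")

try:
    thistuple[1] = "kiwi"              # tuples do not support item assignment
except TypeError:
    print("tuples are immutable")

# Workaround: convert to a list, modify it, convert back
temp = list(thistuple)
temp[1] = "kiwi"
thistuple = tuple(temp)
print(thistuple)                       # ('apple', 'kiwi', 'cherry')

print(type(("apple",)))                # <class 'tuple'>
print(type(("apple")))                 # <class 'str'>
```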
sets
• Collection which is unordered, (half) changeable, and does not allow duplicates
• Written with curly brackets
  thisset = {"apple", "banana", "cherry"}
• Cannot access items by index, but can loop through and check for items
  for x in thisset:
      print(x)
  print("banana" in thisset)
• Cannot change existing items, but can add and remove items
  thisset.add("orange")
  thisset.update(["orange", "mango", "grapes"])
  thisset.remove("banana")
• Support the standard mathematical set operations
  set1 = {"a", "b", "c"}
  set2 = {1, "b", 3}
  set1.union(set2)  # Union
  set1.intersection(set2)  # Intersection
  set1.difference(set2)  # set1 \ set2
  set1.issubset(set2)  # Testing if subset
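A sketch of the set operations above, pairing each method with its equivalent operator (|, &, -, <=). Set printing order is not guaranteed, so the results are stored in illustrative names (union, common, only_in_1) rather than shown as printed output.

```python
set1 = {"a", "b", "c"}
set2 = {1, "b", 3}

# Duplicates are dropped automatically
fruits = {"apple", "banana", "apple"}
print(len(fruits))                    # 2

# Each method has an equivalent operator form
union = set1.union(set2)              # same as set1 | set2
common = set1.intersection(set2)      # same as set1 & set2
only_in_1 = set1.difference(set2)     # same as set1 - set2
print(set1.issubset(set2))            # False; same as set1 <= set2

# remove() raises KeyError if the item is missing; discard() does not
fruits.discard("kiwi")
try:
    fruits.remove("kiwi")
except KeyError:
    print("kiwi was not in the set")
```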
dictionaries
• Collection which is ordered (insertion order is guaranteed as of Python 3.7), changeable, and indexed by key
• Also written with curly brackets, but holds key: value pairs
  thisdict = {
      "brand": "Ford",
      "model": "Mustang",
      "year": 1964
  }
• Access/change/add values of items by referring to the key name
  thisdict["model"]  # access
  thisdict["year"] = 2019  # change
  thisdict["color"] = "red"  # add a new key
• Can iterate through the keys, values, or both
  for x in thisdict:
      print(thisdict[x])
  for x in thisdict.values():
      print(x)
  for x, y in thisdict.items():
      print(x, y)
• Like other collections, can create a dictionary of dictionaries
  child1 = {"name" : "Emil", "year" : 2004}
  child2 = {"name" : "Tobias", "year" : 2007}
  child3 = {"name" : "Linus", "year" : 2011}
  myfamily = {"child1" : child1, "child2" : child2, "child3" : child3}
• Use the copy method (not direct assignment) to make a copy of a dictionary; direct assignment only creates a second name for the same dictionary
  mydict = thisdict.copy()
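A sketch of why copy() matters: assigning a dictionary to a new name does not copy it. It also shows that copy() is shallow, so nested dictionaries (as in the dictionary-of-dictionaries example) stay shared. The names alias, copied, keys_in_order, color, and family_copy are illustrative.

```python
thisdict = {"brand": "Ford", "model": "Mustang", "year": 1964}

alias = thisdict            # direct assignment: a second name for the SAME dict
copied = thisdict.copy()    # copy(): a new dict with the same key/value pairs

alias["year"] = 2019
print(thisdict["year"])     # 2019, changed through the alias
print(copied["year"])       # 1964, the copy is unaffected

# Keys come back in insertion order (guaranteed since Python 3.7)
keys_in_order = list(copied)

# get() returns a default instead of raising KeyError for a missing key
color = thisdict.get("color", "unknown")

# copy() is shallow: nested dictionaries are still shared
myfamily = {"child1": {"name": "Emil", "year": 2004}}
family_copy = myfamily.copy()
family_copy["child1"]["year"] = 2005
print(myfamily["child1"]["year"])   # 2005
```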
version control
command line and bash
• Command Line Interface (CLI) for
interacting with your operating
system (OS)
• Unix shell: Available by default on
Linux and macOS
• Windows users:
https://www.howtogeek.com/249966/how-to-install-and-use-the-linux-bash-shell-on-windows-10/
• Bash script: Sequence of commands, typically saved as a .sh file
overview of version control
• Automatically keep old versions of code and/or documentation
• Can revert to old versions
• Can see differences (“diffs”) between versions
• Typically done by maintaining a repository on a server
• Can sync code between different machines
• Can share code updates across many people
• “git”: One of the most popular version control systems
• Each “project” goes into a different “repository”
• Repositories can be public (e.g., homework assignments) or private (e.g., homework solutions prior to the due date :D)
• We will use GitHub to manage assignments in this course
[Figure: local and server repositories, linked by the add, commit, push, and pull operations]
git illustration
git illustration