
Python Guide for Data Scientists

This document provides a summary of key concepts in the Python programming language including basic language syntax, data types, functions, classes, packages, files, and popular Python libraries like NumPy, Matplotlib, Sympy, Scipy, and Pandas. It covers topics such as basic and compound data types, conditionals and loops, functions, classes and object-oriented programming, importing and using packages, reading and writing files, and introducing common array operations and creation in NumPy.


Python(3) leaflet

Roland Becker
December 16, 2020

Contents
1 Basic language
2 Classes
3 Packages
4 Files
5 Numpy (basics)
6 Matplotlib
7 Sympy
8 Scipy
9 Pandas
10 tkinter

1 Basic language
Reserved keywords (complete)

and, as, assert, async, await, break, class, continue, def, del, elif, else, except,
False, finally, for, from, global, if, import, in, is, lambda, None, nonlocal, not,
or, pass, raise, return, True, try, while, with, yield

Errors/Exceptions (incomplete)

AssertionError     assert statement fails
AttributeError     attribute reference or assignment fails;
                   typically the attribute (data or function) does not exist in the class
IndexError         sequence subscript is out of range,
                   for example l[1] for a list l with len(l) == 1
KeyError           dictionary key is not found
NameError          local or global name is not found;
                   the variable does not exist: the name is probably misspelled or was renamed
SyntaxError        the parser encounters a syntax error
TypeError          operation or function is applied to an object of inappropriate type
ValueError         operation or function receives an argument that has the right type
                   but an inappropriate value
ZeroDivisionError  the second argument of a division or modulo operation is zero
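
A minimal sketch of how some of the exceptions above arise and how they can be caught. The helper name describe_error and the sample data are made up for this illustration.

```python
def describe_error(action):
    """Run a zero-argument callable and return the name of the exception it raises."""
    try:
        action()
    except (IndexError, KeyError, ZeroDivisionError, ValueError, TypeError) as err:
        return type(err).__name__
    return "no error"

items = [10]          # len(items) == 1, so items[1] is out of range
table = {"a": 1}

print(describe_error(lambda: items[1]))     # IndexError
print(describe_error(lambda: table["b"]))   # KeyError
print(describe_error(lambda: 1 % 0))        # ZeroDivisionError
print(describe_error(lambda: int("abc")))   # ValueError
print(describe_error(lambda: "a" + 1))      # TypeError
```

Catching a tuple of specific exception types keeps unrelated errors (for example a NameError caused by a typo) visible instead of hiding them.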

Basic Datatypes

a) bool, int, float, complex, str, None


b) n/m is float division, n//m is integer (floor) division

Basic Datatypes: str

a) substrings: ssub = s[3:23:3] by slicing [start:stop:step]


b) raw string: r"$\int_0^1 f(x)\,dx$" useful for latex in matplotlib

c) format: "x={:X}".format(value), where X is a format specification such as
   08d: fill with 0, width 8, int
   8.2f: width 8, precision 2, float
   ^8s: centered, width 8, str
   (more examples: https://pyformat.info)

d) shorthand format string: f"value of x is {x}"

e) combined with '=' we can even do: print(f"{x=}") (Python 3.8 and later)
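
The format specifications listed above can be tried out as follows. This is only a small sketch; the variable names and values are chosen for illustration.

```python
value = 42
ratio = 3.14159
name = "abc"

padded = "x={:08d}".format(value)     # 'x=00000042'
fixed = "x={:8.2f}".format(ratio)     # 'x=    3.14'
centered = "x={:^8s}".format(name)    # 'x=  abc   '
debug = f"{value=}"                   # 'value=42' (needs Python 3.8 or later)

for text in (padded, fixed, centered, debug):
    print(repr(text))                 # repr makes the padding visible
```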

Compound Datatypes

a) list [], tuple (), dictionary {}, set


b) list is mutable, whereas tuple is immutable
c) Access operator is []: with integer for list/tuple and key for dict
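
A short sketch of what mutable and immutable mean in practice, and of the access operator for each compound type. All names and values here are illustrative.

```python
numbers = [1, 2, 3]        # list: mutable
numbers[0] = 10            # in-place modification works

point = (1, 2, 3)          # tuple: immutable
try:
    point[0] = 10
except TypeError as err:
    message = str(err)     # item assignment is rejected

ages = {"ann": 31}         # dict: access by key
ages["bob"] = 27

unique = {1, 2, 2, 3}      # set: duplicates are dropped

print(numbers, message, ages, unique)
```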

Compound Datatypes: list

a) sublist: lsub = l[3:23:3]

b) list comprehension: a = [k**2 for k in range(20)]


c) string to list: l = list("abcdefghijklmnopqrstuvwxyz")
d) string to list: l = "a b c".split(' ')
e) list to string: a = ''.join(l)

f) list comprehension: [i.upper() for i in l]


g) list comprehension with if: [f(x) for x in sequence if condition]
h) list comprehension with if and else: [f(x) if condition else g(x) for x in sequence]

Compound Datatypes: dict

a) loop over dictionary: for k,v in d.items():

Built-in functions

a) type(a) gives the type of a


b) id(a) gives the identity of a (its memory address in CPython)
c) len(a) gives the length of a

d) sum(a) gives the sum of a


e) max(a [, key=f]) gives the max of a, optional key is the function defining the order
f) print("Hello", end=" world") #no new line
g) int(a) convert to int, similar for float, str

h) sorted(a)
i) hasattr, getattr, setattr
j) map, set, zip, enumerate
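
A few of these built-ins in action, as a small sketch with made-up data:

```python
words = ["pear", "fig", "banana"]

longest = max(words, key=len)               # order by length, not alphabetically
by_length = sorted(words, key=len)          # new sorted list, words is unchanged
lengths = list(map(len, words))             # apply len to every element
pairs = list(zip(words, lengths))           # combine two sequences element by element
numbered = list(enumerate(words, start=1))  # (index, element) pairs

print(longest, by_length, lengths, pairs, numbered)
```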

Conditionals

if condition:
    do_something()
elif condition2:
    do_something2()
else:
    do_something3()

Loops

a) Loop over list

for item in some_list:
    do_something(item)

b) Loop over dictionary

for k, v in some_dict.items():
    do_something(k, v)

or

for k in some_dict:
    do_something(k, some_dict[k])

Functions

a) general definition

def fct1(input1, input2):
    output1, output2 = input1, input2 # compute the outputs here
    return output1, output2 # tuple for multiple return values
# default arguments last:
def fct2(input1, input2=1, input3=None):
    pass # do nothing
# call with keyword (named) arguments
fct2(input1=some_value)

b) keyword arguments

# variable number of args - positional arguments
def my_sum1(*args): return sum(args)
print(my_sum1(1,2,3,4))
# variable number of args - keyword arguments
def my_sum2(**kwargs):
    return f"sum of {list(kwargs.keys())} is {sum(kwargs.values())}"
print(my_sum2(first=1, second=2, third=3))
# typical usage
def f(a, b, *args, **kwargs):
    if 'date' in kwargs: date = kwargs.pop('date')

c) lambda functions: lambda x,y: x**2-y**2

d) avoid using mutables as default arguments
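
The reason behind point d) is that a default value is evaluated once, when the function is defined, and then shared between calls. The sketch below (function names are invented for the example) shows the pitfall and the usual workaround with None as a sentinel.

```python
def append_bad(item, bucket=[]):      # the default list is created only once
    bucket.append(item)
    return bucket

def append_good(item, bucket=None):   # None as sentinel: new list for each call
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket

first_bad = list(append_bad(1))       # copy, to record the state after each call
second_bad = list(append_bad(2))      # the 1 from the first call is still there
first_good = append_good(1)
second_good = append_good(2)

print(first_bad, second_bad)          # [1] [1, 2]
print(first_good, second_good)        # [1] [2]
```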

2 Classes
Basics

class ParentClass():
    def __init__(self, arg):
        self.arg = arg
    def __repr__(self): return "ParentClass" # used in 'print'
    def compute(self): return self.arg**2
class Class(ParentClass):
    def __init__(self, arg, arg2):
        ParentClass.__init__(self, arg) # don't forget 'self'
        # or super().__init__(arg) no 'self', allows for multi-inheritance
        self.arg2 = arg2
    def __call__(self): return self.compute()+self.arg2
cls = Class(2, 3) # arg=2, arg2=3
vars(cls) # gets all attributes of the instance cls
cls() # calls __call__
# set an attribute by variable name (inside a method of the class)
setattr(self, name, value)
# check if an instance has an attribute by variable name
hasattr(self, name)

Advanced

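a) Don't use __del__. Instead consider

import atexit
class SomeClass:
    def __init__(self):
        # usual stuff
        atexit.register(self.function_at_delete) # called when the interpreter exits
    def function_at_delete(self):
        pass # cleanup code goes here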

3 Packages
Import

You can import


a) A local file
b) Your own package

c) A third-party package (for example installed with pip)

import something
# 'something' can be the name of a local file 'something.py'
# or a package...
from something import somepart
# 'somepart' can be a function, an int or anything else defined in file 'something.py'
import numpy as np # giving a short name
import matplotlib.pyplot as plt # import a sub-module

Possible problems

When you import a package, it can do different things (specified in the file __init__.py of that package); for example,
it may or may not import its subpackages.

import scipy
scipy.integrate.solve_ivp # may raise AttributeError in older SciPy versions: import scipy.integrate first

4 Files
For reading and writing special file formats there are standard-library modules such as wave, aifc (removed in
Python 3.13) and zipfile, and third-party packages such as PyPDF2, xlwings (Excel) and Pillow (images).
Read

# filename can be relative ("test.txt") or absolute ("C:/Python/test.txt")
f = open(filename) # same as f = open(filename, 'r')
# alternatives (each call continues where the previous one stopped):
contents = f.read() # whole file as one string
line = f.readline() # only one line
lines = f.readlines() # all lines as a list
for line in f: print(line) # iterate over (lines of) file
f.close()

Write

# file will be closed automatically
with open(filename, 'w') as f: # for append: open(filename, 'a')
    f.write("All erased!")

File and directory manipulations

import glob
filenames = glob.glob('*.gif') # list all gif-files
# alternative
from pathlib import Path
p = Path('.')
filenames = list(p.glob('*.gif')) # p.glob returns a generator of Path objects
# delete a file
import os
if os.path.exists(filename): os.remove(filename)
else: print("The file does not exist")
# delete an (empty) folder
os.rmdir(foldername)
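
The same manipulations can be tried safely inside a temporary directory. This is a sketch using only the standard library; the file names are invented for the example, and Path.unlink is used as the pathlib counterpart of os.remove.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "a.gif").write_bytes(b"")        # create two empty "gif" files
    (folder / "b.gif").write_bytes(b"")
    (folder / "notes.txt").write_text("hello")

    gif_names = sorted(p.name for p in folder.glob("*.gif"))

    target = folder / "notes.txt"
    if target.exists():
        target.unlink()                        # delete the file
    remaining = sorted(p.name for p in folder.iterdir())

print(gif_names, remaining)
```

The temporary directory and everything left in it are removed automatically when the with block ends.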

5 Numpy (basics)
(nd)array

ndarray is an n-dimensional array (rectangle in n dimensions) with homogeneous data. The dimensions are called
axes. Best reference: numpy.org.

# a is a np.ndarray
a.dtype # datatype
a.data # data (memory buffer)
a.ndim # number of axes
a.shape # tuple of integers, length of each axis
# if a is a matrix with n rows and m columns, a.shape == (n, m)
len(a.shape) == a.ndim # always true
a = np.arange(15).reshape(3, 5)
# look at the result, the last index runs fastest
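
A small sketch that inspects the array created above; it assumes NumPy is installed and uses only the attributes listed in this section.

```python
import numpy as np

a = np.arange(15).reshape(3, 5)

print(a.shape)     # (3, 5): 3 rows, 5 columns
print(a.ndim)      # 2
print(a[0])        # first row: [0 1 2 3 4]
print(a[1, 0])     # 5: moving to the next row skips 5 elements
print(a[0, 1])     # 1: the last index moves to the neighbouring element
```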

Creation

x = np.array(some_list)
x = np.empty((2,3)) # 2x3 matrix (uninitialized)
y = x # Attention: this does NOT create a new array, just a second name for the same one
y = np.empty_like(x) # same shape and dtype as x
x = np.zeros((2,3))
y = np.zeros_like(x)
y = np.array(x) # copy of x
x = np.random.random(size=(2,3)) # uniform in [0,1)
x = np.random.randn(2,3) # standard normal (mu=0, sigma=1)
x = np.random.randint(11, 22, (2,3)) # integer uniform in [11,22), 22 excluded
x = np.loadtxt(filename) # np.savetxt(filename, x) for the other way
x = np.genfromtxt(filename) # great flexibility; also accepts a url (automatic download)
x = np.arange(a, b, dx) # generates 1D-array equally spaced with step dx (1 by default), b excluded
x = np.linspace(a, b, n) # same but with the number of elements n, b included

Simple operations

a.T # transpose
a+b # if a.dtype != b.dtype, the more precise will be used (upcasting)
a*b # Attention: elementwise product!
a.dot(b) # matrix product, same as a@b
np.sin(a)
a >= 36 # results in boolean array of same shape
a.min()
a.sum() # if a.ndim == 2 (matrix), a.sum(axis=0/1) results in column/row-wise sum
a.cumsum()
a.mean()

Access operators (indexing and slicing)

x[1,2] # single element, ',' separates dimensions
x[0:-1,1::2] # slicing start:stop:step (0 is the first index, -1 the last; stop is excluded)
x[ind,:] # fancy indexing, ind is an integer list/1D-array
x[cond,:] # boolean indexing, cond is a boolean list/1D-array with cond.shape[0] == x.shape[0]
cond = (x[:, 2] > 1.5) & (x[:, 0] < 5.0) # elementwise comparisons combined with &
x[cond] # gets all rows with x[:, 2] > 1.5 and x[:, 0] < 5.0
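
To see what the boolean condition selects, here is a sketch on a small hand-made array (the numbers are arbitrary):

```python
import numpy as np

x = np.array([[1.0, 0.0, 2.0],
              [6.0, 0.0, 3.0],
              [2.0, 0.0, 1.0],
              [4.0, 0.0, 1.8]])

cond = (x[:, 2] > 1.5) & (x[:, 0] < 5.0)   # one boolean per row
selected = x[cond]                          # same as x[cond, :], keeps rows 0 and 3
every_other = x[0:-1, 1::2]                 # rows 0..2, every second column from 1

print(cond)
print(selected)
print(every_other.shape)
```

Row 1 is dropped because its first entry is not below 5.0, and row 2 because its last entry is not above 1.5.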

Broadcasting

x = np.random.random((2,3))
y = np.random.random((2,1))
z = x + y # y will be broadcast to shape (2,3) with identical columns
y = np.random.random((1,3))
z = x + y # y will be broadcast to shape (2,3) with identical rows
z = x + np.pi # broadcasting!
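
With fixed numbers instead of random ones, the effect of broadcasting is easier to check. A minimal sketch (array contents chosen for readability):

```python
import numpy as np

x = np.zeros((2, 3))
col = np.array([[1.0], [2.0]])          # shape (2, 1)
row = np.array([[10.0, 20.0, 30.0]])    # shape (1, 3)

z_col = x + col                          # col is repeated along the columns
z_row = x + row                          # row is repeated along the rows
explicit = np.broadcast_to(col, (2, 3))  # what broadcasting does implicitly
outer = col + row                        # (2,1) + (1,3) gives shape (2,3)

print(z_col)
print(z_row)
print(outer)
```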

Shape manipulation

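x.reshape(newshape) # has to make sense (same total number of elements)
x.ravel() # makes 1D-array, same as x.flat[:]
x.repeat(k) # repeat each element k-times, complex for >1 dimension
np.tile(x, reps) # repeat the whole array reps times (a function, not a method)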
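
The difference between repeat and tile is easiest to see on a tiny array. A sketch with illustrative values:

```python
import numpy as np

x = np.arange(6).reshape(2, 3)      # [[0, 1, 2], [3, 4, 5]]

flat = x.ravel()                    # 1D: [0 1 2 3 4 5]
rep = x.repeat(2)                   # flattens, then repeats each element
rep_rows = x.repeat(2, axis=0)      # repeats each row, shape (4, 3)
tiled = np.tile(x, (1, 2))          # copies the whole block side by side, shape (2, 6)

print(flat, rep, rep_rows, tiled, sep="\n")
```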

Functions

a) vectorization

x = np.random.random((2,3))
np.arcsin(x) # applies arcsin to all elements of x

b) statistics

x.sum() # same as np.sum(x)


x.mean() # same as np.mean(x)

Statistics

np.histogram(x) # nomen est omen: returns counts and bin edges
np.bincount(x) # array with the number of occurrences of each non-negative int
np.searchsorted(x, v) # positions to insert v in (sorted) x to maintain order
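
A small sketch of these three functions on a short sorted integer array (the data and the choice of bins=3 are assumptions for the example):

```python
import numpy as np

x = np.array([0, 1, 1, 3, 3, 3])               # sorted, non-negative integers

counts = np.bincount(x)                         # occurrences of 0, 1, 2, 3
hist, edges = np.histogram(x, bins=3)           # 3 equal bins between min and max
left = np.searchsorted(x, [2, 3])               # insertion points before equal values
right = np.searchsorted(x, [2, 3], side="right")  # insertion points after equal values

print(counts, hist, edges, left, right)
```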

6 Matplotlib
Styles, Markers, Linestyles

• colors: r,b,g,k,w,y,m,c...

• markers: x,X,+,*,o,v,1,2...
• linestyles: -,--,...

Simple plots (1)

import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(-np.pi, np.pi)
plt.plot(x, np.sin(x), label="sin") # plt.plot(np.sin(x)) plots sin against the index
plt.plot(x, np.cos(x), label="cos") # adds another line
plt.legend()
plt.grid()
plt.xlabel("x")
plt.ylabel("y")
plt.savefig("plot.png") # format is guessed from the extension; save before show()
plt.show() # not needed in ipython, otherwise never forget...

Simple plots (2)

# first lines as before
plots = plt.plot(x, np.sin(x), x, np.cos(x))
plt.setp(plots[0], label="sin", color='r', linewidth=2.0, linestyle='-', marker='o')
plt.setp(plots[1], label="cos", c='b', lw=3.0, ls='--', marker='x')
plt.xlim(left=0) # or plt.xlim(right=0) or plt.axis([xmin, xmax, ymin, ymax])
plt.legend()

Other common things

• Clear

plt.clf() # clear figure


plt.cla() # clear axes

• Equal scales in x and y

plt.gca().set_aspect('equal', adjustable='box')

• region between two curves

# fill region between two curves with color
plt.fill_between(x, ymin, ymax, color='green', alpha=0.5) # alpha is opacity

• curve fitting

from scipy.optimize import curve_fit


plt.plot(ns, data, label="original data")
# for quadratic fit
func = lambda x, a, b, c: a + b*x + c*x**2
ns = np.array(ns) # needs array (if ns was a list) for vectorized function call
p, pcov = curve_fit(func, ns, data)
label = f"cf {p[0]:5.1e}+{p[1]:5.1e}x + {p[2]:5.1e}x**2"
plt.plot(ns, func(ns,*p), '--', label=label)

Different scales

import numpy as np
import matplotlib.pyplot as plt
t = np.linspace(-np.pi, np.pi)
fig, ax = plt.subplots()
axt = ax.twinx() # create a twin for different scale
p0, = ax.plot(t, np.sin(t), label="sin", color='b')
p1, = axt.plot(t, 100*np.cos(t), label="100*cos", color='g')
ax.tick_params(axis='y', labelcolor='b')
axt.tick_params(axis='y', labelcolor='g')
#axt.ticklabel_format(axis='y', style='sci', scilimits=(0,3)) # format if required
#axt.spines['right'].set_position(('outward', 0)) # offset if required
plt.legend(handles=[p0,p1]) # needed

Several plots (1)

import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(-np.pi, np.pi, 10)
plt.subplot(211) # nrows, ncols, index of the subplot
plt.plot(x, np.sin(x), 'bo', x, -x, 'k')
plt.subplot(212)
plt.plot(x, np.cos(x), 'go-', x, x, 'k')

Several plots (2)

x = np.linspace(-1, 1)
fig, axarr = plt.subplots(2, 2, sharex='col')
axarr[1,0].set_xlabel('x')
axarr[1,1].set_xlabel('x')
axarr[0,0].set_ylabel('angle')
axarr[1,0].set_ylabel('angle')
plt.subplots_adjust(hspace=0.6)
plots = axarr[0,0].plot(x, np.arcsin(x), x, np.arccos(x))
plt.setp(plots[0], label="arcsin")
plt.setp(plots[1], label="arccos")
axarr[0,0].set_title("arcsin arccos")
axarr[0,0].legend(loc='upper right')
axarr[1,1].plot(x, np.arctan(x), label='arctan')
axarr[1,1].legend(loc='best')

2D contour plot

x = np.linspace(0, 1)
xm = 0.5*(x[1:] + x[0:-1])
y = np.linspace(-1, 1)
ym = 0.5*(y[1:] + y[0:-1])
xx, yy = np.meshgrid(x, y)
xc, yc = np.meshgrid(xm, ym)
#pseudocolor plot
p = plt.pcolormesh(xx, yy, np.sin(xx*yy), cmap=plt.cm.RdBu, shading='auto')
plt.colorbar(p)
# isolines
cnt = plt.contour(xc, yc, np.sin(xc*yc))
plt.clabel(cnt, cnt.levels, inline=True, fmt='%.1f', fontsize=10)
# plot contours (uses interpolation)
#cnt = plt.contourf(xc, yc, np.sin(xc*yc))

3D plot

from mpl_toolkits.mplot3d import Axes3D


x = np.linspace(0, 1)
y = np.linspace(-1, 1)
xx, yy = np.meshgrid(x, y)
fig = plt.figure(figsize=(12,6))
ax = fig.add_subplot(1, 2, 1, projection='3d')
ax.plot(x, y, np.sin(x*y), lw=2, c='r')
ax.plot_surface(xx, yy, np.sin(xx*yy), lw=0.5, alpha=0.5)

Synchronization of rotation and zoom

fig = plt.figure(figsize=(12,6))
ax1 = fig.add_subplot(1, 2, 1, projection='3d')
p = ax1.plot_surface(xx, yy, u0)
ax2 = fig.add_subplot(1, 2, 2, projection='3d', sharez=ax1)
p = ax2.plot_surface(xx, yy, u)
def on_move(event):
    if event.inaxes == ax1:
        if ax1.button_pressed in ax1._rotate_btn:
            ax2.view_init(elev=ax1.elev, azim=ax1.azim)
        elif ax1.button_pressed in ax1._zoom_btn:
            ax2.set_xlim3d(ax1.get_xlim3d())
            ax2.set_ylim3d(ax1.get_ylim3d())
            ax2.set_zlim3d(ax1.get_zlim3d())
    elif event.inaxes == ax2:
        if ax2.button_pressed in ax2._rotate_btn:
            ax1.view_init(elev=ax2.elev, azim=ax2.azim)
        elif ax2.button_pressed in ax2._zoom_btn:
            ax1.set_xlim3d(ax2.get_xlim3d())
            ax1.set_ylim3d(ax2.get_ylim3d())
            ax1.set_zlim3d(ax2.get_zlim3d())
    else:
        return
c1 = fig.canvas.mpl_connect('motion_notify_event', on_move)

Animation

a) Iterate over time (animate on the fly)

import matplotlib.animation as pltanim
class Animate:
    def __init__(self, ax):
        self.line, = ax.plot([], [], 'bo-', lw=2, label="line")
        ax.set_xlim(-2, 2)
        ax.set_ylim(-2, 2)
        self.timetext = ax.text(0.05, 0.9, '', transform=ax.transAxes) # rel. coords
    def __call__(self, t):
        r = 1 + np.cos(t)
        self.line.set_data([0, r*np.cos(t)], [0, r*np.sin(t)])
        self.timetext.set_text(f"time = {t:.2f}s")
        return self.line, self.timetext # all animated artists
fig, ax = plt.subplots(1, 1, figsize=(6,6))
ani = Animate(ax)
t = np.linspace(0, 2*np.pi, 120)
anim = pltanim.FuncAnimation(fig, ani, t, interval=10, blit=True)
# interval controls the speed, repeat=False to stop

b) Iterate over frame (animate precomputed data)

import matplotlib.animation as pltanim
class Animate:
    def __init__(self, ax, nframes):
        self.line, = ax.plot([], [], 'bo-', lw=2, label="line")
        ax.set_xlim(-2, 2)
        ax.set_ylim(-2, 2)
        self.timetext = ax.text(0.05, 0.9, '', transform=ax.transAxes)
        t = np.linspace(0, 2*np.pi, nframes) # precompute all frames
        self.t = t
        r = 1 + np.cos(t)
        self.x = r*np.cos(t)
        self.y = r*np.sin(t)
    def __call__(self, i):
        self.line.set_data(self.x[:i], self.y[:i])
        self.timetext.set_text(f"time = {self.t[i]:.2f}s")
        return self.line, self.timetext
nframes = 120
fig, ax = plt.subplots(1, 1, figsize=(6,6))
ani = Animate(ax, nframes)
anim = pltanim.FuncAnimation(fig, ani, nframes, interval=10, blit=True)

c) Save animation

writer = pltanim.PillowWriter(fps=30) # pltanim is matplotlib.animation, as imported above
anim.save('file.gif', writer=writer)

Images

image = np.random.poisson(2., (80, 80))


p = plt.imshow(image)
plt.colorbar(p)

7 Sympy
Symbolic derivative

import sympy
x = sympy.Symbol('x')
expr = sympy.sympify("exp(-0.1*x**2)*sin(pi*x)") # string to sympy expression
f = sympy.lambdify(x, expr, "numpy") # makes numpy function
df_expr = sympy.diff(expr, x)
df = sympy.lambdify(x, df_expr, "numpy")

8 Scipy
Optimization

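import numpy as np
from scipy import optimize
f = lambda x: np.sum(x**10) # must return a scalar
df = lambda x: 10*x**9 # gradient of f
assert optimize.check_grad(f, df, np.array([-1., 0., 1.])) < 1e-5 # returns the error norm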

ODEs

import numpy as np
import matplotlib.pyplot as plt
from scipy import integrate
def f(t, y): return [-y[0] + 2*y[1], -y[1]]
t = np.linspace(0, 2)
y0 = [1, 10]
y = integrate.odeint(f, y0, t, tfirst=True)
# Alternative: sol = integrate.solve_ivp(f, [t[0], t[-1]], y0, t_eval=t)
# sol.y has shape (2, len(t)), so compare it with y.T (equal up to solver tolerances)
plt.plot(t, y[:,0], label="y1")
plt.plot(t, y[:,1], label="y2")
plt.legend()
plt.show()

9 Pandas
Basics

Pandas is a data-analysis library based on numpy, with table operations similar to SQL. The basic datatype is
pd.DataFrame; it is like a two-dimensional array, but with varying types, or a dictionary with columns as keys.
Each column is a pd.Series, and so is each row extracted from the DataFrame.
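
The relation between DataFrame and Series can be checked directly. A minimal sketch, assuming pandas is installed; column names and values are made up.

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2], "col2": [3.5, 4.5]})

column = df["col1"]       # one column: a pd.Series indexed by the row labels
row = df.iloc[0]          # one row: also a pd.Series, indexed by the column names
cell = df.loc[1, "col2"]  # single value by row label and column name
rows_with_2 = df.index[df["col1"] == 2].tolist()  # row labels where col1 equals 2

print(type(column).__name__, type(row).__name__, cell, rows_with_2)
```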

import pandas as pd
df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]}) # creates a DataFrame
df.columns # gets all columns
df = pd.read_excel(filename) # reads excel file
df.to_excel(filename, index=False) # writes to excel file, no index created

Access

df.iloc # integer based access
df.loc # name based access
df[colname] # gets the whole column
df.columns.get_loc(colname) # gets the index of column (hardly needed)
df.index[df[colname] == value].tolist() # gets the indices of rows satisfying the condition

Operations

df.drop(df.columns[i], axis=1, inplace=True) # drops column i (could use name)
# adding a leading row
df2 = pd.DataFrame([values], columns=df.columns) # values is a list
df = pd.concat([df2, df], ignore_index=True)
# convert to str
df = df.astype(str)

10 tkinter
Ingredients

This is the python interface to Tk, one of the first libraries for GUI (object-oriented!). Good reference:
https://effbot.org/tkinterbook.

• Widgets: Frame, Label, Entry, Button, Text, Canvas, messagebox, Toplevel,....


• Geometry manager: either pack or grid
• Bindings and events

A minimal example:

import tkinter as tk
root = tk.Tk() # create the main window
label = tk.Label(root, text="Hello tk")
label.pack() # only packed widgets are shown
button = tk.Button(root, text='Quit', command=root.destroy)
button.pack()
root.mainloop() # runs the event loop

Files and directories

from tkinter import filedialog, simpledialog


filename = filedialog.askopenfilename(initialdir=path)
filename = filedialog.asksaveasfilename(defaultextension=".xlsx", initialdir=path)
n = simpledialog.askinteger("Task", "Please give number")

Tricks

# passing arguments to functions in command inside a loop might be tricky (because of 'late binding')
# the following example does not work as expected
for c in range(ncols):
    # e some widget
    e.bind('<Button-1>', lambda event: self.action(c))
    # hack using default arguments
    e.bind('<Button-1>', lambda event, y=c: self.action(y))
    # or use partial (the event is then passed as second argument: self.action(c, event))
    from functools import partial
    e.bind('<Button-1>', partial(self.action, c))
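
The late-binding effect can be reproduced without any GUI. In the sketch below, the function action and the three lists are invented stand-ins for the widget callbacks: each callback is called with a dummy event after the loop has finished.

```python
from functools import partial

def action(column, event=None):
    """Stand-in for self.action: just report which column was passed."""
    return column

late, bound, partials = [], [], []
for c in range(3):
    late.append(lambda event: action(c))          # c is looked up when the callback runs
    bound.append(lambda event, y=c: action(y))    # current value of c stored as default
    partials.append(partial(action, c))           # current value of c stored in the partial

late_results = [cb(None) for cb in late]          # every callback sees the last c
bound_results = [cb(None) for cb in bound]
partial_results = [cb(None) for cb in partials]   # event is passed as second argument

print(late_results, bound_results, partial_results)
```

The plain lambda reads c only when it is called, so all three callbacks report the final loop value; the default argument and partial both capture the value at the time the callback is created.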
