Python Notes for Professionals (pages 759-826)

Chapter 180: Pyglet

Pyglet is a Python module used for visuals and sound. It has no dependencies on other modules. See
http://pyglet.org for the official documentation.

Section 180.1: Installation of Pyglet


Install Python, go into the command line and type:

Python 2:

pip install pyglet

Python 3:

pip3 install pyglet

Section 180.2: Hello World in Pyglet


import pyglet

window = pyglet.window.Window()
label = pyglet.text.Label('Hello, world',
                          font_name='Times New Roman',
                          font_size=36,
                          x=window.width // 2, y=window.height // 2,
                          anchor_x='center', anchor_y='center')

@window.event
def on_draw():
    window.clear()
    label.draw()

pyglet.app.run()

Section 180.3: Playing Sound in Pyglet


import pyglet

sound = pyglet.media.load('sound.wav')
sound.play()

Section 180.4: Using Pyglet for OpenGL


import pyglet
from pyglet.gl import *

win = pyglet.window.Window()

@win.event
def on_draw():
    # OpenGL goes here. Use OpenGL as normal.
    pass

pyglet.app.run()

Section 180.5: Drawing Points Using Pyglet and OpenGL


import pyglet
from pyglet.gl import *

win = pyglet.window.Window()

@win.event
def on_draw():
    glClear(GL_COLOR_BUFFER_BIT)
    glBegin(GL_POINTS)
    # x is the desired distance from the left side of the window,
    # y is the desired distance from the bottom of the window
    glVertex2f(x, y)
    # make as many vertices as you want
    glEnd()

To connect the points, replace GL_POINTS with GL_LINE_LOOP.



Chapter 181: Audio
Section 181.1: Working with WAV files
winsound

Windows environment

import winsound
winsound.PlaySound("path_to_wav_file.wav", winsound.SND_FILENAME)

wave

Supports mono/stereo
Doesn't support compression/decompression

import wave

with wave.open("path_to_wav_file.wav", "rb") as wav_file:  # Open WAV file in read-only mode.
    # Get basic information.
    n_channels = wav_file.getnchannels()    # Number of channels (1=mono, 2=stereo).
    sample_width = wav_file.getsampwidth()  # Sample width in bytes.
    framerate = wav_file.getframerate()     # Frame rate.
    n_frames = wav_file.getnframes()        # Number of frames.
    comp_type = wav_file.getcomptype()      # Compression type (only "NONE" is supported).
    comp_name = wav_file.getcompname()      # Compression name.

    # Read audio data.
    frames = wav_file.readframes(n_frames)  # Read n_frames new frames.
    assert len(frames) == sample_width * n_channels * n_frames

# Duplicate to a new WAV file.
with wave.open("path_to_new_wav_file.wav", "wb") as wav_file:  # Open WAV file in write-only mode.
    # Write audio data.
    params = (n_channels, sample_width, framerate, n_frames, comp_type, comp_name)
    wav_file.setparams(params)
    wav_file.writeframes(frames)
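The read/write pattern above can be exercised end-to-end by synthesising audio instead of starting from an existing file. A minimal sketch that writes a one-second 440 Hz tone and reads the header back (the file name tone.wav and the tone parameters are arbitrary choices):

```python
import math
import struct
import wave

framerate = 44100   # frames per second
duration = 1.0      # seconds
freq = 440.0        # tone frequency in Hz
amplitude = 32000   # peak value for signed 16-bit samples

n_frames = int(framerate * duration)
# Pack each sample as a signed 16-bit little-endian integer.
frames = b''.join(
    struct.pack('<h', int(amplitude * math.sin(2 * math.pi * freq * i / framerate)))
    for i in range(n_frames)
)

with wave.open('tone.wav', 'wb') as wav_file:
    wav_file.setnchannels(1)    # mono
    wav_file.setsampwidth(2)    # 2 bytes == 16-bit samples
    wav_file.setframerate(framerate)
    wav_file.writeframes(frames)

# Read it back and check the header matches what was written.
with wave.open('tone.wav', 'rb') as wav_file:
    assert wav_file.getnchannels() == 1
    assert wav_file.getsampwidth() == 2
    assert wav_file.getnframes() == n_frames
```

Because the sample width is 2 bytes and the file is mono, the byte length of the raw frames is exactly 2 * n_frames.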

Section 181.2: Convert any sound file with Python and ffmpeg


from subprocess import check_call

# check_call raises CalledProcessError if ffmpeg fails,
# and returns 0 on success.
check_call(['ffmpeg', '-i', 'input.mp3', 'output.wav'])

with open('output.wav', 'rb') as f:
    wav_file = f.read()

Note: for the differences and similarities between ffmpeg, libav, and avconv, see
http://superuser.com/questions/507386/why-would-i-choose-libav-over-ffmpeg-or-is-there-even-a-difference

Section 181.3: Playing Windows' beeps


Windows provides an explicit interface through which the winsound module allows you to play raw beeps at a given
frequency and duration.

import winsound

freq = 2500  # frequency in hertz
dur = 1000   # duration in milliseconds (1000 ms == 1 second)
winsound.Beep(freq, dur)

Section 181.4: Audio With Pyglet


import pyglet
audio = pyglet.media.load("audio.wav")
audio.play()

For further information, see the pyglet documentation.



Chapter 182: pyaudio
PyAudio provides Python bindings for PortAudio, the cross-platform audio I/O library. With PyAudio, you can easily
use Python to play and record audio on a variety of platforms. PyAudio is inspired by:

1. pyPortAudio/fastaudio: Python bindings for the PortAudio v18 API.

2. tkSnack: cross-platform sound toolkit for Tcl/Tk and Python.

Section 182.1: Callback Mode Audio I/O


"""PyAudio Example: Play a wave file (callback version)."""

import pyaudio
import wave
import time
import sys

if len(sys.argv) < 2:
print("Plays a wave file.\n\nUsage: %s filename.wav" % sys.argv[0])
sys.exit(-1)

wf = wave.open(sys.argv[1], 'rb')

# instantiate PyAudio (1)


p = pyaudio.PyAudio()

# define callback (2)


def callback(in_data, frame_count, time_info, status):
data = wf.readframes(frame_count)
return (data, pyaudio.paContinue)

# open stream using callback (3)


stream = p.open(format=p.get_format_from_width(wf.getsampwidth()),
channels=wf.getnchannels(),
rate=wf.getframerate(),
output=True,
stream_callback=callback)

# start the stream (4)


stream.start_stream()

# wait for stream to finish (5)


while stream.is_active():
time.sleep(0.1)

# stop stream (6)


stream.stop_stream()
stream.close()
wf.close()

# close PyAudio (7)


p.terminate()

In callback mode, PyAudio will call a specified callback function (2) whenever it needs new audio data (to play)
and/or when there is new (recorded) audio data available. Note that PyAudio calls the callback function in a
separate thread. The function has the following signature: callback(<input_data>, <frame_count>,
<time_info>, <status_flag>), and must return a tuple containing frame_count frames of audio data and a flag
signifying whether there are more frames to play/record.

Start processing the audio stream using pyaudio.Stream.start_stream() (4), which will call the callback function
repeatedly until that function returns pyaudio.paComplete.

To keep the stream active, the main thread must not terminate, e.g., by sleeping (5).

Section 182.2: Blocking Mode Audio I/O


"""PyAudio Example: Play a wave file."""

import pyaudio
import wave
import sys

CHUNK = 1024

if len(sys.argv) < 2:
print("Plays a wave file.\n\nUsage: %s filename.wav" % sys.argv[0])
sys.exit(-1)

wf = wave.open(sys.argv[1], 'rb')

# instantiate PyAudio (1)


p = pyaudio.PyAudio()

# open stream (2)


stream = p.open(format=p.get_format_from_width(wf.getsampwidth()),
channels=wf.getnchannels(),
rate=wf.getframerate(),
output=True)

# read data
data = wf.readframes(CHUNK)

# play stream (3)


while len(data) > 0:
stream.write(data)
data = wf.readframes(CHUNK)

# stop stream (4)


stream.stop_stream()
stream.close()

# close PyAudio (5)


p.terminate()

To use PyAudio, first instantiate PyAudio using pyaudio.PyAudio() (1), which sets up the portaudio system.

To record or play audio, open a stream on the desired device with the desired audio parameters using
pyaudio.PyAudio.open() (2). This sets up a pyaudio.Stream to play or record audio.

Play audio by writing audio data to the stream using pyaudio.Stream.write(), or read audio data from the stream
using pyaudio.Stream.read(). (3)

Note that in “blocking mode”, each pyaudio.Stream.write() or pyaudio.Stream.read() blocks until all the
given/requested frames have been played/recorded. Alternatively, to generate audio data on the fly or immediately
process recorded audio data, use “callback mode” (see the callback-mode example above).



Use pyaudio.Stream.stop_stream() to pause playing/recording, and pyaudio.Stream.close() to terminate the
stream. (4)

Finally, terminate the PortAudio session using pyaudio.PyAudio.terminate() (5).



Chapter 183: shelve
shelve is a Python module used to store objects in a file. It implements persistent storage for
arbitrary Python objects that can be pickled, using a dictionary-like API. The shelve module can be used as a
simple persistent storage option for Python objects when a relational database is overkill. The shelf is accessed by
keys, just as with a dictionary. The values are pickled and written to a database created and managed by anydbm
(dbm in Python 3).
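A minimal round trip looks like this (the shelf path and stored object are arbitrary; with-statement support for shelves requires Python 3.4+):

```python
import os
import shelve
import tempfile

# Use a throwaway directory so the example leaves no files behind.
path = os.path.join(tempfile.mkdtemp(), 'demo_shelf')

# Store an arbitrary picklable object under a string key.
with shelve.open(path) as shelf:
    shelf['config'] = {'retries': 3, 'hosts': ['a', 'b']}

# Re-open the shelf (read-only here) and retrieve the object.
with shelve.open(path, flag='r') as shelf:
    restored = shelf['config']

assert restored == {'retries': 3, 'hosts': ['a', 'b']}
```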

Section 183.1: Creating a new Shelf


The simplest way to use shelve is via the DbfilenameShelf class. It uses anydbm to store the data. You can use the
class directly, or simply call shelve.open():

import shelve

s = shelve.open('test_shelf.db')
try:
    s['key1'] = {'int': 10, 'float': 9.5, 'string': 'Sample data'}
finally:
    s.close()

To access the data again, open the shelf and use it like a dictionary:

import shelve

s = shelve.open('test_shelf.db')
try:
    existing = s['key1']
finally:
    s.close()

print(existing)

If you run both sample scripts, you should see:

$ python shelve_create.py
$ python shelve_existing.py

{'int': 10, 'float': 9.5, 'string': 'Sample data'}

The dbm module does not support multiple applications writing to the same database at the same time. If you
know your client will not be modifying the shelf, you can tell shelve to open the database read-only.

import shelve

s = shelve.open('test_shelf.db', flag='r')
try:
    existing = s['key1']
finally:
    s.close()

print(existing)

If your program tries to modify the database while it is opened read-only, an access error exception is generated.
The exception type depends on the database module selected by anydbm when the database was created.



Section 183.2: Sample code for shelve
To shelve an object, first import the module and then assign the object value as follows:

import shelve

database = shelve.open('filename.suffix')  # the shelf's filename, as a string
obj = Object()                             # any picklable object
database['key'] = obj
database.close()

Section 183.3: To summarize the interface (key is a string, data is an arbitrary object)

import shelve

d = shelve.open(filename)   # open -- file may get suffix added by low-level library

d[key] = data               # store data at key (overwrites old data if using an existing key)
data = d[key]               # retrieve a COPY of data at key (raises KeyError if no such key)
del d[key]                  # delete data stored at key (raises KeyError if no such key)

flag = key in d             # true if the key exists
klist = list(d.keys())      # a list of all existing keys (slow!)

# as d was opened WITHOUT writeback=True, beware:
d['xx'] = [0, 1, 2]         # this works as expected, but...
d['xx'].append(3)           # *this doesn't!* -- d['xx'] is STILL [0, 1, 2]!

# having opened d without writeback=True, you need to code carefully:
temp = d['xx']              # extracts the copy
temp.append(5)              # mutates the copy
d['xx'] = temp              # stores the copy right back, to persist it

# or, d = shelve.open(filename, writeback=True) would let you just code
# d['xx'].append(5) and have it work as expected, BUT it would also
# consume more memory and make the d.close() operation slower.

d.close()                   # close it

Section 183.4: Write-back


Shelves do not track modifications to volatile objects, by default. That means if you change the contents of an item
stored in the shelf, you must update the shelf explicitly by storing the item again.

import shelve

s = shelve.open('test_shelf.db')
try:
    print(s['key1'])
    s['key1']['new_value'] = 'this was not here before'
finally:
    s.close()

s = shelve.open('test_shelf.db', writeback=True)
try:
    print(s['key1'])
finally:
    s.close()

In this example, the dictionary at ‘key1’ is not stored again, so when the shelf is re-opened, the changes have not
been preserved.

$ python shelve_create.py
$ python shelve_withoutwriteback.py

{'int': 10, 'float': 9.5, 'string': 'Sample data'}


{'int': 10, 'float': 9.5, 'string': 'Sample data'}

To automatically catch changes to volatile objects stored in the shelf, open the shelf with writeback enabled. The
writeback flag causes the shelf to remember all of the objects retrieved from the database using an in-memory
cache. Each cache object is also written back to the database when the shelf is closed.

import shelve

s = shelve.open('test_shelf.db', writeback=True)
try:
    print(s['key1'])
    s['key1']['new_value'] = 'this was not here before'
    print(s['key1'])
finally:
    s.close()

s = shelve.open('test_shelf.db', writeback=True)
try:
    print(s['key1'])
finally:
    s.close()

Although it reduces the chance of programmer error, and can make object persistence more transparent, using
writeback mode may not be desirable in every situation. The cache consumes extra memory while the shelf is open,
and pausing to write every cached object back to the database when it is closed can take extra time. Since there is
no way to tell if the cached objects have been modified, they are all written back. If your application reads data
more than it writes, writeback will add more overhead than you might want.

$ python shelve_create.py
$ python shelve_writeback.py

{'int': 10, 'float': 9.5, 'string': 'Sample data'}


{'int': 10, 'new_value': 'this was not here before', 'float': 9.5, 'string': 'Sample data'}
{'int': 10, 'new_value': 'this was not here before', 'float': 9.5, 'string': 'Sample data'}
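The behaviour described in this section can also be reproduced in a single self-contained script, using a temporary file in place of test_shelf.db:

```python
import os
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'wb_shelf')

# Without writeback: mutating a retrieved object is silently lost.
with shelve.open(path) as s:
    s['key1'] = {'int': 10}
with shelve.open(path) as s:
    s['key1']['new_value'] = 'lost'        # mutates a COPY, never written back
with shelve.open(path) as s:
    assert 'new_value' not in s['key1']    # the change did not persist

# With writeback=True: the in-memory cache is flushed on close.
with shelve.open(path, writeback=True) as s:
    s['key1']['new_value'] = 'kept'
with shelve.open(path) as s:
    assert s['key1']['new_value'] == 'kept'
```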



Chapter 184: IoT Programming with
Python and Raspberry PI
Section 184.1: Example - Temperature sensor
Interfacing of DS18B20 with Raspberry pi

Connection of DS18B20 with Raspberry pi

You can see there are three terminals:

1. Vcc
2. Gnd
3. Data (One wire protocol)

R1 is a 4.7k ohm resistor for pulling up the voltage level

1. Vcc should be connected to any of the 5v or 3.3v pins of Raspberry pi (PIN : 01, 02, 04, 17).
2. Gnd should be connected to any of the Gnd pins of Raspberry pi (PIN : 06, 09, 14, 20, 25).



3. DATA should be connected to (PIN : 07)

Enabling the one-wire interface from the RPi side

4. Log in to the Raspberry Pi using PuTTY or any other Linux/Unix terminal.

5. After login, open the /boot/config.txt file in your favourite text editor.

nano /boot/config.txt

6. Now add the line dtoverlay=w1-gpio to the end of the file.

7. Now reboot the Raspberry Pi: sudo reboot.

8. Log in to the Raspberry Pi, and run sudo modprobe w1-gpio

9. Then run sudo modprobe w1-therm

10. Now go to the directory /sys/bus/w1/devices: cd /sys/bus/w1/devices

11. Now you will find a virtual directory for your temperature sensor, with a name starting with 28-********.

12. Go to this directory: cd 28-********

13. There is a file named w1_slave, which contains the temperature and other information such as the CRC:
cat w1_slave

Now write a module in python to read the temperature

import glob
import time

RATE = 30
sensor_dirs = glob.glob("/sys/bus/w1/devices/28*")

if len(sensor_dirs) != 0:
    while True:
        time.sleep(RATE)
        for directory in sensor_dirs:
            temperature_file = open(directory + "/w1_slave")
            # Read the file
            text = temperature_file.read()
            temperature_file.close()
            # Split the text by newlines (\n) and select the second line
            second_line = text.split("\n")[1]
            # Split the line into words, and select the 10th word
            temperature_data = second_line.split(" ")[9]
            # Read after ignoring the first two characters ("t=")
            temperature = float(temperature_data[2:])
            # Now normalise the temperature by dividing by 1000
            temperature = temperature / 1000
            print('Address: ' + str(directory.split('/')[-1]) + ', Temperature: ' + str(temperature))

The Python module above prints the temperature and sensor address indefinitely. The RATE parameter adjusts
how often the sensor is queried.
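The parsing steps above can be factored into a small function that is easy to test without real hardware. The sample w1_slave contents below are illustrative of the typical DS18B20 output format (two lines: a CRC check and a t=<millidegrees> field):

```python
def parse_w1_slave(text):
    """Extract the temperature in degrees Celsius from w1_slave file contents."""
    # Select the second line of the file.
    second_line = text.split("\n")[1]
    # Split the line into words and select the 10th word, e.g. "t=23625".
    temperature_data = second_line.split(" ")[9]
    # Drop the leading "t=" and normalise from millidegrees to degrees.
    return float(temperature_data[2:]) / 1000

# Hypothetical sensor reading of 23.625 degrees Celsius:
sample = ("72 01 4b 46 7f ff 0e 10 57 : crc=57 YES\n"
          "72 01 4b 46 7f ff 0e 10 57 t=23625\n")

print(parse_w1_slave(sample))  # 23.625
```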

GPIO pin diagram

1. https://www.element14.com/community/servlet/JiveServlet/previewBody/73950-102-11-339300/pi3_gpio.png


Chapter 185: kivy - Cross-platform Python
Framework for NUI Development
NUI : A natural user interface (NUI) is a system for human-computer interaction that the user operates through
intuitive actions related to natural, everyday human behavior.

Kivy is a Python library for development of multi-touch enabled media rich applications which can be installed on
different devices. Multi-touch refers to the ability of a touch-sensing surface (usually a touch screen or a trackpad)
to detect or sense input from two or more points of contact simultaneously.

Section 185.1: First App


To create a kivy application:

1. Subclass the App class.
2. Implement the build method, which will return the widget.
3. Instantiate the class and invoke run().

from kivy.app import App
from kivy.uix.label import Label

class Test(App):
    def build(self):
        return Label(text='Hello world')

if __name__ == '__main__':
    Test().run()

Explanation

from kivy.app import App

The above statement imports the parent class App, which can be found in your installation directory at
your_installation_directory/kivy/app.py.

from kivy.uix.label import Label

The above statement imports the UX element Label. All the UX elements are present in your installation directory
under your_installation_directory/kivy/uix/.

class Test(App):

The above statement creates your app; the class name will be your app name. This class inherits from the
parent App class.

def build(self):

The above statement overrides the build method of the App class. It returns the widget that is shown
when you start the app.

return Label(text='Hello world')

The above statement is the body of the build method. It returns a Label with the text Hello world.



if __name__ == '__main__':

The above statement is the entry point from where the Python interpreter starts executing your app.

Test().run()

The above statement initialises your Test class by creating an instance of it, and invokes the App class's run() method.

Your app will look like the below picture.



Chapter 186: Pandas Transform: Perform
operations on groups and concatenate the
results
Section 186.1: Simple transform
First, let's create a dummy dataframe.

We assume that a customer can have n orders, an order can have m items, and items can be ordered
multiple times.

import pandas as pd

orders_df = pd.DataFrame()
orders_df['customer_id'] = [1,1,1,1,1,2,2,3,3,3,3,3]
orders_df['order_id'] = [1,1,1,2,2,3,3,4,5,6,6,6]
orders_df['item'] = ['apples', 'chocolate', 'chocolate', 'coffee', 'coffee', 'apples',
                     'bananas', 'coffee', 'milkshake', 'chocolate', 'strawberry', 'strawberry']

# This is what the dataframe looks like:
print(orders_df)
#     customer_id  order_id        item
# 0             1         1      apples
# 1             1         1   chocolate
# 2             1         1   chocolate
# 3             1         2      coffee
# 4             1         2      coffee
# 5             2         3      apples
# 6             2         3     bananas
# 7             3         4      coffee
# 8             3         5   milkshake
# 9             3         6   chocolate
# 10            3         6  strawberry
# 11            3         6  strawberry


Now, we will use the pandas transform function to count the number of orders per customer:

# First, we define the function that will be applied per customer_id
count_number_of_orders = lambda x: len(x.unique())

# And now, we can transform each group using the logic defined above
orders_df['number_of_orders_per_client'] = (  # Put the results into a new column called 'number_of_orders_per_client'
    orders_df                                 # Take the original dataframe
    .groupby(['customer_id'])['order_id']     # Create a separate group for each customer_id & select the order_id
    .transform(count_number_of_orders))       # Apply the function to each group separately

# Inspecting the results ...
print(orders_df)
#     customer_id  order_id        item  number_of_orders_per_client
# 0             1         1      apples                            2
# 1             1         1   chocolate                            2
# 2             1         1   chocolate                            2
# 3             1         2      coffee                            2
# 4             1         2      coffee                            2
# 5             2         3      apples                            1
# 6             2         3     bananas                            1
# 7             3         4      coffee                            3
# 8             3         5   milkshake                            3
# 9             3         6   chocolate                            3
# 10            3         6  strawberry                            3
# 11            3         6  strawberry                            3
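Recent versions of pandas also accept the name of a built-in aggregation, so the lambda above can be replaced with 'nunique'. A sketch with only the two columns that matter:

```python
import pandas as pd

orders_df = pd.DataFrame({
    'customer_id': [1, 1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 3],
    'order_id':    [1, 1, 1, 2, 2, 3, 3, 4, 5, 6, 6, 6],
})

# 'nunique' is the built-in equivalent of lambda x: len(x.unique())
orders_df['number_of_orders_per_client'] = (
    orders_df.groupby('customer_id')['order_id'].transform('nunique')
)

print(orders_df['number_of_orders_per_client'].tolist())
# [2, 2, 2, 2, 2, 1, 1, 3, 3, 3, 3, 3]
```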

Section 186.2: Multiple results per group


Using transform functions that return sub-calculations per group

In the previous example, we had one result per client. However, functions returning different values for the group
can also be applied.

# Create a dummy dataframe
import pandas as pd

orders_df = pd.DataFrame()
orders_df['customer_id'] = [1,1,1,1,1,2,2,3,3,3,3,3]
orders_df['order_id'] = [1,1,1,2,2,3,3,4,5,6,6,6]
orders_df['item'] = ['apples', 'chocolate', 'chocolate', 'coffee', 'coffee', 'apples',
                     'bananas', 'coffee', 'milkshake', 'chocolate', 'strawberry', 'strawberry']

# Let's try to see if the items were ordered more than once in each order

# First, we define a function that will be applied per group
def multiple_items_per_order(_items):
    # Apply .duplicated, which will return True if the item occurs more than once.
    multiple_item_bool = _items.duplicated(keep=False)
    return multiple_item_bool

# Then, we transform each group according to the defined function
orders_df['item_duplicated_per_order'] = (   # Put the results into a new column
    orders_df                                # Take the orders dataframe
    .groupby(['order_id'])['item']           # Create a separate group for each order_id & select the item
    .transform(multiple_items_per_order))    # Apply the defined function to each group separately

# Inspecting the results ...
print(orders_df)
#     customer_id  order_id        item  item_duplicated_per_order
# 0             1         1      apples                      False
# 1             1         1   chocolate                       True
# 2             1         1   chocolate                       True
# 3             1         2      coffee                       True
# 4             1         2      coffee                       True
# 5             2         3      apples                      False
# 6             2         3     bananas                      False
# 7             3         4      coffee                      False
# 8             3         5   milkshake                      False
# 9             3         6   chocolate                      False
# 10            3         6  strawberry                       True
# 11            3         6  strawberry                       True
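The named helper can equally be written inline as a lambda. A condensed sketch of the same computation:

```python
import pandas as pd

orders_df = pd.DataFrame({
    'order_id': [1, 1, 1, 2, 2, 3, 3, 4, 5, 6, 6, 6],
    'item': ['apples', 'chocolate', 'chocolate', 'coffee', 'coffee', 'apples',
             'bananas', 'coffee', 'milkshake', 'chocolate', 'strawberry', 'strawberry'],
})

# duplicated(keep=False) marks every occurrence of a repeated item within the group.
orders_df['item_duplicated_per_order'] = (
    orders_df.groupby('order_id')['item']
             .transform(lambda items: items.duplicated(keep=False))
)

print(orders_df['item_duplicated_per_order'].tolist())
# [False, True, True, True, True, False, False, False, False, False, True, True]
```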



Chapter 187: Similarities in syntax,
Differences in meaning: Python vs.
JavaScript
It sometimes happens that two languages assign different meanings to the same or similar syntax expressions. When
both languages are of interest to a programmer, clarifying these bifurcation points helps to better understand
both languages in their basics and subtleties.

Section 187.1: `in` with lists


2 in [2, 3]

In Python this evaluates to True, but in JavaScript to false. This is because in Python in checks if a value is contained
in a list, so 2 is in [2, 3] as its first element. In JavaScript in is used with objects and checks if an object contains the
property with the name expressed by the value. So JavaScript considers [2, 3] as an object or a key-value map like
this:

{'0': 2, '1': 3}

and checks if it has a property or a key '2' in it. Integer 2 is silently converted to string '2'.
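The closest Python analogue of the JavaScript behaviour is membership testing on a dict, where in checks keys rather than values:

```python
# Python: `in` on a list tests membership of values.
assert 2 in [2, 3]
assert 4 not in [2, 3]

# Python: `in` on a dict tests keys, which mirrors JavaScript's
# `in` on objects (and arrays, whose keys are string indices).
js_like = {'0': 2, '1': 3}
assert '2' not in js_like   # no key '2' -- matches `2 in [2, 3]` being false in JavaScript
assert '1' in js_like       # index 1 exists -- matches `1 in [2, 3]` being true in JavaScript
```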



Chapter 188: Call Python from C#
The documentation provides a sample implementation of the inter-process communication between C# and
Python scripts.

Section 188.1: Python script to be called by C# application


import sys
import json

# load input arguments from the text file
filename = sys.argv[1]
with open(filename) as data_file:
    input_args = json.loads(data_file.read())

# cast strings to floats
x, y = [float(input_args.get(key)) for key in ['x', 'y']]

print(json.dumps({'sum': x + y, 'subtract': x - y}))
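The same argument-file protocol can be exercised from Python itself, which is handy for testing the script without a C# build. This sketch writes a stand-in for sum.py to a temporary directory and drives it exactly as the C# caller does (paths and argument values are illustrative):

```python
import json
import os
import subprocess
import sys
import tempfile

# A stand-in for sum.py, written out so the sketch is self-contained.
script = os.path.join(tempfile.mkdtemp(), 'sum.py')
with open(script, 'w') as f:
    f.write(
        "import sys, json\n"
        "with open(sys.argv[1]) as data_file:\n"
        "    input_args = json.loads(data_file.read())\n"
        "x, y = [float(input_args[key]) for key in ['x', 'y']]\n"
        "print(json.dumps({'sum': x + y, 'subtract': x - y}))\n"
    )

# Mimic the C# caller: dump the arguments to a text file...
args_file = script + '.args.txt'
with open(args_file, 'w') as f:
    json.dump({'x': '1', 'y': '2'}, f)

# ...invoke the interpreter with the file path, and parse the JSON on stdout.
out = subprocess.check_output([sys.executable, script, args_file])
result = json.loads(out)
print(result)  # {'sum': 3.0, 'subtract': -1.0}
```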

Section 188.2: C# code calling Python script


using MongoDB.Bson;
using System;
using System.Diagnostics;
using System.IO;

namespace python_csharp
{
    class Program
    {
        static void Main(string[] args)
        {
            // full path to .py file
            string pyScriptPath = "...../sum.py";
            // convert input arguments to JSON string
            BsonDocument argsBson = BsonDocument.Parse("{ 'x' : '1', 'y' : '2' }");

            bool saveInputFile = false;

            string argsFile = string.Format("{0}\\{1}.txt", Path.GetDirectoryName(pyScriptPath), Guid.NewGuid());

            string outputString = null;

            // create new process start info
            ProcessStartInfo prcStartInfo = new ProcessStartInfo
            {
                // full path of the Python interpreter 'python.exe'
                FileName = "python.exe", // string.Format(@"""{0}""", "python.exe"),
                UseShellExecute = false,
                RedirectStandardOutput = true,
                CreateNoWindow = false
            };

            try
            {
                // write input arguments to .txt file
                using (StreamWriter sw = new StreamWriter(argsFile))
                {
                    sw.WriteLine(argsBson);
                    prcStartInfo.Arguments = string.Format("{0} {1}",
                        string.Format(@"""{0}""", pyScriptPath),
                        string.Format(@"""{0}""", argsFile));
                }
                // start process
                using (Process process = Process.Start(prcStartInfo))
                {
                    // read standard output JSON string
                    using (StreamReader myStreamReader = process.StandardOutput)
                    {
                        outputString = myStreamReader.ReadLine();
                        process.WaitForExit();
                    }
                }
            }
            finally
            {
                // delete/save temporary .txt file
                if (!saveInputFile)
                {
                    File.Delete(argsFile);
                }
            }
            Console.WriteLine(outputString);
        }
    }
}



Chapter 189: ctypes
ctypes is a built-in Python library that invokes exported functions from native compiled libraries.

Note: since this library handles compiled code, it is relatively OS-dependent.

Section 189.1: ctypes arrays


As any good C programmer knows, a single value won't get you that far. What will really get us going are arrays!

>>> c_int * 16
<class '__main__.c_long_Array_16'>

This is not an actual array, but it's pretty darn close! We created a class that denotes an array of 16 ints.

Now all we need to do is to initialize it:

>>> arr = (c_int * 16)(*range(16))
>>> arr
<__main__.c_long_Array_16 object at 0xbaddcafe>

Now arr is an actual array that contains the numbers from 0 to 15.

They can be accessed just like any list:

>>> arr[5]
5
>>> arr[5] = 20
>>> arr[5]
20

And just like any other ctypes object, it also has a size and a location:

>>> sizeof(arr)
64 # sizeof(c_int) * 16
>>> hex(addressof(arr))
'0xc000l0ff'
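Arrays also combine with ctypes.Structure to model C structs. A small sketch, assuming a hypothetical Point struct with two int fields:

```python
from ctypes import Structure, c_int, sizeof

class Point(Structure):
    # _fields_ lists the struct members in declaration order,
    # matching the C layout: struct Point { int x; int y; };
    _fields_ = [('x', c_int), ('y', c_int)]

# An array of structures works just like an array of ints.
points = (Point * 3)(Point(0, 0), Point(1, 2), Point(3, 4))

assert points[1].x == 1 and points[1].y == 2

# The layout is contiguous and C-compatible.
assert sizeof(points) == sizeof(Point) * 3

# Elements are mutable in place.
points[2].y = 10
assert points[2].y == 10
```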

Section 189.2: Wrapping functions for ctypes


In some cases, a C function accepts a function pointer. As avid ctypes users, we would like to use those functions,
and even pass python function as arguments.

Let's define a function:

>>> def max(x, y):
...     return x if x >= y else y

Now, that function takes two arguments and returns a result of the same type. For the sake of the example, let's
assume that type is an int.

Like we did on the array example, we can define an object that denotes that prototype:

>>> CFUNCTYPE(c_int, c_int, c_int)
<CFunctionType object at 0xdeadbeef>

That prototype denotes a function that returns a c_int (the first argument), and accepts two c_int arguments
(the other arguments).

Now let's wrap the function:

>>> CFUNCTYPE(c_int, c_int, c_int)(max)
<CFunctionType object at 0xdeadbeef>

Function prototypes have one more usage: they can wrap ctypes functions (like libc.ntohl) and verify that the
correct arguments are used when invoking the function.

>>> libc.ntohl() # garbage in - garbage out
>>> CFUNCTYPE(c_int, c_int)(libc.ntohl)()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: this function takes at least 1 argument (0 given)
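A prototype-wrapped Python function is itself callable, which makes the conversion easy to verify without loading any native library (max_int is an illustrative stand-in for the max defined above):

```python
from ctypes import CFUNCTYPE, c_int

def max_int(x, y):
    return x if x >= y else y

# Wrap the Python callable in a C function prototype:
# returns c_int, takes two c_int arguments.
c_max = CFUNCTYPE(c_int, c_int, c_int)(max_int)

# Calling the wrapper converts the arguments and the return value
# through c_int on the way in and out.
assert c_max(2, 7) == 7
assert c_max(-1, -5) == -1
```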

Section 189.3: Basic usage


Let's say we want to use libc's ntohl function.

First, we must load libc.so:

>>> from ctypes import *
>>> libc = cdll.LoadLibrary('libc.so.6')
>>> libc
<CDLL 'libc.so.6', handle baadf00d at 0xdeadbeef>

Then, we get the function object:

>>> ntohl = libc.ntohl
>>> ntohl
<_FuncPtr object at 0xbaadf00d>

And now, we can simply invoke the function:

>>> ntohl(0x6C)
1811939328
>>> hex(_)
'0x6c000000'

Which does exactly what we expect it to do.

Section 189.4: Common pitfalls


Failing to load a file

The first possible error is failing to load the library. In that case an OSError is usually raised.

This is either because the file doesn't exist (or can't be found by the OS):

>>> cdll.LoadLibrary("foobar.so")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.5/ctypes/__init__.py", line 425, in LoadLibrary
    return self._dlltype(name)
  File "/usr/lib/python3.5/ctypes/__init__.py", line 347, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: foobar.so: cannot open shared object file: No such file or directory

As you can see, the error is clear and pretty indicative.

The second reason is that the file is found, but is not of the correct format.

>>> cdll.LoadLibrary("libc.so")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.5/ctypes/__init__.py", line 425, in LoadLibrary
    return self._dlltype(name)
  File "/usr/lib/python3.5/ctypes/__init__.py", line 347, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /usr/lib/i386-linux-gnu/libc.so: invalid ELF header

In this case, the file is a script file and not a .so file. This might also happen when trying to open a .dll file on a
Linux machine, or a 64-bit file on a 32-bit Python interpreter. As you can see, in this case the error is a bit more vague,
and requires some digging around.

Failing to access a function

Assuming we successfully loaded the .so file, we then need to access our function like we've done on the first
example.

When a non-existing function is used, an AttributeError is raised:

>>> libc.foo
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.5/ctypes/__init__.py", line 360, in __getattr__
    func = self.__getitem__(name)
  File "/usr/lib/python3.5/ctypes/__init__.py", line 365, in __getitem__
    func = self._FuncPtr((name_or_ordinal, self))
AttributeError: /lib/i386-linux-gnu/libc.so.6: undefined symbol: foo

Section 189.5: Basic ctypes object


The most basic object is an int:

>>> obj = ctypes.c_int(12)
>>> obj
c_long(12)

Now, obj refers to a chunk of memory containing the value 12.

That value can be accessed directly, and even modified:

>>> obj.value
12
>>> obj.value = 13
>>> obj
c_long(13)



Since obj refers to a chunk of memory, we can also find out its size and location:

>>> sizeof(obj)
4
>>> hex(addressof(obj))
'0xdeadbeef'

Section 189.6: Complex usage


Let's combine all of the examples above into one complex scenario: using libc's lfind function.

For more details about the function, read the man page. I urge you to read it before going on.

First, we'll define the proper prototypes:

>>> compar_proto = CFUNCTYPE(c_int, POINTER(c_int), POINTER(c_int))
>>> lfind_proto = CFUNCTYPE(c_void_p, c_void_p, c_void_p, POINTER(c_uint), c_uint, compar_proto)

Then, let's create the variables:

>>> key = c_int(12)
>>> arr = (c_int * 16)(*range(16))
>>> nmemb = c_uint(16)

And now we define the comparison function:

>>> def compar(x, y):
...     return x.contents.value - y.contents.value

Notice that x, and y are POINTER(c_int), so we need to dereference them and take their values in order to actually
compare the value stored in the memory.

Now we can combine everything together:

>>> lfind = lfind_proto(libc.lfind)
>>> ptr = lfind(byref(key), byref(arr), byref(nmemb), sizeof(c_int), compar_proto(compar))

ptr is the returned void pointer. If key wasn't found in arr, the value would be None, but in this case we got a valid
value.

Now we can convert it and access the value:

>>> cast(ptr, POINTER(c_int)).contents
c_long(12)

Also, we can see that ptr points to the correct value inside arr:

>>> addressof(arr) + 12 * sizeof(c_int) == ptr
True
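The array and cast machinery used above can be exercised without libc at all; a small self-contained sketch:

```python
import ctypes
from ctypes import POINTER, addressof, c_int, c_void_p, cast, sizeof

# A fixed-size C array of four ints, like the `arr` above.
arr = (c_int * 4)(10, 20, 30, 40)

# Cast the array to a generic void pointer, as lfind would return one...
ptr = cast(arr, c_void_p)

# ...and cast it back to a typed pointer in order to read values.
back = cast(ptr, POINTER(c_int))
print(back[2])                           # -> 30
print(addressof(arr) == ptr.value)       # -> True
print(sizeof(arr) == 4 * sizeof(c_int))  # -> True
```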

Chapter 190: Writing extensions
Section 190.1: Hello World with C Extension
The following C source file (which we will call hello.c for demonstration purposes) produces an extension module
named hello that contains a single function greet():

#include <Python.h>
#include <stdio.h>

#if PY_MAJOR_VERSION >= 3


#define IS_PY3K
#endif

static PyObject *hello_greet(PyObject *self, PyObject *args)
{
const char *input;
if (!PyArg_ParseTuple(args, "s", &input)) {
return NULL;
}
printf("%s", input);
Py_RETURN_NONE;
}

static PyMethodDef HelloMethods[] = {
    { "greet", hello_greet, METH_VARARGS, "Greet the user" },
    { NULL, NULL, 0, NULL }
};

#ifdef IS_PY3K
static struct PyModuleDef hellomodule = {
PyModuleDef_HEAD_INIT, "hello", NULL, -1, HelloMethods
};

PyMODINIT_FUNC PyInit_hello(void)
{
return PyModule_Create(&hellomodule);
}
#else
PyMODINIT_FUNC inithello(void)
{
(void) Py_InitModule("hello", HelloMethods);
}
#endif

To compile the file into an importable extension module with the gcc compiler, run the following command in your favourite terminal (adjust the include path and version to match your Python installation):

gcc -shared -fPIC -I/usr/include/python3.5 hello.c -o hello.so

To execute the greet() function that we wrote earlier, create a file in the same directory, and call it hello.py

import hello # imports the compiled library


hello.greet("Hello!") # runs the greet() function with "Hello!" as an argument

Section 190.2: C Extension Using c++ and Boost


This is a basic example of a C Extension using C++ and Boost.

C++ Code

C++ code put in hello.cpp:

#include <boost/python/module.hpp>
#include <boost/python/list.hpp>
#include <boost/python/class.hpp>
#include <boost/python/def.hpp>

// Return a hello world string.


std::string get_hello_function()
{
return "Hello world!";
}

// hello class that can return a list of count hello world strings.
class hello_class
{
public:

// Taking the greeting message in the constructor.


hello_class(std::string message) : _message(message) {}

// Returns the message count times in a python list.


boost::python::list as_list(int count)
{
boost::python::list res;
for (int i = 0; i < count; ++i) {
res.append(_message);
}
return res;
}

private:
std::string _message;
};

// Defining a python module naming it to "hello".


BOOST_PYTHON_MODULE(hello)
{
// Here you declare what functions and classes that should be exposed on the module.

// The get_hello_function exposed to python as a function.


boost::python::def("get_hello", get_hello_function);

// The hello_class exposed to python as a class.


boost::python::class_<hello_class>("Hello", boost::python::init<std::string>())
.def("as_list", &hello_class::as_list)
;
}

To compile this into a Python module you will need the Python headers and the Boost libraries. This example was
made on Ubuntu 12.04 using Python 3.4 and gcc. Boost is supported on many platforms; in the case of Ubuntu the
needed packages were installed using:

sudo apt-get install gcc libboost-dev libpython3.4-dev

Compiling the source file into a .so-file that can later be imported as a module provided it is on the python path:

gcc -shared -o hello.so -fPIC -I/usr/include/python3.4 hello.cpp -lboost_python-py34 -lboost_system -l:libpython3.4m.so

The python code in the file example.py:

import hello

print(hello.get_hello())

h = hello.Hello("World hello!")
print(h.as_list(3))

Then python3 example.py will give the following output:

Hello world!
['World hello!', 'World hello!', 'World hello!']

Section 190.3: Passing an open file to C Extensions


Pass an open file object from Python to C extension code.

You can convert the file object to an integer file descriptor using the PyObject_AsFileDescriptor function:

PyObject *fobj;
int fd = PyObject_AsFileDescriptor(fobj);
if (fd < 0){
return NULL;
}

To convert an integer file descriptor back into a python object, use PyFile_FromFd.

int fd; /* Existing file descriptor */


PyObject *fobj = PyFile_FromFd(fd, "filename","r",-1,NULL,NULL,NULL,1);
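What these two C calls do can be mimicked from pure Python with the os module. A small sketch of the same descriptor round trip (the temporary file is only illustrative):

```python
import os
import tempfile

with tempfile.TemporaryFile(mode="w+") as f:
    f.write("hello")
    f.flush()

    # Extract the integer descriptor, as PyObject_AsFileDescriptor does...
    fd = f.fileno()

    # ...and wrap a (duplicated) descriptor back into a file object,
    # as PyFile_FromFd does on the C side.
    clone = os.fdopen(os.dup(fd), "r")
    clone.seek(0)
    data = clone.read()
    print(data)  # -> hello
    clone.close()
```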

Chapter 191: Python Lex-Yacc
PLY is a pure-Python implementation of the popular compiler construction tools lex and yacc.

Section 191.1: Getting Started with PLY


To install PLY on your machine for python2/3, follow the steps outlined below:

1. Download the source code from here.


2. Unzip the downloaded zip file
3. Navigate into the unzipped ply-3.10 folder
4. Run the following command in your terminal: python setup.py install

If you completed all the above, you should now be able to use the PLY module. You can test it out by opening a
python interpreter and typing import ply.lex.

Note: Do not use pip to install PLY, it will install a broken distribution on your machine.

Section 191.2: The "Hello, World!" of PLY - A Simple Calculator


Let's demonstrate the power of PLY with a simple example: this program will take an arithmetic expression as a
string input, and attempt to solve it.

Open up your favourite editor and copy the following code:

from ply import lex


import ply.yacc as yacc

tokens = (
'PLUS',
'MINUS',
'TIMES',
'DIV',
'LPAREN',
'RPAREN',
'NUMBER',
)

t_ignore = ' \t'

t_PLUS = r'\+'
t_MINUS = r'-'
t_TIMES = r'\*'
t_DIV = r'/'
t_LPAREN = r'\('
t_RPAREN = r'\)'

def t_NUMBER( t ) :
r'[0-9]+'
t.value = int( t.value )
return t

def t_newline( t ):
r'\n+'
t.lexer.lineno += len( t.value )

def t_error( t ):

print("Invalid Token:",t.value[0])
t.lexer.skip( 1 )

lexer = lex.lex()

precedence = (
( 'left', 'PLUS', 'MINUS' ),
( 'left', 'TIMES', 'DIV' ),
( 'nonassoc', 'UMINUS' )
)

def p_add( p ) :
'expr : expr PLUS expr'
p[0] = p[1] + p[3]

def p_sub( p ) :
'expr : expr MINUS expr'
p[0] = p[1] - p[3]

def p_expr2uminus( p ) :
'expr : MINUS expr %prec UMINUS'
p[0] = - p[2]

def p_mult_div( p ) :
'''expr : expr TIMES expr
| expr DIV expr'''

if p[2] == '*' :
p[0] = p[1] * p[3]
else :
if p[3] == 0 :
print("Can't divide by 0")
raise ZeroDivisionError('integer division by 0')
p[0] = p[1] / p[3]

def p_expr2NUM( p ) :
'expr : NUMBER'
p[0] = p[1]

def p_parens( p ) :
'expr : LPAREN expr RPAREN'
p[0] = p[2]

def p_error( p ):
print("Syntax error in input!")

parser = yacc.yacc()

res = parser.parse("-4*-(3-5)") # the input


print(res)

Save this file as calc.py and run it.

Output:

-8

Which is the right answer for -4 * - (3 - 5).

Section 191.3: Part 1: Tokenizing Input with Lex
There are two steps that the code from example 1 carried out: one was tokenizing the input, which means it looked
for symbols that constitute the arithmetic expression, and the second step was parsing, which involves analysing
the extracted tokens and evaluating the result.

This section provides a simple example of how to tokenize user input, and then breaks it down line by line.

import ply.lex as lex

# List of token names. This is always required


tokens = [
'NUMBER',
'PLUS',
'MINUS',
'TIMES',
'DIVIDE',
'LPAREN',
'RPAREN',
]

# Regular expression rules for simple tokens


t_PLUS = r'\+'
t_MINUS = r'-'
t_TIMES = r'\*'
t_DIVIDE = r'/'
t_LPAREN = r'\('
t_RPAREN = r'\)'

# A regular expression rule with some action code


def t_NUMBER(t):
r'\d+'
t.value = int(t.value)
return t

# Define a rule so we can track line numbers


def t_newline(t):
r'\n+'
t.lexer.lineno += len(t.value)

# A string containing ignored characters (spaces and tabs)


t_ignore = ' \t'

# Error handling rule


def t_error(t):
print("Illegal character '%s'" % t.value[0])
t.lexer.skip(1)

# Build the lexer


lexer = lex.lex()

# Give the lexer some input


lexer.input(data)

# Tokenize
while True:
tok = lexer.token()
if not tok:
break # No more input
print(tok)

Save this file as calclex.py. We'll be using this when building our Yacc parser.

Breakdown

1. Import the module using import ply.lex

2. All lexers must provide a list called tokens that defines all of the possible token names that can be produced
by the lexer. This list is always required.

tokens = [
'NUMBER',
'PLUS',
'MINUS',
'TIMES',
'DIVIDE',
'LPAREN',
'RPAREN',
]

tokens could also be a tuple of strings (rather than a list), where each string denotes a token name as before.

3. The regex rule for each string may be defined either as a string or as a function. In either case, the variable
name should be prefixed by t_ to denote it is a rule for matching tokens.

For simple tokens, the regular expression can be specified as strings: t_PLUS = r'\+'

If some kind of action needs to be performed, a token rule can be specified as a function.

def t_NUMBER(t):
r'\d+'
t.value = int(t.value)
return t

Note, the rule is specified as a doc string within the function. The function accepts one argument which
is an instance of LexToken, performs some action and then returns back the argument.

If you want to use an external string as the regex rule for the function instead of specifying a doc
string, consider the following example:

from ply.lex import TOKEN

@TOKEN(identifier) # identifier is a string holding the regex
def t_ID(t):
    ... # actions

An instance of LexToken object (let's call this object t) has the following attributes:

1. t.type which is the token type (as a string) (eg: 'NUMBER', 'PLUS', etc). By default, t.type is set
to the name following the t_ prefix.
2. t.value which is the lexeme (the actual text matched)
3. t.lineno which is the current line number (this is not automatically updated, as the lexer knows
nothing of line numbers). Update lineno using a function called t_newline.

def t_newline(t):
r'\n+'
t.lexer.lineno += len(t.value)

4. t.lexpos which is the position of the token relative to the beginning of the input text.

If nothing is returned from a regex rule function, the token is discarded. If you want to discard a token,
you can alternatively add the t_ignore_ prefix to a regex rule variable instead of defining a function for the
same rule.

def t_COMMENT(t):
r'\#.*'
pass
# No return value. Token discarded

...Is the same as:

t_ignore_COMMENT = r'\#.*'

This is of course invalid if you're carrying out some action when you see a comment. In which case, use
a function to define the regex rule.

If you haven't defined a token for some characters but still want to ignore it, use t_ignore =
"<characters to ignore>" (these prefixes are necessary):

t_ignore_COMMENT = r'\#.*'
t_ignore = ' \t' # ignores spaces and tabs

When building the master regex, lex will add the regexes specified in the file as follows:

1. Tokens defined by functions are added in the same order as they appear in the file.
2. Tokens defined by strings are added in decreasing order of the string length of the string
defining the regex for that token.

If you are matching == and = in the same file, take advantage of these rules.
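The reason for the length ordering can be seen with the plain re module that lex builds its master regex on: alternation in a Python regex is tried left to right, so the longer pattern must come first (the group names below are only illustrative):

```python
import re

# If '=' were tried before '==', the master regex would match the
# shorter token and leave a stray '=' behind.
wrong_order = re.compile(r'(?P<ASSIGN>=)|(?P<EQ>==)')
right_order = re.compile(r'(?P<EQ>==)|(?P<ASSIGN>=)')

print(wrong_order.match('==').lastgroup)  # -> ASSIGN (only '=' matched)
print(right_order.match('==').lastgroup)  # -> EQ
```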

Literals are tokens that are returned as they are. Both t.type and t.value will be set to the character
itself. Define a list of literals as such:

literals = [ '+', '-', '*', '/' ]

or,

literals = "+-*/"

It is possible to write token functions that perform additional actions when literals are matched.
However, you'll need to set the token type appropriately. For example:

literals = [ '{', '}' ]

def t_lbrace(t):
r'\{'
t.type = '{' # Set token type to the expected literal (ABSOLUTE MUST if this is a literal)
return t

Handle errors with t_error function.

# Error handling rule


def t_error(t):
print("Illegal character '%s'" % t.value[0])
t.lexer.skip(1) # skip the illegal token (don't process it)

In general, t.lexer.skip(n) skips n characters in the input string.

4. Final preparations:

Build the lexer using lexer = lex.lex().

You can also put everything inside a class and use an instance of that class to define the lexer. Eg:

import ply.lex as lex


class MyLexer(object):
... # everything relating to token rules and error handling comes here as usual

# Build the lexer


def build(self, **kwargs):
self.lexer = lex.lex(module=self, **kwargs)

def test(self, data):


self.lexer.input(data)
for token in self.lexer:
print(token)

# Build the lexer and try it out

m = MyLexer()
m.build() # Build the lexer
m.test("3 + 4") # tokenize some input

Provide input using lexer.input(data) where data is a string

To get the tokens, use lexer.token() which returns tokens matched. You can iterate over lexer in a loop as
in:

for i in lexer:
print(i)

Section 191.4: Part 2: Parsing Tokenized Input with Yacc


This section explains how the tokenized input from Part 1 is processed - it is done using Context Free Grammars
(CFGs). The grammar must be specified, and the tokens are processed according to the grammar. Under the hood,
the parser uses an LALR parser.

# Yacc example

import ply.yacc as yacc

# Get the token map from the lexer. This is required.


from calclex import tokens

def p_expression_plus(p):
'expression : expression PLUS term'
p[0] = p[1] + p[3]

def p_expression_minus(p):
'expression : expression MINUS term'
p[0] = p[1] - p[3]

def p_expression_term(p):
'expression : term'
p[0] = p[1]

def p_term_times(p):
'term : term TIMES factor'
p[0] = p[1] * p[3]

def p_term_div(p):
'term : term DIVIDE factor'
p[0] = p[1] / p[3]

def p_term_factor(p):
'term : factor'
p[0] = p[1]

def p_factor_num(p):
'factor : NUMBER'
p[0] = p[1]

def p_factor_expr(p):
'factor : LPAREN expression RPAREN'
p[0] = p[2]

# Error rule for syntax errors


def p_error(p):
print("Syntax error in input!")

# Build the parser


parser = yacc.yacc()

while True:
try:
s = raw_input('calc > ') # use input('calc > ') on Python 3
except EOFError:
break
if not s: continue
result = parser.parse(s)
print(result)

Breakdown

Each grammar rule is defined by a function where the docstring to that function contains the appropriate
context-free grammar specification. The statements that make up the function body implement the semantic
actions of the rule. Each function accepts a single argument p that is a sequence containing the values of
each grammar symbol in the corresponding rule. The values of p[i] are mapped to grammar symbols as
shown here:

def p_expression_plus(p):
'expression : expression PLUS term'
# ^ ^ ^ ^
# p[0] p[1] p[2] p[3]

p[0] = p[1] + p[3]

For tokens, the "value" of the corresponding p[i] is the same as the p.value attribute assigned in the lexer
module. So, PLUS will have the value +.

For non-terminals, the value is determined by whatever is placed in p[0]. If nothing is placed, the value is
None. Also, p[-1] is not the same as p[3], since p is not a simple list (p[-1] can specify embedded actions
(not discussed here)).

Note that the function can have any name, as long as it is preceded by p_.

The p_error(p) rule is defined to catch syntax errors (same as yyerror in yacc/bison).

Multiple grammar rules can be combined into a single function, which is a good idea if productions have a
similar structure.

def p_binary_operators(p):
'''expression : expression PLUS term
| expression MINUS term
term : term TIMES factor
| term DIVIDE factor'''
if p[2] == '+':
p[0] = p[1] + p[3]
elif p[2] == '-':
p[0] = p[1] - p[3]
elif p[2] == '*':
p[0] = p[1] * p[3]
elif p[2] == '/':
p[0] = p[1] / p[3]

Character literals can be used instead of tokens.

def p_binary_operators(p):
'''expression : expression '+' term
| expression '-' term
term : term '*' factor
| term '/' factor'''
if p[2] == '+':
p[0] = p[1] + p[3]
elif p[2] == '-':
p[0] = p[1] - p[3]
elif p[2] == '*':
p[0] = p[1] * p[3]
elif p[2] == '/':
p[0] = p[1] / p[3]

Of course, the literals must be specified in the lexer module.

Empty productions have the form '''symbol : '''

To explicitly set the start symbol, use start = 'foo', where foo is some non-terminal.

Setting precedence and associativity can be done using the precedence variable.

precedence = (
('nonassoc', 'LESSTHAN', 'GREATERTHAN'), # Nonassociative operators

('left', 'PLUS', 'MINUS'),
('left', 'TIMES', 'DIVIDE'),
('right', 'UMINUS'), # Unary minus operator
)

Tokens are ordered from lowest to highest precedence. nonassoc means that those tokens do not associate.
This means that something like a < b < c is illegal whereas a < b is still legal.

parser.out is a debugging file that is created when the yacc program is executed for the first time. Whenever
a shift/reduce conflict occurs, the parser always shifts.
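The expression/term/factor grammar above encodes precedence structurally. A hand-written sketch of the same grammar (a plain recursive-descent evaluator, not PLY, and assuming well-formed input) shows why term binds tighter than expression:

```python
import re

def evaluate(text):
    # Crude tokenizer standing in for the lexer.
    tokens = re.findall(r'\d+|[()+\-*/]', text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def eat():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def expression():  # expression : expression PLUS/MINUS term | term
        value = term()
        while peek() in ('+', '-'):
            op = eat()
            value = value + term() if op == '+' else value - term()
        return value

    def term():        # term : term TIMES/DIVIDE factor | factor
        value = factor()
        while peek() in ('*', '/'):
            op = eat()
            value = value * factor() if op == '*' else value / factor()
        return value

    def factor():      # factor : NUMBER | LPAREN expression RPAREN
        if peek() == '(':
            eat()
            value = expression()
            eat()      # closing ')'
            return value
        return int(eat())

    return expression()

print(evaluate("3 + 4 * (10 - 2)"))  # -> 35
```

Because expression calls term, and term calls factor, multiplication is grouped before addition without any explicit precedence table; the precedence variable lets yacc achieve the same effect on a flat grammar.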

Chapter 192: Unit Testing
Section 192.1: Test Setup and Teardown within a
unittest.TestCase
Sometimes we want to prepare a context for each test to be run under. The setUp method is run prior to each test
in the class. tearDown is run at the end of every test. These methods are optional. Remember that TestCases are
often used in cooperative multiple inheritance so you should be careful to always call super in these methods so
that base class's setUp and tearDown methods also get called. The base implementation of TestCase provides
empty setUp and tearDown methods so that they can be called without raising exceptions:

import unittest

class SomeTest(unittest.TestCase):
def setUp(self):
super(SomeTest, self).setUp()
self.mock_data = [1,2,3,4,5]

def test(self):
self.assertEqual(len(self.mock_data), 5)

def tearDown(self):
super(SomeTest, self).tearDown()
self.mock_data = []

if __name__ == '__main__':
unittest.main()

Note that in python2.7+, there is also the addCleanup method that registers functions to be called after the test is
run. In contrast to tearDown which only gets called if setUp succeeds, functions registered via addCleanup will be
called even in the event of an unhandled exception in setUp. As a concrete example, this method can frequently be
seen removing various mocks that were registered while the test was running:

import unittest
from unittest import mock # on Python 2, use the external `mock` package instead
import some_module

class SomeOtherTest(unittest.TestCase):
def setUp(self):
super(SomeOtherTest, self).setUp()

# Replace `some_module.method` with a `mock.Mock`


my_patch = mock.patch.object(some_module, 'method')
my_patch.start()

# When the test finishes running, put the original method back.
self.addCleanup(my_patch.stop)

Another benefit of registering cleanups this way is that it allows the programmer to put the cleanup code next to
the setup code and it protects you in the event that a subclasser forgets to call super in tearDown.

Section 192.2: Asserting on Exceptions


You can test that a function throws an exception with the built-in unittest through two different methods.

Using a context manager

def division_function(dividend, divisor):
    return dividend / divisor

class MyTestCase(unittest.TestCase):
def test_using_context_manager(self):
with self.assertRaises(ZeroDivisionError):
x = division_function(1, 0)

This will run the code inside of the context manager and, if it succeeds, it will fail the test because the exception was
not raised. If the code raises an exception of the correct type, the test will continue.

You can also get the content of the raised exception if you want to execute additional assertions against it.

class MyTestCase(unittest.TestCase):
def test_using_context_manager(self):
with self.assertRaises(ZeroDivisionError) as ex:
x = division_function(1, 0)

self.assertEqual(str(ex.exception), 'division by zero') # the raised exception is stored on ex.exception

By providing a callable function

def division_function(dividend, divisor):
    """
    Dividing two numbers.

    :type dividend: int
    :type divisor: int

    :raises: ZeroDivisionError if divisor is zero (0).

    :rtype: int
    """
    return dividend / divisor

class MyTestCase(unittest.TestCase):
def test_passing_function(self):
self.assertRaises(ZeroDivisionError, division_function, 1, 0)

The exception to check for must be the first parameter, and a callable function must be passed as the second
parameter. Any other parameters specified will be passed directly to the function that is being called, allowing you
to specify the parameters that trigger the exception.
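When the exception's message matters, assertRaisesRegex (named assertRaisesRegexp before Python 3.2) combines both checks in one call; a small sketch:

```python
import unittest

def division_function(dividend, divisor):
    return dividend / divisor

class RegexTestCase(unittest.TestCase):
    def test_error_message(self):
        # The second argument is a regex matched against str(exception).
        with self.assertRaisesRegex(ZeroDivisionError, "division"):
            division_function(1, 0)

suite = unittest.TestLoader().loadTestsFromTestCase(RegexTestCase)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())  # -> True
```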

Section 192.3: Testing Exceptions


Programs raise errors when, for instance, they are given wrong input. Because of this, one needs to make sure that an
error is actually raised when wrong input is given, which means checking for an exact exception. For this
example we will use the following exception:

class WrongInputException(Exception):
pass

This exception is raised when wrong input is given, in the following context where we always expect a number as
text input.

def convert2number(random_input):
try:
my_input = int(random_input)
except ValueError:
raise WrongInputException("Expected an integer!")
return my_input

To check whether an exception has been raised, we use assertRaises to check for that exception. assertRaises
can be used in two ways:

1. Using the regular function call. The first argument takes the exception type, second a callable (usually a
function) and the rest of arguments are passed to this callable.
2. Using a with clause, giving only the exception type to the function. This has the advantage that more code can
be executed, but it should be used with care since multiple functions can use the same exception, which can be
problematic. An example: with self.assertRaises(WrongInputException): convert2number("not a number")

This first has been implemented in the following test case:

import unittest

class ExceptionTestCase(unittest.TestCase):

def test_wrong_input_string(self):
self.assertRaises(WrongInputException, convert2number, "not a number")

def test_correct_input(self):
try:
result = convert2number("56")
self.assertIsInstance(result, int)
except WrongInputException:
self.fail()

There may also be a need to check for an exception which should not have been thrown. However, a test
automatically fails when an exception is thrown, so this may not be necessary at all. Just to show the options, the
second test method shows how one can check that an exception is not thrown. Basically, this is done by
catching the exception and then failing the test using the fail method.

Section 192.4: Choosing Assertions Within Unittests


While Python has an assert statement, the Python unit testing framework has better assertions specialized for
tests: they are more informative on failures, and do not depend on the execution's debug mode.

Perhaps the simplest assertion is assertTrue, which can be used like this:

import unittest

class SimplisticTest(unittest.TestCase):
def test_basic(self):
self.assertTrue(1 + 1 == 2)

This will run fine, but replacing the line above with

self.assertTrue(1 + 1 == 3)

will fail.

The assertTrue assertion is quite likely the most general assertion, as anything tested can be cast as some boolean
condition, but often there are better alternatives. When testing for equality, as above, it is better to write

self.assertEqual(1 + 1, 3)

When the former fails, the message is

======================================================================

FAIL: test (__main__.TruthTest)

----------------------------------------------------------------------

Traceback (most recent call last):

File "stuff.py", line 6, in test

self.assertTrue(1 + 1 == 3)

AssertionError: False is not true

but when the latter fails, the message is

======================================================================

FAIL: test (__main__.TruthTest)

----------------------------------------------------------------------

Traceback (most recent call last):

File "stuff.py", line 6, in test

self.assertEqual(1 + 1, 3)
AssertionError: 2 != 3

which is more informative (it actually evaluated the result of the left hand side).

You can find the list of assertions in the standard documentation. In general, it is a good idea to choose the
assertion that is the most specifically fitting the condition. Thus, as shown above, for asserting that 1 + 1 == 2 it is
better to use assertEqual than assertTrue. Similarly, for asserting that a is None, it is better to use assertIsNone
than assertEqual.

Note also that the assertions have negative forms. Thus assertEqual has its negative counterpart assertNotEqual,
and assertIsNone has its negative counterpart assertIsNotNone. Once again, using the negative counterparts
when appropriate, will lead to clearer error messages.
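Putting the recommendations together, a short runnable sketch of the more specific assertions and their negative counterparts:

```python
import unittest

class SpecificAssertions(unittest.TestCase):
    def test_equality(self):
        self.assertEqual(1 + 1, 2)       # preferred over assertTrue(1 + 1 == 2)
        self.assertNotEqual(1 + 1, 3)

    def test_none(self):
        self.assertIsNone(None)          # preferred over assertEqual(x, None)
        self.assertIsNotNone("value")

suite = unittest.TestLoader().loadTestsFromTestCase(SpecificAssertions)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())  # -> True
```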

Section 192.5: Unit tests with pytest


installing pytest:

pip install pytest

getting the tests ready:

mkdir tests
touch tests/test_docker.py

Functions to test in docker_something/helpers.py:

from subprocess import Popen, PIPE


# this Popen is monkeypatched with the fixture `all_popens`

def copy_file_to_docker(src, dest):


try:
result = Popen(['docker','cp', src, 'something_cont:{}'.format(dest)], stdout=PIPE, stderr=PIPE)
err = result.stderr.read()
if err:
raise Exception(err)
except Exception as e:
print(e)
return result

def docker_exec_something(something_file_string):
fl = Popen(["docker", "exec", "-i", "something_cont", "something"], stdin=PIPE, stdout=PIPE, stderr=PIPE)
fl.stdin.write(something_file_string)
fl.stdin.close()
err = fl.stderr.read()
fl.stderr.close()
if err:
print(err)
exit()
result = fl.stdout.read()
print(result)

The test imports test_docker.py:

import os
from tempfile import NamedTemporaryFile
import pytest
from subprocess import Popen, PIPE

from docker_something import helpers


copy_file_to_docker = helpers.copy_file_to_docker
docker_exec_something = helpers.docker_exec_something

mocking a file like object in test_docker.py:

class MockBytes():
'''Used to collect bytes
'''
all_read = []
all_write = []
all_close = []

def read(self, *args, **kwargs):


# print('read', args, kwargs, dir(self))
self.all_read.append((self, args, kwargs))

def write(self, *args, **kwargs):


# print('wrote', args, kwargs)
self.all_write.append((self, args, kwargs))

def close(self, *args, **kwargs):
# print('closed', self, args, kwargs)
self.all_close.append((self, args, kwargs))

def get_all_mock_bytes(self):
return self.all_read, self.all_write, self.all_close

Monkey patching with pytest in test_docker.py:

@pytest.fixture
def all_popens(monkeypatch):
'''This fixture overrides / mocks the builtin Popen
and replaces stdin, stdout, stderr with a MockBytes object

note: monkeypatch is magically imported


'''
all_popens = []

class MockPopen(object):
def __init__(self, args, stdout=None, stdin=None, stderr=None):
all_popens.append(self)
self.args = args
self.byte_collection = MockBytes()
self.stdin = self.byte_collection
self.stdout = self.byte_collection
self.stderr = self.byte_collection
pass
monkeypatch.setattr(helpers, 'Popen', MockPopen)

return all_popens

Example tests, must start with the prefix test_ in the test_docker.py file:

def test_docker_install():
    p = Popen(['which', 'docker'], stdout=PIPE, stderr=PIPE)
    result = p.stdout.read()
    assert b'bin/docker' in result # stdout is bytes on Python 3

def test_copy_file_to_docker(all_popens):
result = copy_file_to_docker('asdf', 'asdf')
collected_popen = all_popens.pop()
mock_read, mock_write, mock_close = collected_popen.byte_collection.get_all_mock_bytes()
assert mock_read
assert result.args == ['docker', 'cp', 'asdf', 'something_cont:asdf']

def test_docker_exec_something(all_popens):

docker_exec_something(something_file_string)

collected_popen = all_popens.pop()
mock_read, mock_write, mock_close = collected_popen.byte_collection.get_all_mock_bytes()
assert len(mock_read) == 3
something_template_stdin = mock_write[0][1][0]
these = [os.environ['USER'], os.environ['password_prod'], 'table_name_here', 'test_vdm', 'col_a', 'col_b', '/tmp/test.tsv']
assert all([x in something_template_stdin for x in these])

running the tests one at a time:

py.test -k test_docker_install tests
py.test -k test_copy_file_to_docker tests
py.test -k test_docker_exec_something tests

running all the tests in the tests folder:

py.test -k test_ tests

Section 192.6: Mocking functions with


unittest.mock.create_autospec
One way to mock a function is to use the create_autospec function, which will mock out an object according to its
specs. With functions, we can use this to ensure that they are called appropriately.

With a function multiply in custom_math.py:

def multiply(a, b):


return a * b

And a function multiples_of in process_math.py:

from custom_math import multiply

def multiples_of(integer, *args, num_multiples=0, **kwargs):


"""
:rtype: list
"""
multiples = []

for x in range(1, num_multiples + 1):


"""
Passing in args and kwargs here will only raise TypeError if values were
passed to multiples_of function, otherwise they are ignored. This way we can
test that multiples_of is used correctly. This is here for an illustration
of how create_autospec works. Not recommended for production code.
"""
multiple = multiply(integer,x, *args, **kwargs)
multiples.append(multiple)

return multiples

We can test multiples_of alone by mocking out multiply. The below example uses the Python standard library
unittest, but this can be used with other testing frameworks as well, like pytest or nose:

from unittest.mock import create_autospec


import unittest

# we import the entire module so we can mock out multiply


import custom_math
custom_math.multiply = create_autospec(custom_math.multiply)
from process_math import multiples_of

class TestCustomMath(unittest.TestCase):
def test_multiples_of(self):
multiples = multiples_of(3, num_multiples=1)
custom_math.multiply.assert_called_with(3, 1)

def test_multiples_of_with_bad_inputs(self):
with self.assertRaises(TypeError) as e:
multiples_of(1, "extra arg", num_multiples=1) # this should raise a TypeError

Chapter 193: py.test
Section 193.1: Setting up py.test
py.test is one of several third party testing libraries that are available for Python. It can be installed using pip with

pip install pytest

The Code to Test

Say we are testing an addition function in projectroot/module/code.py:

# projectroot/module/code.py
def add(a, b):
return a + b

The Testing Code

We create a test file in projectroot/tests/test_code.py. The file must begin with test_ to be recognized as a
testing file.

# projectroot/tests/test_code.py
from module import code

def test_add():
assert code.add(1, 2) == 3
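py.test tests use the plain assert statement rather than unittest's assert* methods; on failure, py.test rewrites the assertion to report both operands. What a test function does can be exercised directly, without py.test, to see what a failing assertion raises:

```python
def add(a, b):
    return a + b

def test_add():
    assert add(1, 2) == 3

# Calling the test by hand -- py.test does this (plus discovery and
# reporting) for us.
test_add()
print("test_add passed")

# A failing plain assert raises AssertionError, which py.test catches
# and reports with the evaluated operands.
try:
    assert add(1, 2) == 4
except AssertionError:
    print("AssertionError raised")
```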

Running The Test

From projectroot we simply run py.test:

# ensure we have the modules


$ touch tests/__init__.py
$ touch module/__init__.py
$ py.test
=========================== test session starts ===========================
platform darwin -- Python 2.7.10, pytest-2.9.2, py-1.4.31, pluggy-0.3.1
rootdir: /projectroot, inifile:
collected 1 items

tests/test_code.py .

======================== 1 passed in 0.01 seconds =========================

Section 193.2: Intro to Test Fixtures


More complicated tests sometimes need to have things set up before you run the code you want to test. It is
possible to do this in the test function itself, but then you end up with large test functions doing so much that it is
difficult to tell where the setup stops and the test begins. You can also get a lot of duplicate setup code between
your various test functions.



Our code file:

# projectroot/module/stuff.py
class Stuff(object):
    def prep(self):
        self.foo = 1
        self.bar = 2

Our test file:

# projectroot/tests/test_stuff.py
import pytest
from module import stuff

def test_foo_updates():
    my_stuff = stuff.Stuff()
    my_stuff.prep()
    assert 1 == my_stuff.foo
    my_stuff.foo = 30000
    assert my_stuff.foo == 30000

def test_bar_updates():
    my_stuff = stuff.Stuff()
    my_stuff.prep()
    assert 2 == my_stuff.bar
    my_stuff.bar = 42
    assert 42 == my_stuff.bar

These are pretty simple examples, but if our Stuff object needed a lot more setup, it would get unwieldy. We see
that there is some duplicated code between our test cases, so let's refactor that into a separate function first.

# projectroot/tests/test_stuff.py
import pytest
from module import stuff

def get_prepped_stuff():
    my_stuff = stuff.Stuff()
    my_stuff.prep()
    return my_stuff

def test_foo_updates():
    my_stuff = get_prepped_stuff()
    assert 1 == my_stuff.foo
    my_stuff.foo = 30000
    assert my_stuff.foo == 30000

def test_bar_updates():
    my_stuff = get_prepped_stuff()
    assert 2 == my_stuff.bar
    my_stuff.bar = 42
    assert 42 == my_stuff.bar

This looks better but we still have the my_stuff = get_prepped_stuff() call cluttering up our test functions.

py.test fixtures to the rescue!



Fixtures are much more powerful and flexible versions of test setup functions. They can do a lot more than we're
leveraging here, but we'll take it one step at a time.

First we change get_prepped_stuff to a fixture called prepped_stuff. You want to name your fixtures with nouns
rather than verbs because of how the fixtures will end up being used in the test functions themselves later. The
@pytest.fixture indicates that this specific function should be handled as a fixture rather than a regular function.

@pytest.fixture
def prepped_stuff():
    my_stuff = stuff.Stuff()
    my_stuff.prep()
    return my_stuff

Now we should update the test functions so that they use the fixture. This is done by adding a parameter to their
definition that exactly matches the fixture name. When py.test executes, it will run the fixture before running the
test, then pass the return value of the fixture into the test function through that parameter. (Note that fixtures
don't need to return a value; they can do other setup things instead, like calling an external resource, arranging
things on the filesystem, putting values in a database, whatever the tests need for setup)

def test_foo_updates(prepped_stuff):
    my_stuff = prepped_stuff
    assert 1 == my_stuff.foo
    my_stuff.foo = 30000
    assert my_stuff.foo == 30000

def test_bar_updates(prepped_stuff):
    my_stuff = prepped_stuff
    assert 2 == my_stuff.bar
    my_stuff.bar = 42
    assert 42 == my_stuff.bar

Now you can see why we named it with a noun. But the my_stuff = prepped_stuff line is pretty much useless, so
let's just use prepped_stuff directly instead.

def test_foo_updates(prepped_stuff):
    assert 1 == prepped_stuff.foo
    prepped_stuff.foo = 30000
    assert prepped_stuff.foo == 30000

def test_bar_updates(prepped_stuff):
    assert 2 == prepped_stuff.bar
    prepped_stuff.bar = 42
    assert 42 == prepped_stuff.bar

Now we're using fixtures! We can go further by changing the scope of the fixture (so it only runs once per test
module or test suite execution session instead of once per test function), building fixtures that use other fixtures,
parametrizing the fixture (so that the fixture and all tests using that fixture are run multiple times, once for each
parameter given to the fixture), fixtures that read values from the module that calls them... as mentioned earlier,
fixtures have a lot more power and flexibility than a normal setup function.
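As a small sketch of two of those features (the names number and test_is_positive are illustrative, not taken from the example above), a fixture can be given a scope and parameters:

```python
import pytest

# A parametrized, module-scoped fixture: the setup code runs once per
# parameter for the whole module, and every test that uses the fixture
# runs once for each parameter.
@pytest.fixture(scope="module", params=[1, 10, 100])
def number(request):
    return request.param

def test_is_positive(number):
    assert number > 0
```

Running py.test on a file like this should report three passing tests, one per parameter.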

Cleaning up after the tests are done.

Let's say our code has grown and our Stuff object now needs special clean up.

# projectroot/module/stuff.py



class Stuff(object):
    def prep(self):
        self.foo = 1
        self.bar = 2

    def finish(self):
        self.foo = 0
        self.bar = 0

We could add some code to call the clean up at the bottom of every test function, but fixtures provide a better way
to do this. If you add a function to the fixture and register it as a finalizer, the code in the finalizer function will get
called after the test using the fixture is done. If the scope of the fixture is larger than a single function (like module
or session), the finalizer will be executed after all the tests in scope are completed, so after the module is done
running or at the end of the entire test running session.

@pytest.fixture
def prepped_stuff(request):  # we need to pass in the request to use finalizers
    my_stuff = stuff.Stuff()
    my_stuff.prep()
    def fin():  # finalizer function
        # do all the cleanup here
        my_stuff.finish()
    request.addfinalizer(fin)  # register fin() as a finalizer
    # you can do more setup here if you really want to
    return my_stuff

Using the finalizer function inside a function can be a bit hard to understand at first glance, especially when you
have more complicated fixtures. You can instead use a yield fixture to do the same thing with a more human
readable execution flow. The only real difference is that instead of using return we use a yield at the part of the
fixture where the setup is done and control should go to a test function, then add all the cleanup code after the
yield. We also decorate it as a yield_fixture so that py.test knows how to handle it. (In recent versions of py.test, a
plain @pytest.fixture may contain a yield as well, and yield_fixture is deprecated.)

@pytest.yield_fixture
def prepped_stuff():  # it doesn't need request now!
    # do setup
    my_stuff = stuff.Stuff()
    my_stuff.prep()
    # setup is done, pass control to the test functions
    yield my_stuff
    # do cleanup
    my_stuff.finish()

And that concludes the Intro to Test Fixtures!

For more information, see the official py.test fixture documentation and the official yield fixture documentation.

Section 193.3: Failing Tests


A failing test will provide helpful output as to what went wrong:

# projectroot/tests/test_code.py
from module import code

def test_add__failing():
    assert code.add(10, 11) == 33



Results:

$ py.test
================================================== test session starts ===================================================
platform darwin -- Python 2.7.10, pytest-2.9.2, py-1.4.31, pluggy-0.3.1
rootdir: /projectroot, inifile:
collected 1 items

tests/test_code.py F

======================================================== FAILURES ========================================================
___________________________________________________ test_add__failing ____________________________________________________

    def test_add__failing():
>       assert code.add(10, 11) == 33
E       assert 21 == 33
E        +  where 21 = <function add at 0x105d4d6e0>(10, 11)
E        +  where <function add at 0x105d4d6e0> = code.add

tests/test_code.py:5: AssertionError
================================================ 1 failed in 0.01 seconds ================================================



Chapter 194: Profiling
Section 194.1: %%timeit and %timeit in IPython
Profiling string concatenation:

In [1]: import string

In [2]: %%timeit s=""; long_list=list(string.ascii_letters)*50
   ....: for substring in long_list:
   ....:     s+=substring
   ....:
1000 loops, best of 3: 570 us per loop

In [3]: %%timeit long_list=list(string.ascii_letters)*50
   ....: s="".join(long_list)
   ....:
100000 loops, best of 3: 16.1 us per loop

Profiling loops over iterables and lists:

In [4]: %timeit for i in range(100000):pass
100 loops, best of 3: 2.82 ms per loop

In [5]: %timeit for i in list(range(100000)):pass
100 loops, best of 3: 3.95 ms per loop

Section 194.2: Using cProfile (Preferred Profiler)


Python includes a profiler called cProfile. This is generally preferred over using timeit.

It breaks down your entire script and for each method in your script it tells you:

ncalls: The number of times a method was called
tottime: Total time spent in the given function (excluding time spent in calls to sub-functions)
percall: Time spent per call, i.e. the quotient of tottime divided by ncalls
cumtime: The cumulative time spent in this and all subfunctions (from invocation till exit). This figure is
accurate even for recursive functions.
percall: the quotient of cumtime divided by primitive calls
filename:lineno(function): provides the respective data of each function

cProfile can easily be called on the command line using:

$ python -m cProfile main.py

To sort the returned list of profiled methods by the time taken in the method:

$ python -m cProfile -s time main.py
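The same sorting can also be done from inside a program via the pstats module. The sketch below profiles an arbitrary workload (a naive Fibonacci function, chosen only for illustration) and prints the five entries with the largest cumulative time:

```python
import cProfile
import io
import pstats

def fib(n):
    # deliberately slow recursive workload to give the profiler something to measure
    return n if n < 2 else fib(n - 1) + fib(n - 2)

pr = cProfile.Profile()
pr.enable()
fib(20)
pr.disable()

stream = io.StringIO()
# sort by cumulative time and print only the 5 most expensive entries
pstats.Stats(pr, stream=stream).sort_stats('cumulative').print_stats(5)
report = stream.getvalue()
print(report)
```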

Section 194.3: timeit() function


Profiling repetition of elements in an array

>>> import timeit
>>> timeit.timeit('list(itertools.repeat("a", 100))', 'import itertools', number = 10000000)
10.997665435877963
>>> timeit.timeit('["a"]*100', number = 10000000)
7.118789926862576

Section 194.4: timeit command line


Profiling concatenation of numbers

$ python -m timeit "'-'.join(str(n) for n in range(100))"
10000 loops, best of 3: 29.2 usec per loop

$ python -m timeit "'-'.join(map(str,range(100)))"
100000 loops, best of 3: 19.4 usec per loop

Section 194.5: line_profiler in command line


The source code with @profile directive before the function we want to profile:

import requests

@profile
def slow_func():
    s = requests.session()
    html = s.get("https://en.wikipedia.org/").text
    sum([pow(ord(x), 3.1) for x in list(html)])

for i in range(50):
    slow_func()

Using kernprof command to calculate profiling line by line

$ kernprof -lv so6.py
Wrote profile results to so6.py.lprof
Timer unit: 4.27654e-07 s

Total time: 22.6427 s
File: so6.py
Function: slow_func at line 4

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     4                                           @profile
     5                                           def slow_func():
     6        50        20729    414.6      0.0      s = requests.session()
     7        50     47618627 952372.5     89.9      html=s.get("https://en.wikipedia.org/").text
     8        50      5306958 106139.2     10.0      sum([pow(ord(x),3.1) for x in list(html)])
Page request is almost always slower than any calculation based on the information on the page.



Chapter 195: Python speed of program
Section 195.1: Deque operations
A deque is a double-ended queue.

class Deque:
    def __init__(self):
        self.items = []

    def isEmpty(self):
        return self.items == []

    def addFront(self, item):
        self.items.append(item)

    def addRear(self, item):
        self.items.insert(0, item)

    def removeFront(self):
        return self.items.pop()

    def removeRear(self):
        return self.items.pop(0)

    def size(self):
        return len(self.items)

Operations : Average Case (assumes parameters are randomly generated)

Append : O(1)

Appendleft : O(1)

Copy : O(n)

Extend : O(k)

Extendleft : O(k)

Pop : O(1)

Popleft : O(1)

Remove : O(n)

Rotate : O(k)
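The table above describes the standard library's collections.deque, which supports both ends in O(1) (unlike the list-backed class above, whose insert(0, item) and pop(0) are O(n)). A quick sketch:

```python
from collections import deque

d = deque()
d.append(1)        # O(1) append on the right
d.appendleft(0)    # O(1) append on the left
d.extend([2, 3])   # O(k)
assert list(d) == [0, 1, 2, 3]

assert d.pop() == 3       # O(1)
assert d.popleft() == 0   # O(1)

d.rotate(1)               # O(k); one step to the right
assert list(d) == [2, 1]
```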

Section 195.2: Algorithmic Notations


There are certain principles that apply to optimization in any computer language, and Python is no exception.

Don't optimize as you go: Write your program without regard to possible optimizations, concentrating instead on
making sure that the code is clean, correct, and understandable. If it's too big or too slow when you've finished,
then you can consider optimizing it.

Remember the 80/20 rule: In many fields you can get 80% of the result with 20% of the effort (also called the
90/10 rule - it depends on who you talk to). Whenever you're about to optimize code, use profiling to find out where
that 80% of execution time is going, so you know where to concentrate your effort.

Always run "before" and "after" benchmarks: How else will you know that your optimizations actually made a
difference? If your optimized code turns out to be only slightly faster or smaller than the original version, undo your
changes and go back to the original, clear code.

Use the right algorithms and data structures: Don't use an O(n²) bubble sort algorithm to sort a thousand elements
when there's an O(n log n) quicksort available. Similarly, don't store a thousand items in an array that requires an
O(n) search when you could use an O(log n) binary tree, or an O(1) Python hash table.
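A rough way to see this principle in action is to time membership tests against a list (O(n)) and a set (O(1) on average); the sizes and repetition counts below are arbitrary:

```python
import timeit

items = list(range(10000))
as_list = items
as_set = set(items)

# worst case for the list: the probe is the last element,
# so every item has to be inspected
t_list = timeit.timeit(lambda: 9999 in as_list, number=1000)
t_set = timeit.timeit(lambda: 9999 in as_set, number=1000)
print(t_list, t_set)
```

On a typical machine the set lookup should be faster by several orders of magnitude.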

For more visit the link below... Python Speed Up

The following 3 asymptotic notations are mostly used to represent time complexity of algorithms.

1. Θ Notation: The theta notation bounds a function from above and below, so it defines exact asymptotic
behavior. A simple way to get the Theta notation of an expression is to drop low order terms and ignore leading
constants. For example, consider the following expression: 3n³ + 6n² + 6000 = Θ(n³). Dropping lower order
terms is always fine because there will always be an n0 after which Θ(n³) has higher values than Θ(n²)
irrespective of the constants involved. For a given function g(n), we denote by Θ(g(n)) the following set of
functions: Θ(g(n)) = {f(n): there exist positive constants c1, c2 and n0 such that 0 <= c1g(n) <= f(n) <= c2g(n) for
all n >= n0}. The above definition means that if f(n) is theta of g(n), then the value f(n) is always between c1g(n) and
c2g(n) for large values of n (n >= n0). The definition of theta also requires that f(n) must be non-negative for
values of n greater than n0.

2. Big O Notation: The Big O notation defines an upper bound of an algorithm; it bounds a function only from
above. For example, consider the case of Insertion Sort. It takes linear time in the best case and quadratic time in
the worst case. We can safely say that the time complexity of Insertion sort is O(n²). Note that O(n²) also
covers linear time. If we use Θ notation to represent the time complexity of Insertion sort, we have to use two
statements for best and worst cases:

   1. The worst case time complexity of Insertion Sort is Θ(n²).
   2. The best case time complexity of Insertion Sort is Θ(n).

The Big O notation is useful when we only have upper bound on time complexity of an algorithm. Many times we
easily find an upper bound by simply looking at the algorithm. O(g(n)) = { f(n): there exist positive constants c and n0
such that 0 <= f(n) <= cg(n) for all n >= n0}

3. Ω Notation: Just as Big O notation provides an asymptotic upper bound on a function, Ω notation provides
an asymptotic lower bound. Ω notation can be useful when we have a lower bound on the time complexity of an
algorithm. As discussed above, the best case performance of an algorithm is generally not
useful, so the Omega notation is the least used notation among all three. For a given function g(n), we denote by
Ω(g(n)) the set of functions: Ω(g(n)) = {f(n): there exist positive constants c and n0 such that 0 <= cg(n) <= f(n)
for all n >= n0}. Let us consider the same Insertion sort example here. The time complexity of Insertion Sort
can be written as Ω(n), but this is not very useful information about insertion sort, as we are generally
interested in the worst case and sometimes in the average case.
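The claim in the Θ example above, that lower order terms stop mattering, can be checked numerically: for 3n³ + 6n² + 6000, the ratio to n³ approaches the leading constant 3 as n grows.

```python
def f(n):
    return 3 * n**3 + 6 * n**2 + 6000

# the ratio f(n)/n^3 tends to the leading constant 3, so f(n) = Θ(n^3)
ratios = {n: f(n) / n**3 for n in (10, 100, 1000)}
print(ratios)  # the values shrink towards 3 as n grows
```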

Section 195.3: Notation


Basic Idea

The notation used when describing the speed of your Python program is called Big-O notation. Let's say you have a
function:



def list_check(to_check, the_list):
    for item in the_list:
        if to_check == item:
            return True
    return False

This is a simple function to check if an item is in a list. To describe the complexity of this function, you will say O(n).
This means "Order of n" as the O function is known as the Order function.

O(n) - generally n is the number of items in container

O(k) - generally k is the value of the parameter or the number of elements in the parameter

Section 195.4: List operations


Operations : Average Case (assumes parameters are randomly generated)

Append : O(1)

Copy : O(n)

Del slice : O(n)

Delete item : O(n)

Insert : O(n)

Get item : O(1)

Set item : O(1)

Iteration : O(n)

Get slice : O(k)

Set slice : O(n + k)

Extend : O(k)

Sort : O(n log n)

Multiply : O(nk)

x in s : O(n)

min(s), max(s) :O(n)

Get length : O(1)

Section 195.5: Set operations


Operation : Average Case (assumes parameters generated randomly) : Worst case

x in s : O(1)

Difference s - t : O(len(s))



Intersection s&t : O(min(len(s), len(t))) : O(len(s) * len(t))

Multiple intersection s1&s2&s3&...&sn : (n-1) * O(l) where l is max(len(s1),...,len(sn))

s.difference_update(t) : O(len(t)) : O(len(t) * len(s))

s.symmetric_difference_update(t) : O(len(t))

Symmetric difference s^t : O(len(s)) : O(len(s) * len(t))

Union s|t : O(len(s) + len(t))



Chapter 196: Performance optimization
Section 196.1: Code profiling
First and foremost you should be able to find the bottleneck of your script and note that no optimization can
compensate for a poor choice in data structure or a flaw in your algorithm design. Secondly do not try to optimize
too early in your coding process at the expense of readability/design/quality. Donald Knuth made the following
statement on optimization:

"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root
of all evil. Yet we should not pass up our opportunities in that critical 3%"

To profile your code you have several tools: cProfile (or the slower profile) from the standard library,
line_profiler and timeit. Each of them serve a different purpose.

cProfile is a deterministic profiler: function call, function return, and exception events are monitored, and precise
timings are made for the intervals between these events (up to 0.001s). The library documentation
(https://docs.python.org/2/library/profile.html) provides us with a simple use case:

import cProfile

def f(x):
    return "42!"

cProfile.run('f(12)')

Or if you prefer to wrap parts of your existing code:

import cProfile, pstats, StringIO

pr = cProfile.Profile()
pr.enable()
# ... do something ...
# ... long ...
pr.disable()
s = StringIO.StringIO()
sortby = 'cumulative'
ps = pstats.Stats(pr, stream=s).sort_stats(sortby)
ps.print_stats()
print s.getvalue()

This will create outputs looking like the table below, where you can quickly see where your program spends most of
its time and identify the functions to optimize.

         3 function calls in 0.000 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 <string>:1(f)
        1    0.000    0.000    0.000    0.000 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

The module line_profiler (https://github.com/rkern/line_profiler) is useful for a line by line analysis of
your code. This is obviously not manageable for long scripts but is aimed at snippets. See the documentation for
more details. The easiest way to get started is to use the kernprof script as explained on the package page; note
that you will need to specify manually the function(s) to profile.



$ kernprof -l script_to_profile.py

kernprof will create an instance of LineProfiler and insert it into the __builtins__ namespace with the name
profile. It has been written to be used as a decorator, so in your script, you decorate the functions you want to
profile with @profile.

@profile
def slow_function(a, b, c):
...

The default behavior of kernprof is to put the results into a binary file script_to_profile.py.lprof . You can tell
kernprof to immediately view the formatted results at the terminal with the [-v/--view] option. Otherwise, you can
view the results later like so:

$ python -m line_profiler script_to_profile.py.lprof

Finally, timeit provides a simple way to time one-liners or small expressions both from the command line and the
Python shell. This module will answer questions such as: is it faster to do a list comprehension or use the built-in
list() when transforming a set into a list? Look for the setup keyword or -s option to add setup code.
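For instance, the set-to-list question above can be timed like this; the setup string builds the set once, outside the timed statement:

```python
import timeit

setup = "s = set(range(1000))"

# both statements see the variable s created by the setup code
t_comprehension = timeit.timeit("[x for x in s]", setup=setup, number=10000)
t_builtin = timeit.timeit("list(s)", setup=setup, number=10000)
print(t_comprehension, t_builtin)
```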

>>> import timeit
>>> timeit.timeit('"-".join(str(n) for n in range(100))', number=10000)
0.8187260627746582

From a terminal:

$ python -m timeit '"-".join(str(n) for n in range(100))'
10000 loops, best of 3: 40.3 usec per loop



Chapter 197: Security and Cryptography
Python, being one of the most popular languages in computer and network security, has great potential in security
and cryptography. This topic deals with the cryptographic features and implementations in Python from its uses in
computer and network security to hashing and encryption/decryption algorithms.

Section 197.1: Secure Password Hashing


The PBKDF2 algorithm exposed by the hashlib module can be used to perform secure password hashing. While this
algorithm cannot prevent brute-force attacks that try to recover the original password from the stored hash, it
makes such attacks very expensive.

import hashlib
import os

salt = os.urandom(16)
hash = hashlib.pbkdf2_hmac('sha256', b'password', salt, 100000)

PBKDF2 can work with any digest algorithm; the above example uses SHA256, which is usually recommended. The
random salt should be stored along with the hashed password; you will need it again in order to compare an
entered password to the stored hash. It is essential that each password is hashed with a different salt. As to the
number of rounds, it is recommended to set it as high as possible for your application.
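Comparing an entered password to the stored hash can be sketched as below; the helper names are illustrative, not from the example above. hmac.compare_digest performs the comparison in constant time, which resists timing attacks:

```python
import hashlib
import hmac
import os

def hash_password(password, rounds=100000):
    # a fresh random salt per password
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac('sha256', password, salt, rounds)
    return salt, digest  # store both alongside each other

def verify_password(password, salt, stored_digest, rounds=100000):
    candidate = hashlib.pbkdf2_hmac('sha256', password, salt, rounds)
    # constant-time comparison resists timing side channels
    return hmac.compare_digest(candidate, stored_digest)

salt, digest = hash_password(b'password')
assert verify_password(b'password', salt, digest)
assert not verify_password(b'letmein', salt, digest)
```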

If you want the result in hexadecimal, you can use the binascii module:

import binascii
hexhash = binascii.hexlify(hash)

Note: While PBKDF2 isn't bad, bcrypt and especially scrypt are considered stronger against brute-force attacks.
Neither is part of the Python standard library at the moment.

Section 197.2: Calculating a Message Digest


The hashlib module allows creating message digest generators via the new method. These generators will turn an
arbitrary string into a fixed-length digest:

import hashlib

h = hashlib.new('sha256')
h.update(b'Nobody expects the Spanish Inquisition.')
h.digest()
# ==> b'.\xdf\xda\xdaVR[\x12\x90\xff\x16\xfb\x17D\xcf\xb4\x82\xdd)\x14\xff\xbc\xb6Iy\x0c\x0eX\x9eF-='

Note that you can call update an arbitrary number of times before calling digest which is useful to hash a large file
chunk by chunk. You can also get the digest in hexadecimal format by using hexdigest:

h.hexdigest()
# ==> '2edfdada56525b1290ff16fb1744cfb482dd2914ffbcb649790c0e589e462d3d'

Section 197.3: Available Hashing Algorithms


hashlib.new requires the name of an algorithm when you call it to produce a generator. To find out what
algorithms are available in the current Python interpreter, use hashlib.algorithms_available:



import hashlib
hashlib.algorithms_available
# ==> {'sha256', 'DSA-SHA', 'SHA512', 'SHA224', 'dsaWithSHA', 'SHA', 'RIPEMD160', 'ecdsa-with-SHA1',
#      'sha1', 'SHA384', 'md5', 'SHA1', 'MD5', 'MD4', 'SHA256', 'sha384', 'md4', 'ripemd160', 'sha224',
#      'sha512', 'DSA', 'dsaEncryption', 'sha', 'whirlpool'}

The returned list will vary according to platform and interpreter; make sure you check your algorithm is available.

There are also some algorithms that are guaranteed to be available on all platforms and interpreters, which are
available using hashlib.algorithms_guaranteed:

hashlib.algorithms_guaranteed
# ==> {'sha256', 'sha384', 'sha1', 'sha224', 'md5', 'sha512'}

Section 197.4: File Hashing


A hash is a function that converts a variable length sequence of bytes to a fixed length sequence. Hashing files can
be advantageous for many reasons. Hashes can be used to check if two files are identical or verify that the contents
of a file haven't been corrupted or changed.

You can use hashlib to generate a hash for a file:

import hashlib

hasher = hashlib.new('sha256')
with open('myfile', 'rb') as f:  # binary mode, since update() expects bytes
    contents = f.read()
    hasher.update(contents)

print(hasher.hexdigest())

For larger files, a buffer of fixed length can be used:

import hashlib

SIZE = 65536
hasher = hashlib.new('sha256')
with open('myfile', 'rb') as f:
    buffer = f.read(SIZE)
    while len(buffer) > 0:
        hasher.update(buffer)
        buffer = f.read(SIZE)
print(hasher.hexdigest())
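The same chunked loop can also be written with iter() and a sentinel value, which keeps calling read() until it returns an empty bytes object; an equivalent sketch:

```python
import hashlib

def file_sha256(path, chunk_size=65536):
    hasher = hashlib.new('sha256')
    with open(path, 'rb') as f:
        # iter() with a sentinel calls f.read(chunk_size) until it returns b''
        for chunk in iter(lambda: f.read(chunk_size), b''):
            hasher.update(chunk)
    return hasher.hexdigest()
```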

Section 197.5: Generating RSA signatures using pycrypto


RSA can be used to create a message signature. A valid signature can only be generated with access to the private
RSA key, validating on the other hand is possible with merely the corresponding public key. So as long as the other
side knows your public key they can verify the message to be signed by you and unchanged - an approach used for
email for example. Currently, a third-party module like pycrypto is required for this functionality.

import errno

from Crypto.Hash import SHA256
from Crypto.PublicKey import RSA
from Crypto.Signature import PKCS1_v1_5



message = b'This message is from me, I promise.'

try:
    with open('privkey.pem', 'r') as f:
        key = RSA.importKey(f.read())
except IOError as e:
    if e.errno != errno.ENOENT:
        raise
    # No private key, generate a new one. This can take a few seconds.
    key = RSA.generate(4096)
    with open('privkey.pem', 'wb') as f:
        f.write(key.exportKey('PEM'))
    with open('pubkey.pem', 'wb') as f:
        f.write(key.publickey().exportKey('PEM'))

hasher = SHA256.new(message)
signer = PKCS1_v1_5.new(key)
signature = signer.sign(hasher)

Verifying the signature works similarly but uses the public key rather than the private key:

with open('pubkey.pem', 'rb') as f:
    key = RSA.importKey(f.read())

hasher = SHA256.new(message)
verifier = PKCS1_v1_5.new(key)
if verifier.verify(hasher, signature):
    print('Nice, the signature is valid!')
else:
    print('No, the message was signed with the wrong private key or modified')

Note: The above examples use PKCS#1 v1.5 signing algorithm which is very common. pycrypto also implements the
newer PKCS#1 PSS algorithm, replacing PKCS1_v1_5 by PKCS1_PSS in the examples should work if you want to use
that one. Currently there seems to be little reason to use it however.

Section 197.6: Asymmetric RSA encryption using pycrypto


Asymmetric encryption has the advantage that a message can be encrypted without exchanging a secret key with
the recipient of the message. The sender merely needs to know the recipient's public key; this allows encrypting the
message in such a way that only the designated recipient (who has the corresponding private key) can decrypt it.
Currently, a third-party module like pycrypto is required for this functionality.

from Crypto.Cipher import PKCS1_OAEP
from Crypto.PublicKey import RSA

message = b'This is a very secret message.'

with open('pubkey.pem', 'rb') as f:
    key = RSA.importKey(f.read())

cipher = PKCS1_OAEP.new(key)
encrypted = cipher.encrypt(message)

The recipient can decrypt the message then if they have the right private key:

with open('privkey.pem', 'rb') as f:
    key = RSA.importKey(f.read())

cipher = PKCS1_OAEP.new(key)
decrypted = cipher.decrypt(encrypted)



Note: The above examples use PKCS#1 OAEP encryption scheme. pycrypto also implements PKCS#1 v1.5 encryption
scheme, this one is not recommended for new protocols however due to known caveats.

Section 197.7: Symmetric encryption using pycrypto


Python's built-in crypto functionality is currently limited to hashing. Encryption requires a third-party module like
pycrypto. For example, it provides the AES algorithm which is considered state of the art for symmetric encryption.
The following code will encrypt a given message using a passphrase:

import hashlib
import os

from Crypto.Cipher import AES

IV_SIZE = 16    # 128 bit, fixed for the AES algorithm
KEY_SIZE = 32   # 256 bit meaning AES-256, can also be 128 or 192 bits
SALT_SIZE = 16  # This size is arbitrary

cleartext = b'Lorem ipsum'
password = b'highly secure encryption password'
salt = os.urandom(SALT_SIZE)
derived = hashlib.pbkdf2_hmac('sha256', password, salt, 100000,
                              dklen=IV_SIZE + KEY_SIZE)
iv = derived[0:IV_SIZE]
key = derived[IV_SIZE:]

encrypted = salt + AES.new(key, AES.MODE_CFB, iv).encrypt(cleartext)

The AES algorithm takes three parameters: encryption key, initialization vector (IV) and the actual message to be
encrypted. If you have a randomly generated AES key then you can use that one directly and merely generate a
random initialization vector. A passphrase doesn't have the right size however, nor would it be recommendable to
use it directly given that it isn't truly random and thus has comparably little entropy. Instead, we use the built-in
implementation of the PBKDF2 algorithm to generate a 128 bit initialization vector and 256 bit encryption key from
the password.

Note the random salt which is important to have a different initialization vector and key for each message
encrypted. This ensures in particular that two equal messages won't result in identical encrypted text, but it also
prevents attackers from reusing work spent guessing one passphrase on messages encrypted with another
passphrase. This salt has to be stored along with the encrypted message in order to derive the same initialization
vector and key for decrypting.

The following code will decrypt our message again:

salt = encrypted[0:SALT_SIZE]
derived = hashlib.pbkdf2_hmac('sha256', password, salt, 100000,
                              dklen=IV_SIZE + KEY_SIZE)
iv = derived[0:IV_SIZE]
key = derived[IV_SIZE:]
cleartext = AES.new(key, AES.MODE_CFB, iv).decrypt(encrypted[SALT_SIZE:])



Chapter 198: Secure Shell Connection in
Python
Parameter Usage
hostname This parameter tells the host to which the connection needs to be established
username username required to access the host
port host port
password password for the account

Section 198.1: ssh connection


import paramiko
from paramiko import client

ssh = client.SSHClient()  # create a new SSHClient object
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())  # auto-accept unknown host keys
ssh.connect(hostname, username=username, port=port, password=password)  # connect to a host
stdin, stdout, stderr = ssh.exec_command(command)  # submit a command over ssh
print(stdout.channel.recv_exit_status())  # exit status of the command; non-zero means it failed



Chapter 199: Python Anti-Patterns
Section 199.1: Overzealous except clause
Exceptions are powerful, but a single overzealous except clause can take it all away in a single line.

try:
    res = get_result()
    res = res[0]
    log('got result: %r' % res)
except:
    if not res:
        res = ''
    print('got exception')

This example demonstrates 3 symptoms of the antipattern:

1. The except with no exception type (line 5) will catch even healthy exceptions, including KeyboardInterrupt.
That will prevent the program from exiting in some cases.
2. The except block does not reraise the error, meaning that we won't be able to tell if the exception came from
within get_result or because res was an empty list.
3. Worst of all, if we were worried about result being empty, we've caused something much worse. If
get_result fails, res will stay completely unset, and the reference to res in the except block, will raise
NameError, completely masking the original error.

Always think about the type of exception you're trying to handle. Give the exceptions page a read and get a feel for
what basic exceptions exist.

Here is a fixed version of the example above:

import traceback

try:
    res = get_result()
except Exception:
    log_exception(traceback.format_exc())
    raise

try:
    res = res[0]
except IndexError:
    res = ''

log('got result: %r' % res)

We catch more specific exceptions, reraising where necessary. A few more lines, but infinitely more correct.
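Symptom 1 can be seen in isolation: a bare except even swallows SystemExit (which, like KeyboardInterrupt,
derives from BaseException rather than Exception), so sys.exit() silently stops working inside such a block. A
minimal demonstration:

```python
import sys

def bare_except():
    try:
        sys.exit(1)      # raises SystemExit
    except:              # bare except catches BaseException subclasses too
        return "exit was swallowed"

def narrow_except():
    try:
        sys.exit(1)
    except Exception:    # SystemExit is not an Exception, so it propagates
        return "swallowed"

print(bare_except())     # → exit was swallowed

try:
    narrow_except()
except SystemExit:
    print("SystemExit propagated")  # → SystemExit propagated
```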

Section 199.2: Looking before you leap with processor-intensive function
A program can easily waste time by calling a processor-intensive function multiple times.

For example, take a function which looks like this: it returns an integer if the input value can produce one, else
None:

def intensive_f(value):  # int -> Optional[int]
    # complex, and time-consuming code
    if process_has_failed:
        return None
    return integer_output

And it could be used in the following way:

x = 5
if intensive_f(x) is not None:
    print(intensive_f(x) / 2)
else:
    print(x, "could not be processed")

Whilst this will work, it has the problem of calling intensive_f twice, which doubles the length of time the code
takes to run. A better solution is to store the function's return value beforehand.


x = 5
result = intensive_f(x)
if result is not None:
    print(result / 2)
else:
    print(x, "could not be processed")

However, a clearer and possibly more pythonic way is to use exceptions, for example:

x = 5
try:
    print(intensive_f(x) / 2)
except TypeError:  # the exception raised if None / 2 is attempted
    print(x, "could not be processed")

Here no temporary variable is needed. It may often be preferable to use an assert statement, and to catch the
AssertionError instead.
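If intensive_f is pure (its result depends only on its argument), a further option, not covered above, is to
memoize it with functools.lru_cache so that the repeated call in the look-before-you-leap version becomes free.
The function body below is a made-up stand-in for the real computation:

```python
from functools import lru_cache

call_count = 0  # counts how often the expensive body actually runs

@lru_cache(maxsize=None)
def intensive_f(value):
    global call_count
    call_count += 1
    return value * 2 if value >= 0 else None  # stand-in computation

x = 5
if intensive_f(x) is not None:   # first call: the body runs
    print(intensive_f(x) / 2)    # second call hits the cache → 5.0
print(call_count)                # → 1
```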

Dictionary keys

A common example of where this may be found is accessing dictionary keys. For example compare:

bird_speeds = get_very_long_dictionary()

if "european swallow" in bird_speeds:
    speed = bird_speeds["european swallow"]
else:
    speed = input("What is the air-speed velocity of an unladen swallow?")

print(speed)

with:

bird_speeds = get_very_long_dictionary()

try:
    speed = bird_speeds["european swallow"]
except KeyError:
    speed = input("What is the air-speed velocity of an unladen swallow?")

print(speed)

The first example has to look through the dictionary twice, and as this is a long dictionary, it may take a long time to
do so each time. The second only requires one search through the dictionary, and thus saves a lot of processor
time.

An alternative to this is to use dict.get(key, default); however, many circumstances may require more complex
operations to be done in the case that the key is not present.
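A sketch of the dict.get alternative, using a made-up dictionary in place of get_very_long_dictionary():

```python
bird_speeds = {"african swallow": 20}  # hypothetical data

# dict.get performs a single lookup and supplies a default on a miss.
speed = bird_speeds.get("european swallow", "unknown")
print(speed)  # → unknown

# For fallback logic more involved than a constant default,
# try/except keeps the hit path to a single lookup as well.
try:
    speed = bird_speeds["african swallow"]
except KeyError:
    speed = "unknown"
print(speed)  # → 20
```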

