STAT 479: Machine Learning

Lecture Notes

Sebastian Raschka
Department of Statistics
University of Wisconsin–Madison

http://stat.wisc.edu/~sraschka/teaching/stat479-fs2018/

Fall 2018

Contents
3 Using Python
3.1 Introduction
3.1.1 About this Lecture
3.1.2 Python
3.2 Setup
3.2.1 Installing Python
3.2.2 Managing Environments
3.2.3 Installing and Updating packages
3.3 Running Python Code
3.3.1 Interpreter/REPL
3.3.2 IPython
3.3.3 Scripts (.py files)
3.3.4 Jupyter Notebooks
3.3.5 JupyterLab
3.3.6 Jupyter Notebooks and Homework Submissions
3.4 Relevant Python Topics
3.4.1 Basic Types
3.4.2 Operators
3.4.3 Strings
3.5 Data Structures
3.5.1 List
3.5.2 Dictionary
3.5.3 Set
3.5.4 Tuple
3.6 Conditionals
3.7 Iteration
3.7.1 Generators
3.7.2 Comprehensions
3.8 Functions
3.9 Classes
3.10 Command Line Arguments via Scripts
3.11 Reading and Writing files
3.12 Importing Libraries
3.13 Standard Library
3.14 GIL and Multiprocessing
3.15 Subprocesses
3.16 Exceptions
3.17 Debugging
3.18 Resources


3 Using Python

3.1 Introduction

3.1.1 About this Lecture

This lecture provides a brief overview of the Python programming language. While Python is
among the easiest programming languages to learn – one of the reasons why it has become so
popular across different fields – it is impossible to cover all essential aspects of the language
adequately within an hour. Hence, it is expected that you already have basic programming
knowledge and experience, as listed in the course prerequisites. Also, as mentioned at the
beginning of this course and listed on the course website, students who have not used Python
before will need to spend a few hours becoming more familiar with it by working through
one of the recommended resources listed on the course website [1] or other books or courses.
Throughout this lecture, it is important to keep in mind that we are not trying to learn
Python from a computer science, programming, or software engineering perspective, but as
researchers with scientific computing needs in mind. People often say that programming
is a computer science topic, which I disagree with. Learning a programming language is
similar to learning how to use a pen, and learning how to write words with a pen does
not automatically make us novelists. That said, since many computer science topics are very
relevant to machine learning (for example, complexity theory and big-O notation, which we
briefly covered in the last lecture), it is highly recommended to read an introductory computer
science textbook if you would like to pursue machine learning further after this course.
That being said, this lecture and the following lectures will provide a quick overview of the
relevant topics that you will need for several homework exercises and as a foundation for
using the scientific computing libraries for machine learning. Note that this lecture mainly
covers the Python language itself, whereas the next lecture will focus more on the scientific
computing libraries for Python that we will be using to implement and use machine learning
algorithms.
[1] http://stat.wisc.edu/~sraschka/teaching/stat479-fs2018/
Sebastian Raschka STAT479 FS18. L01: Intro to Machine Learning Page 4

3.1.2 Python

In short, Python [2] is an interpreted, dynamic language that does not require static type
declarations. In that sense, while being a multi-purpose programming language, it is more
similar to the R language than to the C programming language [3], which has a static
type system.
For example, consider a simple program that prints the string “a+b=c” and the result of
the integer addition 1 + 2. If we were to implement that in the simplest way in C, we first
have to write a file with the following contents (note the static type declarations):

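#include <stdio.h>

int main()
{
    int result;                /* integer variable, declared before use */
    char word[6] = "a+b=c";    /* 5 characters + the terminating '\0' */
    result = 1 + 2;
    printf("%s\n", word);      /* prints a+b=c */
    printf("%d\n", result);    /* prints 3 */
    return 0;
}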

Then, we would need to compile it into a program and run it. For example, if we saved the
file as example.c, we could compile it via GCC, the GNU Compiler Collection [4]:

$ gcc -o example example.c

Next, we would execute the compiled program, example, from the command line (lines
starting with a $ symbol indicate that we execute a command in a command line terminal,
not in a Python interpreter):

$ ./example

Executing the example program then produces the following output:

a+b=c
3

The equivalent program written in Python syntax might look as follows:

word = "a+b=c"
result = 1 + 2
print(word)    # a+b=c
print(result)  # 3
[2] The first version of Python was released in 1991 by its creator, Guido van Rossum. The origin of the
name “Python,” despite its current logo, is unrelated to the snake; it is derived from the Monty Python comedy
group and their show, Monty Python’s Flying Circus, which was popular in the 1970s and 1980s.
[3] While there are many different Python interpreters out there, the official reference interpreter, CPython, is itself
written in C.
[4] https://gcc.gnu.org
We can execute the code a) in an interactive Python interpreter or b) by saving it in a text file
(for example, example.py) and executing that file with a Python interpreter (for example,
$ python example.py). More details about executing Python code will follow later.
Here, the main point is that we do not have to specify the types of the variables in Python,
and, in either case, no separate compilation step is required to obtain the results. Python,
as a dynamic, interpreted language, is very flexible and convenient to use, which makes it
especially attractive for scientific computing.
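To make the “no type declarations” point concrete, the following small sketch (the variable name value is arbitrary) shows that a Python name is simply bound to whatever object is assigned to it; the type belongs to the object, not to the name:

```python
value = 3            # value refers to an int; no declaration needed
print(type(value))   # <class 'int'>

value = "three"      # the same name can later refer to a str
print(type(value))   # <class 'str'>

value = [1, 2, 3]    # ... or to a list
print(type(value))   # <class 'list'>
```

In C, by contrast, int result; fixes the type of result for its whole lifetime.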
One downside of dynamically typed languages is that they generally perform computations
much more slowly, in some cases by orders of magnitude, than statically typed languages. However, in the next lecture, we will
work with libraries that implement the more “expensive” computations in C or Fortran
code and use Python as a so-called “glue” or “wrapper” language. This way, by calling
Python functions that have been implemented in lower-level programming languages, we
leverage Python’s convenient syntax while using computationally efficient code.
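A small, library-free illustration of the “glue” idea, assuming the standard CPython interpreter (where the built-in sum() is implemented in C): the same summation written as an explicit Python loop is typically noticeably slower than delegating it to the built-in. The exact timings depend on your machine and Python version, so none are claimed here; python_loop_sum is a hypothetical helper for illustration.

```python
import timeit

numbers = list(range(200_000))  # arbitrary test data


def python_loop_sum(values):
    """Sum values with an explicit loop that the interpreter executes step by step."""
    total = 0
    for v in values:
        total += v
    return total


# Both approaches must agree before their speed is worth comparing.
assert python_loop_sum(numbers) == sum(numbers)

# Fastest of 3 repeats, each timing 5 calls.
loop_time = min(timeit.repeat(lambda: python_loop_sum(numbers), repeat=3, number=5))
builtin_time = min(timeit.repeat(lambda: sum(numbers), repeat=3, number=5))

print(f"pure-Python loop: {loop_time:.4f} s for 5 calls")
print(f"built-in sum():   {builtin_time:.4f} s for 5 calls")
```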
Another downside of dynamic typing is that errors, except for syntax errors, are only raised
at runtime. In certain scenarios, this can have important implications. For example,
consider the following Python code snippet:

if cond:
    text = "abc" + 123
else:
    text = "abc" + "!!!"

The line nested under the if statement gets executed if a certain condition cond is true;
otherwise, the line that follows the else statement gets executed instead. Now, there is an
illegal expression in this code example, "abc" + 123: we cannot add an integer-type value
to a string-type value. However, if the condition (cond) is always false, the code runs just
fine because the line will never be executed – no harm done. The worst-case scenario,
though, is a cond that is only rarely true: the error then also occurs only rarely and may slip through our
test suites if we are not very thorough. In contrast, equivalent errors in statically typed languages
such as C are usually caught by the compiler so that we can fix them before we run, share,
“ship,” or deploy our code.
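The following sketch wraps the snippet above in a hypothetical function, build_text, so that the condition can be controlled explicitly. Defining the function raises no error at all; the TypeError only appears once the faulty branch actually runs:

```python
def build_text(cond):
    """Hypothetical wrapper around the if/else snippet above."""
    if cond:
        text = "abc" + 123  # invalid, but Python only notices when this line runs
    else:
        text = "abc" + "!!!"
    return text


print(build_text(False))  # abc!!!  (the faulty branch is never executed)

caught = None
try:
    build_text(True)      # now the faulty branch runs
except TypeError as err:
    caught = err
    print("TypeError raised at runtime:", err)
```

The exact wording of the error message differs between Python versions, which is why it is not reproduced here.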

3.2 Setup

This section provides an overview of the different ways Python can be installed and set
up. Please do not follow these instructions on your first read-through. Read through the
complete “Setup” section first, as it lists several alternative approaches. Once you have read the
section, you can revisit it and set up Python the way you prefer.

3.2.1 Installing Python

Note that many operating systems already come with a default Python installation.
While the default Python version on macOS is relatively outdated (an old version of
Python 2.7), most Linux distributions come with a version of Python 3.
In general, you can check where the pre-installed Python interpreter is located by executing

$ which python

in a Linux or macOS/Unix command line terminal (the Windows equivalent is where
python). To check which version of Python that interpreter is, execute

$ python --version
Regardless of which version comes already installed with your operating system, I strongly
advise against tinkering with it; instead, install a newer version of Python separately on your
computer. The reason is that the “built-in” Python version is used by certain processes and
services of the operating system, and updating or modifying it is not only cumbersome but
also likely to cause issues for your operating system.

Python 2.7 vs. Python 3 For those who are curious why there is a debate: about
10 years ago, Python developers wanted to add substantial improvements to the Python
language. However, these improvements would have been backward incompatible. Hence,
people would have been forced to rewrite certain sections of their “old” code in order to use the
latest Python versions, which can be cumbersome if you have developed large code bases.
Therefore, the Python community decided to branch off and develop Python 3 while maintaining
Python 2.7 with minimal updates (bug fixes, security updates, etc.).
Instead of making the switch to Python 3, many people kept using Python 2.7. However,
10 years after the introduction of Python 3, more people use Python 3 than Python 2.7
(according to most surveys you can find on the internet), and almost all major libraries
have been ported to Python 3 [5]. In fact, many major libraries are going to drop (or have already
dropped) Python 2.7 support [6]. Furthermore, Python 2.7 will lose official support in 2020 [7],
which means that no updates (no bug fixes, no security updates, etc.) will be made after
2019.
In sum, there is no point in learning Python 2.7 now, and we will be using Python 3 in
this class [8]. In particular, Python 3.7 was released recently, in summer 2018, and
I recommend installing it. However, since we are not going to use any 3.7-specific
features, using Python 3.6 should also be fine.

Windows, Linux, and macOS Unfortunately, I am not very familiar with Windows as
an operating system; hence, this tutorial is more geared towards Linux and macOS [9]. In
general, the Python interpreter should work on Windows in the same way as it does on
Linux and macOS/Unix. However, there will be certain differences when working with data
files on your computer’s storage disk. For instance, Linux and macOS/Unix use forward
slashes (“/home/sebastian/my_data_file.txt”), whereas Windows uses backslashes
(“C:\sebastian\my_data_file.txt”).
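One way to sidestep the separator difference, sketched below with the standard library's pathlib module (not covered in the original notes; the file and directory names are placeholders), is to let Python assemble paths with the / operator instead of hard-coding separators. The Pure*Path classes only manipulate strings, so both flavors can be demonstrated on any operating system:

```python
from pathlib import Path, PurePosixPath, PureWindowsPath

# Pure paths never touch the file system, so both flavors work on any OS.
posix_path = PurePosixPath("/home/sebastian") / "my_data_file.txt"
windows_path = PureWindowsPath("C:/sebastian") / "my_data_file.txt"

print(posix_path)    # /home/sebastian/my_data_file.txt
print(windows_path)  # C:\sebastian\my_data_file.txt

# Path uses the flavor of the operating system the code runs on, so this
# single line yields a valid relative path on Linux, macOS, and Windows.
native_path = Path("data") / "my_data_file.txt"
print(native_path)
```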
In general, if you have a computer running Windows that you want to use for scientific
computing and want to make your life easier, I recommend installing Linux alongside Windows
as a dual-boot environment and using Linux for scientific
computing tasks. Linux is much more popular than Windows in the scientific computing
community; thus, you will find much more help and many more tutorials online that are targeted at
Linux and macOS/Unix environments than at Windows. Also, many libraries and tools
require Linux or macOS/Unix operating systems, although I am not aware of this being true for any
of the libraries we will be using in this class.
There is also a program called Cygwin [10], which is a large collection of GNU and open-source tools
that provides functionality similar to a Linux distribution on Windows. Recent Microsoft Windows
versions also support running a Linux subsystem within Windows [11].
Again, if you use Windows as your main computer, you will probably be fine. However,
using Linux or macOS/Unix is highly recommended for scientific computing. If you do not
want to install Linux right now (which is understandable, because it takes quite some time,
and you have many other things to do), I think that Cygwin will make your life easier.

[5] http://py3readiness.org
[6] http://python3statement.org
[7] https://pythonclock.org
[8] In case you are interested, here is an article that covers some of the most relevant differences between
Python 2.7 and Python 3: https://sebastianraschka.com/Articles/2014_python_2_3_key_diff.html
[9] macOS has a Unix core, and Unix inspired the development of Linux as a “free” Unix clone. Hence, these
two operating systems are very similar.
[10] https://www.cygwin.com

python.org The “official” way to install Python is to obtain an official distribution from
the python.org website. You can find Python installers for several different operating
systems and versions of Python (I recommend one of the most recent ones, either
3.6.6 or 3.7.0) at https://www.python.org/downloads/. If you choose to install Python from
there, please follow the instructions provided on the https://www.python.org website.

Anaconda/Miniconda Based on my experience, the most popular and most convenient
way to use Python for scientific computing is to use the Anaconda or Miniconda distributions (both
are free). You can think of Anaconda as an alternative distribution of Python that comes
with a package manager (called conda), which makes installing scientific packages easier by
handling complex dependencies [12].
Anaconda comes with a large number of Python packages pre-installed. It contains most of
the packages that we will be using in this class and many additional ones that we do not
need. In comparison, Miniconda is a “leaner” distribution that does not come with many
packages pre-installed. Since it is smaller and leaner, and it is easy to install packages via
the conda package manager, Miniconda is my favorite choice.
To use the Anaconda or Miniconda distribution (highly recommended!) instead of the official
python.org distribution, please download the respective installer from the anaconda.com website
and follow the instructions there.

3.2.2 Managing Environments

Another important aspect of using Python is managing environments and packages. This is
especially useful if we are working on different projects which each require different Python
packages and different versions thereof. Essentially, a virtual environment is like a “con-
tainer” on your computer that contains only those libraries that are relevant for a given
project, and you can only use them if the environment is “active.” If you have multiple
projects, virtual environments are a powerful organizational tool.

virtualenv Probably the most widely used tool for creating and managing Python environments
is virtualenv. You can find more about virtualenv on its website:
https://virtualenv.pypa.io/en/stable/.
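For comparison, Python 3 also ships a venv module in its standard library that creates lightweight virtual environments in the same spirit as virtualenv (this module is not discussed in the original notes). The sketch below assumes a hypothetical environment name, stat479, inside a temporary directory, and shows that an environment is essentially a directory containing its own interpreter and a pyvenv.cfg configuration file. On the command line, the rough equivalent would be python -m venv followed by the target directory.

```python
import sys
import tempfile
import venv
from pathlib import Path


def make_env(env_dir):
    """Create a bare virtual environment (without pip, to keep it fast) in env_dir.

    Returns the path of the environment's own Python interpreter.
    """
    venv.create(env_dir, with_pip=False)
    if sys.platform == "win32":
        return Path(env_dir) / "Scripts" / "python.exe"
    return Path(env_dir) / "bin" / "python"


with tempfile.TemporaryDirectory() as tmp:
    env_dir = Path(tmp) / "stat479"  # hypothetical environment name
    env_python = make_env(env_dir)
    has_config = (env_dir / "pyvenv.cfg").exists()
    has_python = env_python.exists()
    print("pyvenv.cfg present: ", has_config)
    print("interpreter present:", has_python)
```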

conda If you are using Anaconda or Miniconda, it is recommended to use the conda
package manager that comes with it. With conda, we can create and manage virtual
environments similar to virtualenv. For example, below is a command for creating a virtual
environment called “stat479” just for this class:

$ conda create -n stat479 python=3.7


[11] https://docs.microsoft.com/en-us/windows/wsl/install-win10
[12] Here, dependencies means that certain packages depend on other packages, and often only certain
versions of these packages are compatible with each other.
Then, to activate this environment, we execute

$ source activate stat479

Note that we have to execute source activate stat479 (or, with conda 4.4 and newer, conda
activate stat479) each time we open a new command line terminal. Otherwise, the default
environment will be used.
You can see that a virtual environment is active based on your command line prompt as
illustrated below.
Before:

sebastian@Sebastians-MacBook-Pro:~$ source activate stat479

After:

(stat479) sebastian@Sebastians-MacBook-Pro:~$
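If you are unsure which environment a running Python session belongs to, one quick check from within Python is to print the interpreter's location. The sketch below also reads the CONDA_DEFAULT_ENV environment variable, which conda sets when an environment is activated; it is typically absent (so the lookup returns None) when no conda environment has been activated:

```python
import os
import sys

# Full path of the interpreter running this code; inside an activated
# environment it points into that environment's directory.
print("interpreter:", sys.executable)
print("prefix:     ", sys.prefix)

# Set by conda's activation scripts; None if no conda environment is active.
active_conda_env = os.environ.get("CONDA_DEFAULT_ENV")
print("conda env:  ", active_conda_env)
```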

For more information about conda and managing virtual environments, please see
https://conda.io/docs/user-guide/tasks/manage-environments.html.

3.2.3 Installing and Updating packages

There are two recommended ways for installing Python packages, which will be introduced
in the next subsections.

Pip Pip is the official Python package installer. While pip is a Python library or package
itself, you can directly use it from the command line as a standalone program. For example,
to install the NumPy package that we will be using in the next lecture, you can use the following
command:

$ pip install numpy

A specific version of a package can be specified in the installation command as follows:

$ pip install numpy==1.15

To upgrade a package that is already installed, use the --upgrade flag:

$ pip install --upgrade numpy

To update pip itself, use

$ pip install --upgrade pip

Uninstalling a package is also simple:

$ pip uninstall numpy


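Note that since pip is also a Python package, entering pip on the command line terminal
is usually equivalent to running python -m pip, that is, running pip as a module via the -m flag.
The python -m pip form also guarantees that packages are installed for the same interpreter
that the python command starts, and it may be useful if your command line terminal does not
recognize the pip command.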
For example, NumPy can be installed this way as follows:

$ python -m pip install numpy
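The same python -m pip idea can be used from inside a running Python session (for example, in a Jupyter notebook) by building the command from sys.executable, the path of the interpreter that is currently running. The helper name pip_command is hypothetical, and the sketch only queries pip's version so that it does not modify your environment:

```python
import subprocess
import sys


def pip_command(*args):
    """Build a pip command line tied to the currently running interpreter."""
    return [sys.executable, "-m", "pip", *args]


# Safe, read-only call: shows which pip (and which Python) would be used.
result = subprocess.run(
    pip_command("--version"),
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    universal_newlines=True,  # decode the output as text (works on Python 3.6+)
)
print(result.stdout or result.stderr)

# Installing NumPy into this very interpreter would then be:
# subprocess.run(pip_command("install", "numpy"), check=True)
```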

For more information about Pip, please see https://pip.pypa.io/en/stable/ and refer to
https://pip.pypa.io/en/stable/installing/ in case you encounter problems with using pip on
your command line.

Conda If you are using Anaconda or Miniconda, it is highly recommended to use the
conda package manager to install Python packages instead of pip. The Anaconda team
provides pre-compiled versions of Python packages to ensure the best compatibility with your
environment, and it handles complex dependencies between different packages if necessary.
The usage is very similar to pip. To install the NumPy package that we will be using in the
next lecture, you can use the following command:

$ conda install numpy

A specific version of a package can be specified in the installation command as follows:

$ conda install numpy=1.15

To upgrade a package that is already installed, use the update command:

$ conda update numpy

To update conda itself, use

$ conda update conda

Uninstalling a package is also simple:

$ conda uninstall numpy

While most major packages for scientific computing are available via conda, you may find
that you need packages that are not available through the default conda channel. Note that if
you simply type conda install followed by a package name, the package is fetched from the official Anaconda channel.
However, conda also allows us to specify channels to download packages from other sources.
One of these is the community project conda-forge, which provides additional packages
for conda that are not available via the official Anaconda channel. One such example is the
mlxtend package, which we will be using for one of the homework exercises to visualize 2D
decision regions of scikit-learn classifiers. To install mlxtend from the conda-forge channel,
you can use the -c (channel) flag as follows:

$ conda install mlxtend -c conda-forge

Even if you are primarily using conda, you can still install packages via pip. Below is
my recommended order of approaches to try when installing a new package:

1. directly via conda;
2. via conda from the conda-forge channel;
3. using pip.

3.3 Running Python Code

There are many different ways to execute Python code. The following subsections
list some of these. While you are welcome to use any approach you prefer, the problem
sets (homeworks) will be handed out as Jupyter Notebooks. Also, you are expected to hand
in your homework in the form of a Jupyter Notebook plus an HTML file created from that
Jupyter Notebook. This will be discussed in class.

3.3.1 Interpreter/REPL

The simplest way to use Python is via the so-called read-eval-print loop (REPL). Using the
REPL essentially means that we are executing Python code in an interactive session:

$ python
Python 3.6.6 |Anaconda, Inc.| (default, Jun 28 2018, 11:07:29)
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> print(1 + 2)
3
>>> for i in range(5):
...     print(i)
...
0
1
2
3
4
>>>

Note that in this document, we use the following notational convention:

• Starting a line with a “$” character refers to a command line prompt in a terminal
(for example, a Linux or Unix shell).
• A “>>>” at the line start refers to a prompt in a Python interpreter.
• An “...” at the beginning of a line indicates the continuation of the input command
that was initiated by the previous >>> prompt.

The REPL is useful if we want to evaluate a small number of expressions, for example, but it
is not recommended for doing “heavy lifting,” that is, running more extensive code examples
or programs.

3.3.2 IPython

IPython stands for “interactive” Python, and using IPython instead of the standard Python REPL
has many advantages, for example, the so-called “magics,” which are extra commands
for our convenience [13]. Another of my favorite IPython features is that we can use the Tab
key to autocomplete function and variable names.
IPython can be installed using conda. For example,

$ conda install ipython

For more information and installation instructions, please refer to the official documentation
at https://ipython.org/install.html.
Once installed, we can start an IPython session by invoking the ipython command from the
command line:

$ ipython
Python 3.7.0 (default, Jun 28 2018, 07:39:16)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.5.0 -- An enhanced Interactive Python. Type '?' for help.

While all default Python language and interpreter features also work in IPython, IPython
offers a nicer help display than Python’s built-in help() function, which we
can invoke via the ? suffix. For example, if we want to find out more about Python’s
sorted() function, we can simply type sorted?, as shown below:

In [1]: sorted?
Signature: sorted(iterable, /, *, key=None, reverse=False)
Docstring:
Return a new list containing all items from the iterable in ascending order.

A custom key function can be supplied to customize the sort order, and the
reverse flag can be set to request the result in descending order.
Type: builtin_function_or_method
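The two keyword-only parameters in that signature, key and reverse, are easiest to understand by example. The following sketch (the word list is arbitrary) also shows that the same docstring is available in plain Python via sorted.__doc__ or help(sorted):

```python
words = ["banana", "kiwi", "apple"]

print(sorted(words))                # alphabetical: ['apple', 'banana', 'kiwi']
print(sorted(words, key=len))       # by length:    ['kiwi', 'apple', 'banana']
print(sorted(words, reverse=True))  # descending:   ['kiwi', 'banana', 'apple']
print(words)                        # unchanged:    ['banana', 'kiwi', 'apple']

# Plain-Python counterpart of IPython's sorted? (first line of the docstring)
print(sorted.__doc__.splitlines()[0])
```

Note that sorted() returns a new list and leaves the input untouched.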

Magic commands are preceded by a % symbol in IPython. One of my favorites is the
%timeit magic command, which is very useful for “benchmarking” functions:

In [11]: def reverse_string_1(my_str):
    ...:     return ''.join(reversed(my_str))

In [12]: def reverse_string_2(my_str):
    ...:     return my_str[::-1]

In [13]: %timeit reverse_string_1(very_long_string)
1.74 ms ± 20.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [14]: %timeit reverse_string_2(very_long_string)
42.4 µs ± 608 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
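Outside IPython, the same comparison can be sketched with the standard library's timeit module. The definition of very_long_string is not shown in the session above, so the one below is an assumption. Also, %timeit reports the mean ± standard deviation, whereas this sketch keeps the fastest repeat, following the advice in the timeit documentation; expect different absolute numbers on your machine:

```python
import timeit


def reverse_string_1(my_str):
    return "".join(reversed(my_str))


def reverse_string_2(my_str):
    return my_str[::-1]


# Assumed test input: the original session does not show this definition.
very_long_string = "abcdefghij" * 10_000

# Sanity check: both functions must produce the same result.
assert reverse_string_1(very_long_string) == reverse_string_2(very_long_string)

n_calls = 20
for func in (reverse_string_1, reverse_string_2):
    # 7 repeats of n_calls calls each; keep the fastest repeat.
    best = min(timeit.repeat(lambda: func(very_long_string), repeat=7, number=n_calls))
    print(f"{func.__name__}: {best / n_calls * 1e6:.1f} microseconds per call")
```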

Also, IPython allows us to use Linux/Unix commands directly within the Python session:
if we start a command with “!”, IPython interprets the command after the ! as a shell
(Linux/Unix) command. For example, !ls lists the subdirectories and files in the
current working directory:
[13] https://ipython.org/ipython-doc/3/interactive/magics.html
In [15]: !ls
Creative Cloud Files OneDrive code
Desktop Pictures custom-settings
Documents Public miniconda3
Downloads ...
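Outside IPython, a plain-Python way to get a similar listing, sketched here with the standard library's os module, is shown below. Unlike ls, os.listdir also includes hidden entries whose names start with a dot, and the sorting here is a simple string sort rather than ls's locale-aware ordering:

```python
import os

# Names of the files and subdirectories in the current working directory.
entries = sorted(os.listdir("."))
for name in entries:
    print(name)
```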

3.3.3 Scripts (.py files)

If we are developing more extensive analyses or programs, we want to keep most of our code in
some sort of file, which has certain advantages. For example, if we execute a long series of
commands in the REPL or in IPython to produce certain results, it would be cumbersome
to reproduce the results if we want to perform the same or a similar analysis (for example,
if we add a few extra entries to the dataset). If we keep the code in a file instead, it is easier
to

• modify and adjust code;
• back up our code;
• share our code.
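As a minimal sketch of such a file (the file name analysis.py, the function mean, and the data are all hypothetical), a script typically bundles reusable functions plus a main entry point. The if __name__ == "__main__": guard runs main() only when the file is executed directly, for example via $ python analysis.py, and not when it is imported from another file:

```python
"""analysis.py: a hypothetical, minimal analysis script."""


def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


def main():
    # Hypothetical dataset; adding entries and rerunning the script
    # reproduces the whole analysis without retyping any commands.
    data = [2.0, 4.0, 6.0]
    print("mean:", mean(data))


if __name__ == "__main__":
    main()
```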

If you are creating Python (.py) files, you can, of course, use any text editor you like.
However, it is highly recommended to use a text editor that at least offers programming-
language specific syntax highlighting to make your life easier. Common and good choices
for text editors are Atom and Sublime Text, for example.
Of course, there are also specific IDEs (Integrated Development Environments) for Python that
provide additional convenience functions. Commonly used Python IDEs are PyCharm and
Spyder. However, for many tasks in scientific computing, IDEs are considered “overkill” and
would be something to consider for more advanced Python-based software development.

3.3.4 Jupyter Notebooks

Jupyter Notebooks are basically like Microsoft Word documents for writing code and conducting
data analyses (incl. notes, figures, and plots).
Originally, Jupyter Notebook [14] was developed as an interactive document on top of IPython;
back then, it was called IPython Notebook [15]. However, over the years, the developers
extended the “Notebook” concept to also support other programming languages, such as
Julia and R [16].
You can think of Jupyter Notebooks as an interactive environment similar to IPython.
However, in addition to having an interactive IPython session, Jupyter Notebooks are also
“documents” that allow us to add text, LaTeX equations, figures, and so forth.
The reason why Jupyter notebooks are so popular within the scientific computing community
is that they make it easy to save, present, and share a data analysis in a single, executable
file.
We discussed how to set up and use notebooks live in class. Since Jupyter notebooks are
such an interactive concept, it is probably most effective if you consider a video tutorial as
a reference rather than text. For example, a good tutorial for setting up Jupyter Notebook
is available on YouTube at https://www.youtube.com/watch?v=HW29067qVWk.
[14] https://jupyter-notebook.readthedocs.io/en/stable/
[15] This is the reason why we still use the file extension .ipynb for Jupyter notebooks.
[16] The name Jupyter is loosely derived from the names Julia, Python, and R.
3.3.5 JupyterLab

While Jupyter Notebook is the “original” application for working with Jupyter notebooks,
it is not the only one. Analogously, Microsoft Word is a program to open Word (.doc, .docx)
files, but those files can also be opened by other applications (like OpenOffice or LibreOffice).
Recently, a new, officially supported interface for Jupyter notebooks called JupyterLab [17]
was released. JupyterLab is a modernized version of Jupyter Notebook that adds more
convenience features on top of it. You are welcome to use it, since .ipynb files are
compatible with both Jupyter Notebook and JupyterLab.

3.3.6 Jupyter Notebooks and Homework Submissions

As discussed in the lecture, we will mostly be working with Jupyter notebooks in this course.
You will receive the homework assignments, questions, and starter code as Jupyter notebooks
(.ipynb files). You are then expected to return your solutions in the same format, that is,
as the original notebook files modified to include your answers.
We (the TA and I) will view your answers to the homework assignments using Jupyter
Notebook, too. We will also run your code on our computers to make sure that the code
you provide in certain assignments actually works. Thus, please make sure that your
notebooks can be executed sequentially, from top to bottom.
For working on the homework and submitting it, please follow these steps:

• Do not modify the cells that contain the original question/assignment text or the code
cells that start with the line “# DO NOT MODIFY THIS CELL”.
• After you have finished working on your notebook, make a copy of it and select
“Restart & Run All” from the “Kernel” menu to check that all your code can be
executed in sequential order.

• If everything works as expected, export the notebook as an HTML file (File ->
Download As -> HTML (.html)).
• Send both the .ipynb and the .html file of the notebook with your homework solutions
to Shan Lu (the TA) and CC me on the email.

3.4 Relevant Python Topics

There are many really good Python resources out there, as we discussed. However, since this
is a machine learning course, we cannot spend too much time on learning Python itself.
Below is a list of the Python concepts that I consider most relevant.
If you are already familiar with Python, I recommend reading through the list below and
checking whether the topics make sense to you. If most of them do, you can read up on
individual concepts as needed, for example, using the Python documentation.
If most of this is new to you, you should consider spending a few hours working through a
Python learning resource, for example, one of those listed on the course website.
Also, consider the excellent, official, and free Python tutorial as a reference resource, as well
as the official Python documentation:
17 https://jupyterlab.readthedocs.io/en/latest/

• Official Python Tutorial: https://docs.python.org/3/tutorial/index.html.

• Official Python Documentation: https://docs.python.org/3/

3.4.1 Basic Types

• bool, float, int, str, . . .
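As a quick self-check, the short sketch below creates one value of each basic type, inspects it with the built-in type(), and converts between types. The variable names are illustrative only.

```python
# Each literal below creates an object of one of Python's built-in basic types.
flag = True          # bool
ratio = 0.5          # float
count = 3            # int
name = "Python"      # str

for value in (flag, ratio, count, name):
    print(type(value).__name__, repr(value))

# bool is a subclass of int, so True behaves like 1 in arithmetic.
print(True + 1)      # prints 2

# Converting between types with the built-in constructors
print(int("42"), float("3.5"), str(7), bool(0))   # prints 42 3.5 7 False
```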

3.4.2 Operators

Arithmetic Operators

• +, -, *, /, //, **, . . .
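The operators that most often surprise newcomers are / (always returns a float) and // (floor division). A minimal illustration, with % (remainder) added as one of the operators covered by the ". . .":

```python
a, b = 7, 2

print(a + b)    # 9    addition
print(a - b)    # 5    subtraction
print(a * b)    # 14   multiplication
print(a / b)    # 3.5  true division always returns a float ...
print(4 / 2)    # 2.0  ... even when the result is a whole number
print(a // b)   # 3    floor division
print(a ** b)   # 49   exponentiation
print(a % b)    # 1    remainder (modulo)

# Floor division rounds toward negative infinity, not toward zero:
print(-7 // 2)  # -4
```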

3.4.3 Strings

Basics

• single quote, double quote, escape characters, strings that span multiple lines

• strings are immutable objects


• string indexing and slicing
• different ways to print a string
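The following sketch touches each of the string basics listed above: quoting styles, escape characters, multi-line strings, indexing and slicing, immutability, and several ways of printing. The example strings are arbitrary.

```python
single = 'single quotes'
double = "double quotes can contain 'single' quotes"
escaped = "a tab:\t, a newline:\n, a backslash: \\"
multi = """This string
spans two lines."""

word = "machine"
print(word[0])      # m        (indexing starts at 0)
print(word[-1])     # e        (negative indices count from the end)
print(word[1:4])    # ach      (slice from index 1 up to, but excluding, index 4)
print(word[::-1])   # enihcam  (step -1 reverses the string)

# Strings are immutable: assigning to an index raises a TypeError.
try:
    word[0] = "M"
except TypeError as err:
    print("TypeError:", err)

# Different ways to print/format a string; each line prints: Course: STAT 479
course = "STAT 479"
print("Course:", course)
print("Course: " + course)
print("Course: {}".format(course))
print(f"Course: {course}")    # f-strings require Python 3.6+
```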

Basic string methods

• .upper(), .lower(), .replace(), .startswith(), . . .
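A brief sketch of these methods (plus .split(), one of the many others). Because strings are immutable, each method returns a new string and leaves the original unchanged.

```python
text = "Machine Learning"

print(text.upper())                        # MACHINE LEARNING
print(text.lower())                        # machine learning
print(text.replace("Machine", "Deep"))     # Deep Learning
print(text.startswith("Mach"))             # True
print(text.split())                        # ['Machine', 'Learning']

# The original string is unchanged.
print(text)                                # Machine Learning
```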

3.5 Data Structures

3.5.1 List

• list type

• mutable
• sorting a list
• variable-size

• slicing and indexing
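The sketch below illustrates the list properties above: lists can grow and shrink, their elements can be reassigned in place, and they can be sorted either in place (.sort()) or into a new list (sorted()).

```python
numbers = [3, 1, 2]
numbers.append(5)          # variable size: lists can grow ...
numbers.pop()              # ... and shrink again (removes the last element)
numbers[0] = 10            # mutable: elements can be reassigned in place
print(numbers)             # [10, 1, 2]

# sorted() returns a new sorted list; .sort() sorts in place and returns None.
print(sorted(numbers))     # [1, 2, 10]
numbers.sort(reverse=True)
print(numbers)             # [10, 2, 1]

# Indexing and slicing work as with strings.
print(numbers[0], numbers[-1], numbers[:2])   # 10 1 [10, 2]
```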

3.5.2 Dictionary

• dict type
• key-value pairs

• keys must be hashable (in practice, immutable types such as str, int, or tuples of immutable objects)


• fast look-up (“hash table”)
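A minimal dictionary sketch with made-up names and grades. Look-ups by key take constant time on average because dictionaries are implemented as hash tables, which is also why keys must be hashable.

```python
grades = {"alice": 91, "bob": 84}     # key-value pairs
grades["carol"] = 78                  # add a new pair
grades["bob"] = 88                    # overwrite an existing value

print(grades["alice"])                # 91   (fast look-up by key)
print(grades.get("dave", "n/a"))      # n/a  (default value instead of a KeyError)
print("carol" in grades)              # True

for name, grade in grades.items():
    print(name, grade)

# Keys must be hashable: a tuple works, a list raises a TypeError.
coords = {(0, 1): "point A"}
try:
    bad = {[0, 1]: "point B"}
except TypeError as err:
    print("TypeError:", err)
```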

3.5.3 Set

• stores unique values; elements must be hashable (immutable) objects


• fast look-up (“hash table”)
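A short sketch showing that sets drop duplicates, support fast membership tests, and provide set algebra. Sets are unordered, so the example sorts before printing.

```python
labels = ["cat", "dog", "cat", "bird", "dog"]
unique = set(labels)           # duplicates are dropped
print(len(unique))             # 3
print("dog" in unique)         # True (hash-based membership test)

unique.add("fish")
print(sorted(unique))          # ['bird', 'cat', 'dog', 'fish']

a, b = {1, 2, 3}, {2, 3, 4}
print(a & b, a | b, a - b)     # intersection, union, difference
```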

3.5.4 Tuple

• like a list, but immutable (fixed size; elements cannot be reassigned)


• the comma creates a tuple, not the parentheses
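The sketch below demonstrates both tuple points: the comma is what creates a tuple (so a one-element tuple needs a trailing comma), and tuples cannot be modified after creation. It closes with tuple unpacking, a common idiom.

```python
point = (3, 4)
also_point = 3, 4              # no parentheses needed: the comma makes the tuple
print(point == also_point)     # True

not_a_tuple = (5)              # just the int 5 in parentheses
one_tuple = (5,)               # trailing comma -> tuple with one element
print(type(not_a_tuple).__name__, type(one_tuple).__name__)   # int tuple

# Tuples are immutable: item assignment raises a TypeError.
try:
    point[0] = 10
except TypeError as err:
    print("TypeError:", err)

# Tuple unpacking, e.g., to swap two variables
x, y = point
x, y = y, x
print(x, y)                    # 4 3
```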

3.6 Conditionals

• if / elif / else
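A minimal if / elif / else example. The function name describe and the score thresholds are hypothetical, chosen only for illustration; the branches are checked top to bottom and only the first matching one runs.

```python
def describe(score):
    """Map a numeric score to a label (hypothetical thresholds)."""
    if score >= 90:
        return "excellent"
    elif score >= 70:
        return "pass"
    else:
        return "fail"


for s in (95, 70, 42):
    print(s, describe(s))      # 95 excellent / 70 pass / 42 fail
```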

3.7 Iteration

• while-loop

• for-loop
• useful keywords: continue, break
• useful objects: range, enumerate, zip
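The following sketch combines the loop constructs listed above: a while-loop, a for-loop over range with continue and break, and enumerate together with zip. The data are made up.

```python
# while-loop: repeat as long as the condition holds
n = 0
while n < 3:
    n += 1
print(n)                                   # 3

# for-loop with continue (skip this iteration) and break (leave the loop)
collected = []
for i in range(10):                        # range(10) yields 0, 1, ..., 9
    if i % 2 == 0:
        continue                           # skip even numbers
    if i > 7:
        break                              # stop once an odd i exceeds 7
    collected.append(i)
print(collected)                           # [1, 3, 5, 7]

# enumerate pairs each item with its index; zip walks sequences in parallel
names = ["alice", "bob"]
scores = [91, 84]
for idx, (name, score) in enumerate(zip(names, scores)):
    print(idx, name, score)                # 0 alice 91 / 1 bob 84
```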

3.7.1 Generators

• difference between looping over a generator vs. a list
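The sketch below contrasts a list with a generator. The list stores all of its values in memory and can be iterated repeatedly. The generator computes values lazily and is exhausted after a single pass. Exact byte sizes from sys.getsizeof vary between Python versions, so only the comparison is shown.

```python
import sys

squares_list = [x * x for x in range(1000)]     # all 1000 values stored in memory
squares_gen = (x * x for x in range(1000))      # values computed lazily, one at a time

# The generator object stays small, no matter how many values it will produce.
print(sys.getsizeof(squares_list) > sys.getsizeof(squares_gen))   # True

# A list can be looped over repeatedly ...
print(sum(squares_list), sum(squares_list))
# ... but a generator is exhausted after one pass.
print(sum(squares_gen), sum(squares_gen))        # the second sum is 0


def countdown(n):
    """A generator function: each `yield` produces the next value on demand."""
    while n > 0:
        yield n
        n -= 1


print(list(countdown(3)))                         # [3, 2, 1]
```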

3.7.2 Comprehensions

• list comprehension

• set comprehension
• dictionary comprehension
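The three comprehension forms share one syntax and differ only in their brackets and in whether they produce single values or key: value pairs. A small sketch with an arbitrary word list:

```python
words = ["apple", "banana", "avocado", "cherry", "apple"]

# list comprehension: [expression for item in iterable if condition]
lengths = [len(w) for w in words]
print(lengths)                        # [5, 6, 7, 6, 5]

# set comprehension: same syntax with curly braces; duplicates collapse
first_letters = {w[0] for w in words}
print(sorted(first_letters))          # ['a', 'b', 'c']

# dictionary comprehension: key: value pairs, here with a filter
word_to_len = {w: len(w) for w in words if w.startswith("a")}
print(word_to_len)                    # {'apple': 5, 'avocado': 7}
```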

3.8 Functions

3.9 Classes

3.10 Command Line Arguments via Scripts

• Using sys

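import sys

# sys.argv[0] is the script name; the command line arguments follow,
# e.g., `python script.py 1 2` gives sys.argv == ['script.py', '1', '2'].
# Note that all arguments arrive as strings.
first_value = sys.argv[1]
second_value = sys.argv[2]

print("A:", first_value)
print("B:", second_value)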

• more sophisticated command line argument parsing via the argparse module
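Compared with indexing sys.argv by hand, argparse checks the number of arguments, converts types, and generates a -h/--help message automatically. The sketch below mirrors the sys.argv script. The --repeat option and the parse_args helper are hypothetical additions that show a typed option with a default value. Passing an explicit list to parse_args simulates a command line so that the example runs as-is.

```python
import argparse


def parse_args(argv=None):
    """Build the parser and parse argv (None means: use sys.argv[1:])."""
    parser = argparse.ArgumentParser(description="Print two values.")
    parser.add_argument("first_value", help="first positional argument")
    parser.add_argument("second_value", help="second positional argument")
    # Hypothetical optional flag, included only to show a typed option with a default
    parser.add_argument("--repeat", type=int, default=1,
                        help="how many times to print the values (default: 1)")
    return parser.parse_args(argv)


# In a real script you would call parse_args() without arguments; passing a list
# here simulates running `python script.py 1 2 --repeat 2`.
args = parse_args(["1", "2", "--repeat", "2"])
for _ in range(args.repeat):
    print("A:", args.first_value)
    print("B:", args.second_value)
```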

3.11 Reading and Writing files

• f = open('file.txt', 'r') followed by f.close()
• better: with open('file.txt', 'r') as f: (closes the file automatically)
• 'r' for read mode, 'w' for write mode
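A minimal sketch of both approaches. To avoid leaving files behind, it works inside a temporary directory, and "file.txt" is just an example name. The with statement closes the file automatically, even when an exception is raised inside the block. The manual open()/close() pattern skips close() in that case.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "file.txt")

    # "w" = write mode: creates the file, or overwrites it if it already exists
    with open(path, "w") as f:
        f.write("first line\n")
        f.write("second line\n")
    # Leaving the with-block has closed the file.

    # "r" = read mode; iterating over a file object yields one line at a time
    with open(path, "r") as f:
        lines = [line.rstrip("\n") for line in f]
    print(lines)                      # ['first line', 'second line']

    # Manual alternative: close() must be called explicitly.
    f = open(path, "r")
    content = f.read()
    f.close()
    print(f.closed)                   # True
```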

3.12 Importing Libraries

• import numpy

• import numpy as np
• from numpy import some_function (placeholder for any name defined in numpy, e.g., mean)
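To keep the sketch runnable without third-party installs, it uses the standard-library math module in place of NumPy. The three import styles work the same way for numpy.

```python
import math                      # access names via the module: math.sqrt
import math as m                 # alias, analogous to `import numpy as np`
from math import sqrt            # import a single name into the current namespace

print(math.sqrt(16), m.sqrt(16), sqrt(16))   # 4.0 4.0 4.0
print(m is math)                              # True: both names refer to the same module object
```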

3.13 Standard Library

3.14 GIL and Multiprocessing

• GIL = Global Interpreter Lock
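In standard CPython builds, the GIL allows only one thread at a time to execute Python bytecode, so threads do not speed up CPU-bound, pure-Python code. A common workaround is the multiprocessing module, which runs work in separate processes, each with its own interpreter. The sketch below assumes a toy square function as the workload. The `if __name__ == "__main__":` guard is required on platforms that start worker processes by re-importing the main module (for example, the default "spawn" start method on Windows and macOS).

```python
from multiprocessing import Pool


def square(x):
    """Toy CPU-bound task; defined at module level so worker processes can import it."""
    return x * x


if __name__ == "__main__":
    # Each worker is a separate process with its own interpreter (and its own GIL),
    # so CPU-bound work can run in parallel across cores.
    with Pool(processes=2) as pool:
        results = pool.map(square, range(10))
    print(results)    # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```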

3.15 Subprocesses

3.16 Exceptions

3.17 Debugging

3.18 Resources

• Really good, official, and free Python tutorial: https://docs.python.org/3/tutorial/index.html
• PEP 8, Python Style Guide: https://www.python.org/dev/peps/pep-0008/
• Additional Python resources you should consider working through if you are new to
Python: http://pages.stat.wisc.edu/~sraschka/teaching/stat479-fs2018/#resources
