Python for DataScience 2
Popular tools used in data science
Data pre-processing and analysis
◦ Python, R, Microsoft Excel, SAS, SPSS
Data exploration and visualization
◦ Tableau, QlikView, Microsoft Excel
Parallel and distributed computing in case of big data
◦ Apache Spark, Apache Hadoop
Evolution of Python
Python was developed by Guido van Rossum in the late
eighties at the ‘National Research Institute for Mathematics
and Computer Science’ (CWI) in the Netherlands
Major Python versions
◦ Python 1.0 (1994)
◦ Python 2.0 (2000)
◦ Python 3.0 (2008)
Python as a programming language
Supports multiple programming paradigms
◦ Functional, structured (procedural), object-oriented, etc.
Dynamic typing
◦ Type safety is checked at runtime
Reference counting
◦ Deallocates an object as soon as no references to it remain
Late binding
◦ Methods are looked up by name at runtime
Python’s design is guided by the 20 aphorisms of the Zen of
Python by Tim Peters (19 of which were written down)
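The three language properties listed above can be observed directly in the interpreter. The following is a minimal sketch, assuming the standard CPython interpreter (reference counts are a CPython implementation detail); the class names `Sample` and `Greeter` are hypothetical and exist only for illustration.

```python
import sys
import weakref


class Sample:
    """Hypothetical placeholder class used to observe reference counts."""


class Greeter:
    """Hypothetical class used to show late binding of methods."""

    def greet(self):
        return "hello"


# Dynamic typing: the same name can be rebound to objects of different types.
value = 42
type_before = type(value).__name__
value = "forty-two"
type_after = type(value).__name__

# Reference counting (CPython): each new reference raises the count by one.
obj = Sample()
count_initial = sys.getrefcount(obj)
alias = obj
count_with_alias = sys.getrefcount(obj)

# Once the last reference is dropped, the object is deallocated immediately.
watcher = weakref.ref(obj)  # a weak reference does not keep the object alive
del alias
del obj
object_is_gone = watcher() is None

# Late binding: the method is looked up by name at call time, so replacing
# it on the class changes the behaviour of an already created instance.
greeter = Greeter()
first_greeting = greeter.greet()
Greeter.greet = lambda self: "hi"
second_greeting = greeter.greet()

print(type_before, type_after, object_is_gone, first_greeting, second_greeting)
```

The 20 aphorisms themselves can be displayed in any Python console with `import this`.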
Python as a programming language
The standard CPython interpreter is managed by the “Python Software
Foundation”
Other implementations include Jython (Java, formerly JPython),
IronPython (C#), Stackless Python (C, lightweight microthreads for
concurrency) and PyPy (written in RPython, a restricted subset of
Python, with JIT compilation)
Much of the standard library is written in Python itself
High standards of readability
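To check which of the implementations above is running a given program, the standard `platform` module can be queried. This is a small sketch; the printed values depend on the installation, so none are shown here.

```python
import platform
import sys

# Name of the running implementation, e.g. "CPython" or "PyPy".
implementation = platform.python_implementation()

# Full version string of the interpreter, e.g. "3.x.y".
version = platform.python_version()

# sys.version_info gives the same information as comparable integers.
is_python3 = sys.version_info.major == 3

print(f"{implementation} {version}")
```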
Python as a programming language
Cross-platform (Windows, Linux, Mac)
Supported by a large and active community
Good error handling through exceptions
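As an illustration of the error handling point, here is a minimal sketch of Python's `try` / `except` / `else` mechanism. The function name `parse_measurement` and the sample inputs are hypothetical.

```python
def parse_measurement(text):
    """Convert text to a float, returning None when it is not a number."""
    try:
        value = float(text)
    except ValueError:
        # float() raises ValueError for text that is not a valid number.
        return None
    else:
        # Runs only when no exception was raised in the try block.
        return value


readings = ["3.5", "n/a", "12"]
parsed = [parse_measurement(r) for r in readings]
print(parsed)
```

The failed conversion is reported as a specific exception type that the caller can handle, instead of ending the program.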
Python as a programming language
Comparison to Java
Python vs Java
◦ Java is statically typed, i.e. type safety is checked at compile
time
◦ Java code therefore needs explicit type declarations, which tends
to increase development time
◦ Python is dynamically typed and has no separate compile step,
which shortens the edit-and-run cycle compared to Java
◦ Dynamically typed code tends to be less verbose, which often
makes it more readable
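The trade-off described above can be sketched with a single function. The name `combine` is hypothetical. Since no types are declared, the same code works for any values that support `+`, but a type mismatch is only detected when the offending line runs.

```python
def combine(items):
    """Add up the items with +; no type declarations are needed."""
    result = items[0]
    for item in items[1:]:
        result = result + item
    return result


int_total = combine([1, 2, 3])                  # works for integers
text_total = combine(["data", " ", "science"])  # and for strings

# A statically typed language would reject this call at compile time;
# Python reports the problem only when the addition is attempted.
try:
    combine([1, "two"])
except TypeError:
    mixed_failed_at_runtime = True
else:
    mixed_failed_at_runtime = False

print(int_total, text_total, mixed_failed_at_runtime)
```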
Advantages of using python
Python has several features that make it well suited for data
science
Open source and community development
◦ Developed under an Open Source Initiative (OSI) approved license, making it free to use
and distribute, even commercially
Syntax is simple to read and write
Libraries designed for specific data science tasks
Integrates well with most cloud platform service
providers
Coding environment
A program can be written using a terminal or
command prompt (cmd), a text editor, or an Integrated
Development Environment (IDE)
The program needs to be saved in a file with the appropriate
extension (.py for Python, .m for MATLAB, etc.) and can be
executed in the corresponding environment (Python, MATLAB, etc.)
An Integrated Development Environment (IDE) is a software
product built specifically to support software development in
one or more programming languages
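The save-then-execute workflow can be sketched in Python itself. This example assumes a temporary directory and a hypothetical file name `hello.py`; in practice the file would be created in a text editor and run from a terminal with `python hello.py`.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Source code of a one-line program, as it would be typed in a text editor.
source = 'print("Hello from a script file")\n'

with tempfile.TemporaryDirectory() as folder:
    # Save the program in a file with the .py extension.
    script = Path(folder) / "hello.py"
    script.write_text(source)

    # Execute it with the same interpreter that runs this example.
    completed = subprocess.run(
        [sys.executable, str(script)],
        capture_output=True,
        text=True,
        check=True,
    )

output = completed.stdout.strip()
print(output)
```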
Coding environment
Python 2.x reached end of life on 1 January 2020 and is no longer supported
Python 3.x is the enhanced successor to 2.x and the only line that is still
maintained; use a currently supported 3.x release
Install a basic Python version or use the online Python console at
https://www.python.org/
Execute the following commands and view the outputs in a terminal or
command prompt
• Basic print statement
• Naming conventions for variables and functions, operators
• Conditional operations, looping statements (nested)
• Function declaration and calling
• Installing modules
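The items above can be tried with a short script such as the following. It is only a sketch: the variable and function names (`sample_count`, `describe`, and so on) are hypothetical, and `numpy` in the last comment is just an example of a module name.

```python
# Basic print statement
print("Hello, data science")

# Naming convention: lowercase words joined by underscores (snake_case)
# for both variables and functions.
sample_count = 5
mean_height_cm = 170.5

# Operators
remainder = 17 % 5   # modulo
quotient = 17 // 5   # floor division
power = 2 ** 10      # exponentiation


# Function declaration with a conditional (if / elif / else)
def describe(number):
    if number < 0:
        return "negative"
    elif number == 0:
        return "zero"
    else:
        return "positive"


# Calling the function
label = describe(sample_count)

# Nested looping statements: build a 3 x 3 multiplication table
table = []
for row in range(1, 4):
    line = []
    for col in range(1, 4):
        line.append(row * col)
    table.append(line)

print(remainder, quotient, power, label)
print(table)

# Installing modules is done from the terminal or command prompt,
# not from inside the script, for example:
#   python -m pip install numpy
```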
Integrated development environment (IDE)
Software application consisting of a cohesive unit of tools
required for development
Designed to simplify software development
Utilities provided by IDEs include tools for managing, compiling,
deploying and debugging software
Coding environment- IDE
An IDE usually comprises
◦ Source code editor
◦ Compiler or interpreter
◦ Debugger
◦ Additional features such as syntax and error highlighting and
code completion
Supports building and executing the program, as well as
debugging the code, from within the environment
Coding environment- IDE
The best IDEs provide version control features
Eclipse+PyDev, Sublime Text, Atom, GNU Emacs, Vi/Vim, Visual
Studio and Visual Studio Code are general-purpose editors and IDEs with Python
support
Apart from these, Python-specific tools include
PyCharm, Jupyter, Spyder and Thonny
Spyder
Supported across Linux, Mac OS X and Windows platforms
Available as open source version
Can be installed separately or through Anaconda distribution
Developed for Python and specifically data science
Features include
◦ Code editor with robust syntax and error highlighting
◦ Code completion and navigation
◦ Debugger
◦ Integrated documentation
Interface similar to MATLAB and RStudio
PyCharm
Supported across Linux, Mac OS X and Windows platforms
Available as community (free, open source) and professional (paid) versions
Focused on Python development
Can be installed separately or through Anaconda distribution
Features include
◦ Code editor provides syntax and error highlighting
◦ Code completion and navigation
◦ Unit testing
◦ Debugger
◦ Version control
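The unit testing feature listed above typically runs tests written with a framework such as the standard `unittest` module. The following is a minimal sketch that does not depend on any IDE; the function `normalize` and the test class are hypothetical examples.

```python
import unittest


def normalize(values):
    """Rescale values linearly so the smallest is 0.0 and the largest 1.0."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


class NormalizeTests(unittest.TestCase):
    def test_range(self):
        self.assertEqual(normalize([2, 4, 6]), [0.0, 0.5, 1.0])

    def test_constant_input(self):
        self.assertEqual(normalize([3, 3]), [0.0, 0.0])


# Run the tests programmatically; an IDE does the same behind its
# "run tests" button and shows the result in its own panel.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(NormalizeTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```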
Jupyter Notebook
Web application that allows creation and manipulation of
documents called ‘notebooks’
Supported across Linux, Mac OS X and Windows platforms
Available as open source version
Jupyter Notebook
Bundled with the Anaconda distribution or can be installed separately
Supports Julia, Python, R and Scala
Consists of an ordered collection of input and output cells that contain
code, text, plots, etc.
Source: https://jupyter.org/
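The cell structure described above is visible in the notebook file itself, which is stored as JSON. The sketch below builds a simplified two-cell notebook by hand using only the standard `json` module. The cell contents are hypothetical, and a notebook saved by Jupyter carries additional metadata that is left empty here.

```python
import json

# A simplified notebook: an ordered list of cells, each with a type
# and its source text. Code cells also store their outputs.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# Sales analysis\n", "Narrative text goes here."],
        },
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "source": ["1 + 1"],
            "outputs": [
                {
                    "output_type": "execute_result",
                    "execution_count": 1,
                    "data": {"text/plain": ["2"]},
                    "metadata": {},
                }
            ],
        },
    ],
}

# This JSON text is what would be saved in a file with the .ipynb extension.
serialized = json.dumps(notebook, indent=1)

# Reading it back preserves the order of the cells.
cell_types = [cell["cell_type"] for cell in json.loads(serialized)["cells"]]
print(cell_types)
```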
Jupyter Notebook
Allows sharing of code and narrative text through output
formats like PDF, HTML, etc.
◦ Education and presentation tool
Lacks most of the features of a good IDE
Source: https://jupyter.org/
How to choose the best IDE?
Requirements
Working with different IDEs helps us understand our own
requirements