Python Basics
Kailash Singh
Professor, Department of Chemical Engineering
MNIT Jaipur
Why Python?
• Python is an interpreted, high-level, general-purpose programming
language.
• One of the key design goals of Python is code readability.
• Ease of use and high productivity.
• Comprehensive set of core libraries for data analysis and visualization.
• It has libraries for linear algebra, statistical analysis, machine learning,
visualization, optimization, stochastic models, etc.
• Python provides an interactive interface for data analysis.
Integrated Development Environment (IDE)
• Python can be downloaded from: https://www.python.org/downloads/
Main Libraries in Python
• NumPy – Efficient storage of arrays and matrices; the backbone of most scientific calculations and algorithms.
• pandas – High-performance, easy-to-use data structures for data manipulation and analysis. pandas provides the DataFrame, which is very useful in data analytics.
• SciPy – Library for scientific computing: linear algebra, statistical computations, optimization algorithms.
• matplotlib – Plotting and visualization.
• statsmodels – Estimation of statistical models, statistical tests, and statistical data exploration (e.g., regression, time-series analysis).
• scikit-learn – Machine learning library: a collection of supervised and unsupervised ML algorithms.
• seaborn – Data visualization library based on matplotlib.
jupyter notebook
• In the command prompt (cmd), type jupyter notebook, then open a new notebook file.
idle
• Type idle in search
Google Colab
• Colab is a hosted Jupyter Notebook service that requires no setup to
use and provides free access to computing resources.
• Go to the link:
https://colab.research.google.com/
• Open a new notebook
• Type the following code:
a=5
b=6
c=a+b
print(c)
Variable Declaration
• Python supports the following variable types:
1. int – Integer type.
2. float – Floating point numbers.
3. bool – Booleans are a subtype of integers and take the literal values True and False. (In numerical contexts, True is equivalent to 1 and False to 0.)
4. str – Textual data.
• Python automatically infers the variable type from values assigned to
it.
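A quick way to see the inferred types is the built-in type() function. The variable names below are illustrative only; the sketch also shows that bool behaves as a subtype of int.

```python
# Python infers each variable's type from the value assigned to it
n = 10          # int
r = 2.5         # float
flag = True     # bool
name = "MNIT"   # str

print(type(n), type(r), type(flag), type(name))
# <class 'int'> <class 'float'> <class 'bool'> <class 'str'>

# bool is a subtype of int, so True behaves like 1 in arithmetic
print(isinstance(flag, int))  # True
print(True + True)            # 2

# Re-assigning a variable can change its type
n = "ten"
print(type(n))                # <class 'str'>
```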
Conditional Statements
• Python supports if-elif-else for writing conditional statements.
• Indentation is mandatory: the statements inside each block must be indented consistently.
• Examples:
# Checking a condition
x = 0
if x > 1:
    print("Bigger than 1")
elif x < 1:
    print("Less than 1")
else:
    print("equal to 1")
Control Flow Statements (Loops)
# Create the sequence of numbers x = 0, 1, 2, 3, 4, 5 using a for loop
for x in range(6):
    print(x)
# Create the sequence 1, 2, 3, 4, 5 using a while loop
i = 1
while i < 6:
    print(i)
    i = i + 1
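range() also accepts a start value and a step, which covers most counting loops. A minimal sketch (the variable total is illustrative); note that the stop value is always excluded.

```python
# range(stop) starts at 0; range(start, stop) and range(start, stop, step)
# give more control. The stop value is never included.
print(list(range(6)))         # [0, 1, 2, 3, 4, 5]
print(list(range(1, 6)))      # [1, 2, 3, 4, 5]
print(list(range(0, 10, 2)))  # [0, 2, 4, 6, 8]

# Summing 1..5 with a for loop
total = 0
for i in range(1, 6):
    total = total + i
print(total)  # 15
```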
Functions
• Functions are created using the def keyword.
• The function header contains the function name followed by the input parameters in parentheses, and must end with a colon.
• The function body is indented. A return statement sends a value back to the caller; a function without a return statement returns None.
• Example:
def addfun(a, b):
    c = a + b
    return c

a, b = 2, 3
c = addfun(a, b)
print(c)  # prints 5
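Two details worth seeing in action are default parameter values and what happens when a function has no return statement. The functions power and show_sum below are hypothetical examples, not part of any library.

```python
def power(base, exp=2):
    """Return base raised to exp; exp defaults to 2."""
    return base ** exp

print(power(3))     # 9  (exp takes its default value 2)
print(power(2, 3))  # 8

def show_sum(a, b):
    print(a + b)    # prints the sum but has no return statement

result = show_sum(2, 3)  # prints 5
print(result)            # None
```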
Working with Collections
• List
• Tuple
• Set
• Dictionary
List
• Lists are like arrays, but can contain heterogeneous items, that is, a
single list can contain items of type integer, float, string, or objects.
• Example:
x=[1,2,7,10]
y=[1,2.5,6,8.9]
z=['Orange',2.3,5]
print(z[0:2]) #prints ['Orange',2.3]
a=[] #Empty List
print(z.index(2.3)) #prints 1
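Unlike tuples, lists are mutable, so items can be added, replaced and removed after creation. A short sketch using the list z from the example (the value 'Mango' is illustrative):

```python
z = ['Orange', 2.3, 5]

z.append('Mango')   # add an item at the end
z[1] = 4.7          # lists are mutable: replace the item at index 1
print(z)            # ['Orange', 4.7, 5, 'Mango']
print(len(z))       # 4
print(z[-1])        # Mango  (negative indices count from the end)

z.remove(5)         # remove the first occurrence of the value 5
print(z)            # ['Orange', 4.7, 'Mango']
```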
Tuple
• A tuple is similar to a list, but it is immutable: once a tuple has been created, it cannot be modified.
• Example:
x = ('Orange', 2.3, 5)
print(x[1])  # prints 2.3
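The sketch below shows what immutability means in practice: assigning to an item raises a TypeError. It also shows tuple unpacking, the same mechanism behind a, b = 2, 3 in the function example. The names fruit, price and qty are illustrative.

```python
x = ('Orange', 2.3, 5)

# Trying to change an item raises a TypeError
try:
    x[1] = 4.0
except TypeError as err:
    print("Error:", err)  # Error: 'tuple' object does not support item assignment

# A tuple can be unpacked into separate variables
fruit, price, qty = x
print(fruit, price, qty)  # Orange 2.3 5
```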
Set
• A set is a collection of unique elements, that is, the values cannot
repeat.
• Example:
x = {3, 2, 7, 2}
print(x)  # prints {2, 3, 7}: the duplicate 2 is stored only once (sets are unordered)
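Sets also support the usual mathematical set operations, and converting a list to a set is a quick way to drop duplicates. A minimal sketch with illustrative values (the printed order of a set is not guaranteed in general):

```python
a = {1, 2, 3, 4}
b = {3, 4, 5}

print(a | b)   # union: {1, 2, 3, 4, 5}
print(a & b)   # intersection: {3, 4}
print(a - b)   # difference: {1, 2}

# Converting a list to a set removes duplicate values
marks = [70, 85, 70, 90, 85]
unique_marks = set(marks)
print(len(unique_marks))   # 3
print(90 in unique_marks)  # True (fast membership test)
```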
Dictionary
• A dictionary is a collection of key-value pairs. All the keys in a dictionary are unique.
x = {'Ram': 45, 'Shyam': 36, 'Mohan': 58}
print(x['Ram'])  # prints 45
d = {'Ram': {'English': 45, 'Maths': 30}, 'Shyam': {'English': 60, 'Maths': 70}}
print(d['Ram']['English'])  # prints 45 (nested dictionary)
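Dictionaries are mutable: assigning to a new key adds it, assigning to an existing key updates it, and get() reads a key without raising a KeyError. A short sketch using the marks dictionary above ('Geeta' and 'Sita' are illustrative names):

```python
x = {'Ram': 45, 'Shyam': 36, 'Mohan': 58}

x['Geeta'] = 72          # add a new key-value pair
x['Ram'] = 50            # update the value of an existing key
print(x.get('Sita'))     # None: get() avoids a KeyError for a missing key
print(x.get('Sita', 0))  # 0: or returns the supplied default

# Loop over key-value pairs
for name, marks in x.items():
    print(name, marks)

print(list(x.keys()))    # ['Ram', 'Shyam', 'Mohan', 'Geeta'] (insertion order)
```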
map
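• Example: Create a list of squares of the following list:
List1 = [1, 2, 3, 4, 5, 6]
Solution (using a for loop):
List1 = [1, 2, 3, 4, 5, 6]
List2 = []
for x in List1:
    List2.append(x*x)
print(List2)  # [1, 4, 9, 16, 25, 36]
Alternatively (using map):
def fun(x):
    return x*x
List1 = [1, 2, 3, 4, 5, 6]
m = map(fun, List1)  # applies fun to every element of List1
List2 = list(m)
print(List2)  # [1, 4, 9, 16, 25, 36]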
lambda
• A lambda is a small anonymous function written as a single expression.
• Example: Create a list of squares of the following list:
List1= [1,2,3,4,5,6]
f=lambda x: x*x
List1=[1,2,3,4,5,6]
m=map(f,List1)
List2=list(m)
print(List2)
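Lambdas are also handy wherever a short throwaway function is expected, for example as the key that the built-in sorted() uses to compare items. A minimal sketch; the student data and the name avg are illustrative:

```python
# Sort (name, marks) pairs by marks, using a lambda as the sort key
students = [('Ram', 45), ('Shyam', 36), ('Mohan', 58)]
by_marks = sorted(students, key=lambda s: s[1])
print(by_marks)  # [('Shyam', 36), ('Ram', 45), ('Mohan', 58)]

# A lambda can take more than one argument
avg = lambda a, b: (a + b) / 2
print(avg(4, 7))  # 5.5
```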
Filter
• Example: Filter even integer values from List1=[1,2,3,4,5,6]
f=lambda x: x%2==0
List1=[1,2,3,4,5,6]
y=filter(f, List1)
List2=list(y)
print(List2)
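filter and map can be chained, since each takes any iterable. One detail that often surprises beginners: both return lazy iterators, so they can be consumed only once. A short sketch:

```python
List1 = [1, 2, 3, 4, 5, 6]

# First keep the even numbers, then square them
evens = filter(lambda x: x % 2 == 0, List1)
squares_of_evens = list(map(lambda x: x * x, evens))
print(squares_of_evens)  # [4, 16, 36]

# map and filter return lazy iterators: once consumed, they are empty
y = filter(lambda x: x > 3, List1)
print(list(y))  # [4, 5, 6]
print(list(y))  # []
```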
Pandas
Dataframes in Python
• The primary objective of descriptive analytics is to understand data through data summarization, basic statistical measures and visualization.
• Data visualization is an integral component of business intelligence (BI).
• Data scientists often work with structured data; one of the most common forms is a table in a relational database, queried with Structured Query Language (SQL).
• SQL tables represent data in rows and columns, which makes it convenient to explore the data and apply transformations.
• Python supports a similar tabular structure through DataFrames.
• DataFrames are provided in Python by the pandas library.
Pandas library
• The pandas library supports methods to explore, analyze, and prepare data.
• It can be used to load, filter, sort, group, and join datasets, and to deal with missing data.
• Example: Create a DataFrame for the following data:
Name     Age   City
Mahesh   25    Delhi
Suresh   30    Mumbai
Paresh   40    Kolkata

import pandas as pd
a = ['Mahesh', 'Suresh', 'Paresh']
b = [25, 30, 40]
c = ['Delhi', 'Mumbai', 'Kolkata']
data = {'Name': a, 'Age': b, 'City': c}
df = pd.DataFrame(data)
print(df)
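The filter, sort and group operations mentioned above can be sketched on the same data. The Dept column is a hypothetical addition, used only so that grouping has something to group by.

```python
import pandas as pd

# Same data as above plus a hypothetical 'Dept' column, added only for grouping
data = {'Name': ['Mahesh', 'Suresh', 'Paresh'],
        'Age': [25, 30, 40],
        'City': ['Delhi', 'Mumbai', 'Kolkata'],
        'Dept': ['Sales', 'HR', 'Sales']}
df = pd.DataFrame(data)

# Filter: keep only the rows where Age is greater than 28
older = df[df['Age'] > 28]
print(older)

# Sort: order the rows by Age, oldest first
print(df.sort_values('Age', ascending=False))

# Group: mean Age in each Dept (HR 30.0, Sales 32.5)
print(df.groupby('Dept')['Age'].mean())
```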
Contd…
• Locate rows
print(df.loc[0])       # prints the first row (index label 0)
print(df.loc[[0, 1]])  # prints the first and second rows
print(df.head(2))      # prints the first two rows
print(df.columns)      # prints the column names
• Read a CSV file
df = pd.read_csv('data.csv')
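Since data.csv itself is not given here, the sketch below assumes hypothetical file contents and feeds them to read_csv through io.StringIO, which read_csv accepts in place of a file path. Swapping in the real filename gives the same result for a file with these contents.

```python
import io
import pandas as pd

# Hypothetical contents of data.csv
csv_text = """Name,Age,City
Mahesh,25,Delhi
Suresh,30,Mumbai
Paresh,40,Kolkata
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df.head())
print(df.dtypes)  # Age is read as an integer column; Name and City as text
```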
Dropping null values in DataFrame
• Sometimes the data has missing (null) values; pandas shows them as NaN (Python's None is converted to NaN in numeric columns). How can such records be removed?
• Example:
import pandas as pd
# Dropping null values
d = {'col1': [1, 2, None], 'col2': [4, 5, 6]}
df = pd.DataFrame(d)
df2 = df.dropna()  # returns a new DataFrame; the original is not changed
print(df2)
df.dropna(inplace=True)  # changes the original DataFrame
print(df)
# Replacing null values by some value
d = {'col1': [1, 2, None], 'col2': [4, 5, 6]}
df = pd.DataFrame(d)
df.fillna(15, inplace=True)  # replaces null values by 15
print(df)
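Before dropping or filling, it usually helps to count how many values are missing in each column. A minimal sketch using the same small DataFrame:

```python
import pandas as pd

d = {'col1': [1, 2, None], 'col2': [4, 5, 6]}
df = pd.DataFrame(d)

print(df)               # None in a numeric column is stored as NaN
print(df.isna())        # True where a value is missing
print(df.isna().sum())  # missing values per column: col1 1, col2 0
```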
Some other operations on DataFrame
• Mean of a column: df['col1'].mean()
• Mean of each numeric column: df.mean(numeric_only=True)
• df.size returns the number of elements, nrows × ncols
• df.shape returns the tuple (nrows, ncols)
• For more details refer to: https://pandas.pydata.org/
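The operations listed above can be checked on a small DataFrame. The data is illustrative; the text column name is included to show why numeric_only=True is needed when computing means.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6], 'name': ['a', 'b', 'c']})

print(df['col1'].mean())           # 2.0
print(df.mean(numeric_only=True))  # col1 2.0, col2 5.0 (text column skipped)
print(df.shape)                    # (3, 3)
print(df.size)                     # 9 = 3 rows x 3 columns
print(df.describe())               # count, mean, std, min, quartiles, max
```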
NumPy
• NumPy is a Python library used for working with arrays.
• NumPy stands for Numerical Python.
• NumPy aims to provide an array object that is up to 50x faster than
traditional Python lists.
• Arrays are very frequently used in data science, where speed and
resources are very important.
• If NumPy is not already installed, install it with the following command:
pip install numpy
• Then import it in Python, usually as: import numpy as np
Contd…
• Example: 1-D array
import numpy as np
x = np.array([1, 2, 3, 4, 5])
print(x)
print(x[0])      # prints 1
print(type(x))   # <class 'numpy.ndarray'>
• Example: 2-D array
import numpy as np
x = np.array([[1, 2, 3], [4, 5, 6]])  # a nested list gives a 2-D array
print(x)
print(x[0][0])   # prints 1
print(type(x))   # <class 'numpy.ndarray'>
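Much of NumPy's speed and convenience comes from element-wise (vectorized) arithmetic, which works on whole arrays without an explicit loop. A short sketch with illustrative values, including the x[row, col] indexing form that is equivalent to x[0][0]:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([10, 20, 30])

print(a + b)     # [11 22 33]  element-wise addition
print(a * b)     # [10 40 90]  element-wise multiplication
print(a * 2)     # [2 4 6]     the scalar is applied to every element

# For comparison, + on Python lists concatenates instead of adding
print([1, 2, 3] + [10, 20, 30])  # [1, 2, 3, 10, 20, 30]

x = np.array([[1, 2, 3], [4, 5, 6]])
print(x[0, 0], x[1, 2])  # 1 6  (row, column) indexing, same as x[0][0]
```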
Contd…
• Copy array
# Assignment does not copy: y refers to the same array, so changing y changes x
import numpy as np
x = np.array([1, 2, 3, 4, 5])
y = x
y[1] = 7        # it changes x also
print(x)        # prints [1 7 3 4 5]
# copy() creates an independent array, so changing y does not change x
import numpy as np
x = np.array([1, 2, 3, 4, 5])
y = x.copy()
y[1] = 7        # it does not change x
print(x)        # prints [1 2 3 4 5]
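The same caution applies to slicing: a NumPy slice is a view that shares memory with the original array, unlike a slice of a Python list. A minimal sketch using np.shares_memory to confirm which arrays share data:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
s = x[1:3]         # a slice is a view: it shares memory with x
s[0] = 99
print(x)           # [ 1 99  3  4  5]  x changed too

c = x[1:3].copy()  # an explicit copy is independent
c[0] = -1
print(x)           # [ 1 99  3  4  5]  unchanged
print(np.shares_memory(x, s), np.shares_memory(x, c))  # True False
```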
Contd.
• Shape
import numpy as np
x = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
print(x.shape)  # (2, 4): 2 rows, 4 columns
x = np.array([1, 2, 3, 4])
print(x.shape)  # (4,): a 1-D array with 4 elements
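Closely related to shape are ndim (number of dimensions), size (total number of elements) and reshape, which rearranges the same elements into a new shape. A short sketch:

```python
import numpy as np

x = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
print(x.shape, x.ndim, x.size)  # (2, 4) 2 8

# reshape rearranges the same 8 elements into a new shape
y = x.reshape(4, 2)
print(y.shape)                  # (4, 2)

# -1 lets NumPy work out that dimension from the total size
z = x.reshape(-1)               # flatten to 1-D
print(z.shape)                  # (8,)
```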
Sorting
• Example: sort x but do not change the original array
import numpy as np
x = np.array([3, 2, 0, 1])
y = np.sort(x)
print(x)  # [3 2 0 1]
print(y)  # [0 1 2 3]
• Example: sort x in place
import numpy as np
x = np.array([3, 2, 0, 1])
x.sort()
print(x)  # [0 1 2 3]
• Question: What is the difference between x.sort() and np.sort(x)?
NumPy Random
• Generate a random float from 0 to 1:
from numpy import random
x = random.rand()
print(x)
• Generate a random integer from 0 (inclusive) to 100 (exclusive):
from numpy import random
x = random.randint(100)
print(x)
• Based on choice: x = random.choice([3, 5, 7, 9])
• Float between two values: x = random.uniform(2, 3)  # includes 2 but not 3
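Random results change on every run, which makes them hard to reproduce. One option, shown here as an alternative to the functions above rather than a replacement, is NumPy's Generator API created with a fixed seed; the seed value 42 is an arbitrary choice.

```python
import numpy as np

# A Generator created with a fixed seed produces the same numbers on every run
rng = np.random.default_rng(42)

x = rng.random()              # float in [0, 1)
n = rng.integers(0, 100)      # integer from 0 (inclusive) to 100 (exclusive)
c = rng.choice([3, 5, 7, 9])  # one element picked at random
u = rng.uniform(2, 3)         # float in [2, 3)
print(x, n, c, u)

# Re-creating the generator with the same seed repeats the sequence
rng2 = np.random.default_rng(42)
print(rng2.random() == x)     # True
```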
Visualizing Data
matplotlib
• Install the library: python -m pip install matplotlib
• Example: India’s GDP
import matplotlib.pyplot as plt
Year = [2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023, 2024]
GDP= [1676, 1823, 1828, 1857, 2039, 2104, 2295, 2651, 2702, 2835, 2671, 3150, 3385, 3737, 3940]#Billion Dollars
# create a line chart, Year on x-axis, GDP on y-axis
plt.plot(Year, GDP, color='green', marker='o', linestyle='solid')
plt.title("Nominal GDP")
plt.xlabel("Year")
plt.ylabel("Billions of $")
plt.show()
Contd…
• Scatter plot: plt.scatter(Year, GDP)
• Bar chart: plt.bar(Year, GDP)
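Both chart types can be placed side by side in one figure using plt.subplots, which returns a figure and one axes object per panel. A sketch reusing the GDP data from the line chart example; the layout and colour choices are arbitrary.

```python
import matplotlib.pyplot as plt

Year = [2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023, 2024]
GDP = [1676, 1823, 1828, 1857, 2039, 2104, 2295, 2651, 2702, 2835, 2671, 3150, 3385, 3737, 3940]  # billion dollars

# One figure with two side-by-side axes: scatter on the left, bar on the right
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.scatter(Year, GDP)
ax1.set_title("Scatter")
ax2.bar(Year, GDP, color='orange')
ax2.set_title("Bar")
for ax in (ax1, ax2):
    ax.set_xlabel("Year")
    ax.set_ylabel("Billions of $")
fig.tight_layout()
plt.show()
```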
Contd…
• Histogram
import matplotlib.pyplot as plt
import numpy as np
x = np.random.normal(170, 10, 250)  # 250 values from a normal distribution with mean 170 and standard deviation 10
plt.hist(x)
plt.show()
Seaborn
• Seaborn is a data visualization library built on top of matplotlib.
• It provides a high-level interface for drawing attractive and informative statistical graphics.
• It integrates well with pandas DataFrames.
• Built-in Seaborn Datasets: tips, iris, penguins, titanic, flights, etc.
• Example:
import seaborn as sns
df = sns.load_dataset("tips")  # downloads the dataset (internet needed the first time)
print(df.head())
Basic Plot Types
• Relational plots: scatterplot, lineplot
• Categorical plots: boxplot, violinplot, stripplot
• Distribution plots: histplot, kdeplot, displot (the older distplot is deprecated)
• Matrix plots: heatmap
• Regression plots: regplot, lmplot
Scatterplot
import seaborn as sns
import matplotlib.pyplot as plt
df=sns.load_dataset("tips")
sns.scatterplot(x="total_bill",y="tip", data=df)
plt.show()
You can add hue, style, and size:
sns.scatterplot(x="total_bill",y="tip", data=df, hue='smoker',
style='time', size='day')
Lineplot
import seaborn as sns
import matplotlib.pyplot as plt
df=sns.load_dataset("tips")
sns.lineplot(data=df, x="size", y="tip")  # mean tip for each party size, with a 95% confidence band
plt.show()
Relational plot
• A relational plot (relplot) is a versatile function for creating scatter
and line plots, with additional capabilities for multiple subplots.
import seaborn as sns
import matplotlib.pyplot as plt
df=sns.load_dataset("tips")
sns.relplot(data=df, x="total_bill",
y="tip", hue="smoker",col="time")
plt.show()
Boxplot
import seaborn as sns
import matplotlib.pyplot as plt
df=sns.load_dataset("tips")
sns.boxplot(x="day", y="total_bill", data=df)
plt.show()
Heatmap
import seaborn as sns
import matplotlib.pyplot as plt
# Example data: 2D list (matrix)
data = [
[1, 2, 3],
[4, 5, 6],
[7, 8, 9]
]
# Create the heatmap
sns.heatmap(data, annot=True, cmap="Blues")
# Show the plot
plt.title("Simple Heatmap Example")
plt.show()
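A common use of heatmaps is to display a correlation matrix. The sketch below uses hypothetical, randomly generated data (y is built to depend on x, z is independent noise) so that it runs without downloading a dataset; DataFrame.corr computes the pairwise correlation coefficients.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical data: y depends on x, z is unrelated noise
rng = np.random.default_rng(0)
x = rng.normal(size=100)
df = pd.DataFrame({'x': x,
                   'y': 2 * x + rng.normal(scale=0.5, size=100),
                   'z': rng.normal(size=100)})

corr = df.corr()  # 3 x 3 matrix of pairwise correlation coefficients
ax = sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1)
ax.set_title("Correlation matrix")
plt.show()
```

Fixing vmin=-1 and vmax=1 keeps the colour scale anchored to the full range of possible correlations.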
Histogram plot
import seaborn as sns
import matplotlib.pyplot as plt
df=sns.load_dataset("tips")
sns.histplot(data=df, x="total_bill")
plt.show()
Pairplot
• Pairplot is a powerful function in the Seaborn library for visualizing
pairwise relationships in a dataset.
• It creates a grid of plots, where each subplot represents the relationship
between two different variables in the dataset.
• It generates scatter plots for every pair of numeric variables in the dataset.
• The diagonal plots in the grid display the univariate distribution of each variable. By default these are histograms, or KDE (Kernel Density Estimate) plots when hue is set; diag_kind can change them.
• kind controls the type of plot for the off-diagonal subplots (e.g., 'scatter',
'kde', 'hist', 'reg').
• diag_kind controls the type of plot for the diagonal subplots (e.g., 'hist',
'kde').
import seaborn as sns
import matplotlib.pyplot as plt
# Load a sample dataset
iris = sns.load_dataset("iris")
# Create a basic pairplot
sns.pairplot(iris)
plt.show()
import seaborn as sns
import matplotlib.pyplot as plt
# Load a sample dataset
iris = sns.load_dataset("iris")
# Create a pairplot with hue and different diagonal kind
sns.pairplot(iris, hue="species", diag_kind="kde")
plt.show()
Summary
• IDEs: IDLE, Jupyter Notebook, PyCharm, Google Colab, etc.
• Python Basics, functions, etc.
• Collections: List, Tuple, Set, Dictionary
• map, filter, lambda function
• Pandas
• Numpy
• matplotlib
• seaborn
Questions
• How do you inspect first few rows in Pandas dataframe?
• In a pandas dataframe df what is the output of df.loc[1]?
• Filter odd integers from a list x= [4,2,1,3,8,7].
• Create a dictionary with Fruits as Oranges and Apples and Price as Rs.
50, 200, respectively.
• Create a numpy matrix of size 2 by 3.
• Plot the predicted temperature for the next week using matplotlib.
• Plot scatter data in seaborn: x=[1,2,3,4], y=[1,4,9,16].
Problems on Programming
• Python Basics
1. Write a program to check whether a given number is even or odd.
2. Create a list of squares of numbers from 1 to 10 using a for loop.
3. Write a program to count the number of vowels in a string entered by the user.
4. Write a function that prints "Hello, <name>!"
5. Create a program that accepts student names and marks for 3 subjects. Calculate the average and
assign grades based on the average.
6. Write a function to return all prime numbers between two user-specified numbers using a
generator.
7. Use map and a lambda function to return f(x) = 2x² + 1 for all elements in the list: x = [1, 2, 3, 4, 5].
8. Filter all positive numbers from the list: nums = [-3, 5, -1, 0, 8, -7]
9. Write a lambda function to multiply two numbers. Use it to multiply 4 and 5.
10. Create a dictionary from the lists: keys = ['name', 'age', 'city'] and values = ['Alice', 25, 'Delhi']
Contd…
• Pandas
1. Create a Pandas DataFrame with columns Name, Age, and Disease using a dictionary.
2. Read a CSV file called students.csv and display the first 5 rows.
3. From a DataFrame df, filter the rows where Age is greater than 20.
4. Display the mean and maximum of the Age column from the DataFrame.
5. Load a CSV file with some missing values. Fill missing numeric values with the mean.
• Numpy
1. Perform element-wise addition of two NumPy arrays: [1, 2, 3] and [4, 5, 6].
2. Find the mean and standard deviation of a NumPy array [5, 10, 15, 20].
3. Create a 1D array of 6 elements and reshape it to a 2x3 matrix.
4. Create two random 3x3 matrices and perform addition, subtraction, matrix multiplication,
and element-wise multiplication.
5. Solve the following system of equations using NumPy: 3x+y=9, x+2y=8
Contd…
• Matplotlib
1. Plot a line graph for the following data: x = [1, 2, 3, 4], y = [10, 20, 25, 30]
2. Create a bar chart showing names on the x-axis and their scores on the y-axis.
3. Plot a graph and add xlabel, ylabel, and title to it.
4. Plot two lines on the same graph with different colors and labels.
• Seaborn
1. Load the built-in tips dataset using Seaborn and display the first 5 rows.
2. Create a box plot for total_bill grouped by gender (the sex column) using the tips dataset.
3. Plot a histogram of the total_bill column.
4. Create a scatter plot of total_bill vs. tip
5. Load any numeric dataset (e.g., Titanic, Iris) and plot a heatmap of the correlation
matrix with annotations.
6. Generate a pairplot from the Iris dataset and identify visible patterns between
features.