Dsa Lab Manual

TABLE OF CONTENTS

S.NO  LIST OF EXPERIMENTS                                              CO   DATE  PAGE NO  MARKS  SIGN
1     DOWNLOAD, INSTALL AND EXPLORE THE FEATURES OF NUMPY,
      SCIPY, JUPYTER, STATS MODELS AND PANDAS PACKAGES                 CO1
2     WORKING WITH NUMPY ARRAYS                                        CO1
3     WORKING WITH DATA FRAMES USING PANDAS                            CO2
4     READING DATA FROM TEXT FILES, EXCEL AND THE WEB AND
      EXPLORING VARIOUS COMMANDS FOR DOING DESCRIPTIVE
      ANALYTICS ON THE IRIS DATA SET                                   CO2
5     BASIC PLOTS USING MATPLOTLIB                                     CO2
6     FREQUENCY DISTRIBUTIONS, AVERAGES, VARIABILITY                   CO2
7     NORMAL CURVES, CORRELATION AND SCATTER PLOTS,
      CORRELATION COEFFICIENT                                          CO2
8     REGRESSION                                                       CO2
9     Z-TEST                                                           CO3
10    T-TEST                                                           CO4
11    ANOVA                                                            CO4
12    BUILDING AND VALIDATING LINEAR MODELS                            CO5
13    BUILDING AND VALIDATING LOGISTIC MODELS                          CO5
14    TIME SERIES ANALYSIS                                             CO5
EX.NO.1 DOWNLOAD, INSTALL AND EXPLORE THE FEATURES OF NUMPY,
SCIPY, JUPYTER, STATS MODELS AND PANDAS PACKAGES.

Aim:

To download, install and explore the features of the NumPy, SciPy, Jupyter, statsmodels and pandas
packages.

Problem Description
Python is an open-source object-oriented language. One of its notable features is the wide
range of external packages available: there are many packages that can be installed to expand its
functionality. These packages are repositories of functions written in Python. NumPy is one such
package, designed to ease array computations. To install these Python packages we use pip, the
package installer. Pip is installed automatically along with Python. We can then use pip on the
command line to install packages from PyPI.
NumPy
NumPy (Numerical Python) is an open-source library for the Python programming
language. It is used for scientific computing and working with arrays.
Apart from its multidimensional array object, it also provides high-level functioning tools for
working with arrays.
Prerequisites
● Access to a terminal window/command line
● A user account with sudo privileges
● Python installed on your system
Downloading and installing Numpy:
Python NumPy is a general-purpose array processing package that provides tools
for handling n-dimensional arrays. It provides various computing tools such as
comprehensive mathematical functions and linear algebra routines. Use the below command
to install NumPy:

pip install numpy

OUTPUT:

Sample python program using numpy:


import numpy as np
# Creating array object
arr = np.array([[1, 2, 3], [4, 2, 5]])
# Printing type of arr object
print("Array is of type: ", type(arr))
# Printing array dimensions (axes)
print("No. of dimensions: ", arr.ndim)
# Printing shape of array
print("Shape of array: ", arr.shape)
# Printing size (total number of elements) of array
print("Size of array: ", arr.size)
# Printing type of elements in array
print("Array stores elements of type: ", arr.dtype)

OUTPUT:

Data Science:
Data science combines math and statistics, specialized programming, advanced analytics, artificial
intelligence (AI), and machine learning with specific subject matter expertise to uncover
actionable insights hidden in an organization’s data.

Jupyter:
Jupyter Notebook is an open-source web application that allows you to create and share documents
that contain live code, equations, visualizations, and narrative text. Uses include data cleaning and
transformation, numerical simulation, statistical modeling, data visualization, machine learning,
and much more.
Jupyter has support for over 40 different programming languages and Python is one of them.
Python is a requirement (Python 3.3 or greater, or Python 2.7) for installing the Jupyter Notebook
itself.

Procedure:

PIP is a package management system used to install and manage software
packages/libraries written in Python. These files are stored in a large online repository termed
the Python Package Index (PyPI). pip uses PyPI as the default source for packages and their
dependencies.
Installing Jupyter Notebook using pip:

To install Jupyter using pip, we need to first check if pip is updated in our system.
Use the following command to update pip:

python -m pip install --upgrade pip

After updating the pip version, follow the instructions provided below to install Jupyter:

Command to install Jupyter:

python -m pip install jupyter

Finished Installation:

Use the following command to launch Jupyter using command-line:

jupyter notebook

Launching Jupyter Notebook

Click New, select Python 3 (ipykernel) and type the following program.
Click Run to execute the program.
Running the Python program:
Program to find the area of a triangle

# Python program to find the area of a triangle
a = 5
b = 6
c = 7
# calculate the semi-perimeter
s = (a + b + c) / 2
# calculate the area
area = (s*(s-a)*(s-b)*(s-c)) ** 0.5
print('The area of the triangle is %0.2f' % area)

Output:

Problem Description

SciPy is a Python library that is useful in solving many mathematical equations and
algorithms. It is designed on top of the NumPy library and gives further means of finding
scientific mathematical formulae like matrix rank, inverse, polynomial equations, LU
decomposition, etc. Using its high-level functions significantly reduces the complexity of the
code and helps in writing better programs.

Downloading and Installing Scipy:

Use the below pip command to install the SciPy package on Windows:

pip install scipy

OUTPUT:

Sample Python code using SciPy:
Type the program in a Jupyter notebook.

from scipy import special
a = special.exp10(3)
print(a)
b = special.exp2(3)
print(b)
c = special.sindg(90)
print(c)
d = special.cosdg(45)
print(d)

OUTPUT:

Problem Description
Pandas is one of the most popular open-source frameworks available for Python. It is among the fastest
and most easy-to-use libraries for data analysis and manipulation. Pandas dataframes are some of the most
useful data structures available in any library. It has uses in every data-intensive field, including but not
limited to scientific computing, data science, and machine learning. The library does not come included
with a regular install of Python. To use it, you must install the Pandas framework separately.

Installing Pandas on Windows


There are two ways of installing Pandas on Windows.
Method #1: Installing with pip

It is a package installation manager that makes installing Python libraries and frameworks
straightforward.

As long as you have a newer version of Python installed (> Python 3.4), pip will be installed on
your computer along with Python by default. However, if you’re using an older version of
Python, you will need to install pip on your computer before installing Pandas.

Step #1: Launch Command Prompt

Press the Windows key on your keyboard or click on the Start button to open the start menu.
Type cmd, and the Command Prompt app should appear as a listing in the start menu.

Step #2: Enter the Required Command

After you launch the command prompt, the next step in the process is to type in the required
command to initialize pip installation.

Enter the command

pip install pandas

on the terminal. This should launch the pip installer. The required files will be downloaded, and
Pandas will be ready to run on your computer.

The pandas package is successfully installed.
Sample program

# to be typed in a Jupyter notebook

import pandas as pd
data = pd.DataFrame({"x1":["y", "x", "y", "x", "x", "y"],
"x2":range(16, 22),
"x3":range(1, 7),
"x4":["a", "b", "c", "d", "e", "f"],
"x5":range(30, 24, - 1)})
print(data)
s1 = pd.Series([1, 3, 4, 5, 6, 2, 9])
s2 = pd.Series([1.1, 3.5, 4.7, 5.8, 2.9, 9.3])
s3 = pd.Series(['a', 'b', 'c', 'd', 'e'])
Data ={'first':s1, 'second':s2, 'third':s3}
dfseries = pd.DataFrame(Data)
print(dfseries)

Problem Description:
Statsmodels is a popular library in Python that enables us to estimate and analyze
various statistical models. It is built on numeric and scientific libraries like NumPy and SciPy.
Some of the essential features of this package are-
1. It includes various models of linear regression like ordinary least squares, generalizedleast
squares, weighted least squares, etc.
2. It provides some efficient functions for time series analysis.
3. It also has some datasets for examples and testing.
4. Models based on survival analysis are also available.
5. All the statistical tests that we can imagine for data on a large scale are present.

Installing Statsmodels

Check the version of python installed in the PC.

Using Command Prompt


Type 'Command Prompt' in the taskbar's search pane and you'll see its icon. Click on it to open
the command prompt. Alternatively, you can directly click on its icon if it is pinned to the taskbar.
1. Wait until the 'Command Prompt' screen is visible on your screen.
2. Type python --version and press 'Enter'.
3. The version installed in your system will be displayed on the next line.

Installation of statsmodels

Now for installing statsmodels in our system, Open the Command Prompt, type the
following command and click on 'Enter'.

pip install statsmodels

Output

It's time to look at a program in which we will import statsmodels.

Here, we will perform OLS (Ordinary Least Squares) regression. In this technique we try to
minimize the net sum of squares of the differences between the calculated values and the observed values.

Program

import statsmodels.api as sm
import pandas
from patsy import dmatrices

df = sm.datasets.get_rdataset("Guerry", "HistData").data
vars = ['Department', 'Lottery', 'Literacy', 'Wealth', 'Region']
df = df[vars]
df[-5:]
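The program above only loads and subsets the Guerry data; the OLS idea described earlier (minimizing the sum of squared residuals) can also be sketched directly with NumPy, independent of statsmodels. The data values below are made up purely for illustration:

```python
import numpy as np

# Small synthetic data set (hypothetical values, for illustration only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.0, 6.2, 7.9, 10.1])

# Design matrix with an intercept column
X = np.column_stack([np.ones_like(x), x])

# OLS solution: coefficients that minimize the sum of squared residuals
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = beta
print("intercept = %.3f, slope = %.3f" % (intercept, slope))

# Residual sum of squares for the fitted line
rss = np.sum((y - X @ beta) ** 2)
print("RSS = %.4f" % rss)
```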

OUTPUT:

INFERENCE:

RESULT:
Thus, the features of the NumPy, SciPy, Jupyter, statsmodels and pandas packages were downloaded,
installed and explored successfully.

EX.NO:2
WORKING WITH NUMPY ARRAYS

AIM:
Write a python program to show the working of NumPy Arrays in Python.
2a) Use a NumPy array to demonstrate basic array characteristics
b) Create NumPy arrays using a list and a tuple
c) Apply basic operations (+, -, *, /) and find the transpose of a matrix
d) Perform sorting operations with NumPy arrays
ALGORITHM:
Step 1:start
Step 2: Create array using numpy
Step 3: Access the element in the array
Step 4: Retrieve element using slice operation
Step 5: Compute calculation in the array
Step 6:Stop
Program 1:
Write a python program to demonstrate the basic NumPy array characteristics
import numpy as np
# Creating array object
arr = np.array( [[ 1, 2, 3], [ 4, 2, 5]] )
# Printing type of arr object
print("Array is of type: ", type(arr))
# Printing array dimensions (axes)
print("No. of dimensions: ", arr.ndim)
# Printing shape of array
print("Shape of array: ", arr.shape)
# Printing size (total number of elements) of array
print("Size of array: ", arr.size)

# Printing type of elements in array
print("Array stores elements of type: ", arr.dtype)

OUTPUT:

B.Array creation:
Program 2:
import numpy as np
# Creating array from list with type float
a = np.array([[1, 2, 4], [5, 8, 7]], dtype = 'float')
print ("Array created using passed list:\n", a)
# Creating array from tuple
b = np.array((1 , 3, 2))
print ("\nArray created using passed tuple:\n", b)
# Creating a 3X4 array with all zeros
c = np.zeros((3, 4))
print ("\nAn array initialized with all zeros:\n", c)
# Create a constant value array of complex type
d = np.full((3, 3), 6, dtype = 'complex')
print ("\nAn array initialized with all 6s.")
print( "Array type is complex:\n", d)
# Create an array with random values
e = np.random.random((2, 2))
print ("\nA random array:\n", e)
# Create a sequence of integers
# from 0 to 30 with steps of 5
f = np.arange(0, 30, 5)
print ("\nA sequential array with steps of 5:\n", f)
# Create a sequence of 10 values in range 0 to 5
g = np.linspace(0, 5, 10)
print ("\nA sequential array with 10 values between 0 and 5:\n", g)
# Reshaping 3X4 array to 2X2X3 array
arr = np.array([[1, 2, 3, 4], [5, 2, 4, 2], [1, 2, 0, 1]])
newarr = arr.reshape(2, 2, 3)
print ("\nOriginal array:\n", arr)
print ("Reshaped array:\n", newarr)
# Flatten array
arr = np.array([[1, 2, 3], [4, 5, 6]])
flarr = arr.flatten()
print ("\nOriginal array:\n", arr)
print ("Flattened array:\n", flarr)

OUTPUT:

C.Basic operations:
Program 3:
import numpy as np
a = np.array([1, 2, 5, 3])
# add 1 to every element
print ("Adding 1 to every element:", a+1)
# subtract 3 from each element
print ("Subtracting 3 from each element:", a-3)
# multiply each element by 10
print ("Multiplying each element by 10:", a*10)
# square each element
print ("Squaring each element:", a**2)
# modify existing array
a *= 2
print ("Doubled each element of original array:", a)
# transpose of array
a = np.array([[1, 2, 3], [3, 4, 5], [9, 6, 0]])
print ("\nOriginal array:\n", a)
print ("Transpose of array:\n", a.T)
OUTPUT:

D.Sorting array:
There is a simple np.sort method for sorting NumPy arrays.
import numpy as np
a = np.array([[1, 4, 2], [3, 4, 6], [0, -1, 5]])
# sorted array
print ("Array elements in sorted order:\n", np.sort(a, axis = None))
# sort array row-wise
print ("Row-wise sorted array:\n", np.sort(a, axis = 1))
# specify sort algorithm
print ("Column wise sort by applying merge-sort:\n", np.sort(a, axis = 0, kind = 'mergesort'))
# Example to show sorting of structured array
# set alias names for dtypes
dtypes = [('name', 'S10'), ('grad_year', int), ('cgpa', float)]
# Values to be put in array
values = [('Hrithik', 2009, 8.5), ('Ajay', 2008, 8.7), ('Pankaj', 2008, 7.9), ('Aakash', 2009, 9.0)]
# Creating array
arr = np.array(values, dtype = dtypes)
print ("\nArray sorted by names:\n", np.sort(arr, order = 'name'))
print ("Array sorted by graduation year and then cgpa:\n", np.sort(arr, order = ['grad_year', 'cgpa']))
OUTPUT:

INFERENCE:

RESULTS:
Thus the Python Program to show the working of NumPy arrays is executed successfully.
EX.NO:3
WORKING WITH DATA FRAMES USING PANDAS

AIM:

To Write a Python Program for working with data frames using pandas.

ALGORITHM:

Step 1: Start
Step 2: Import the pandas modules as pd
Step 3: Declare the array in row and column
Step 4: Call the function inside the data frame
Step 5: Print the data frames
Step 6: Stop

PROGRAM:

Creating a pandas DataFrame


A pandas DataFrame can be created using the following constructor:

pandas.DataFrame(data, index, columns, dtype, copy)
Creating an empty dataframe :
A basic DataFrame, which can be created is an Empty Dataframe. An Empty Dataframe is created just by calling a
dataframe constructor.
Creating a dataframe using List: DataFrame can be created using a single list or a list of lists.
Creating a dataframe from a dict of ndarrays/lists:
To create a dataframe from a dict of ndarrays/lists, all the arrays must be of the same length. If an
index is passed, then the length of the index should be equal to the length of the arrays. If no index
is passed, then by default the index will be range(n), where n is the array length.
Iterating over rows:
In order to iterate over rows, we can use the functions iterrows() and itertuples(). (iteritems(),
often mentioned alongside them, iterates over columns rather than rows.)

import pandas as pd
# Calling DataFrame constructor
print("Empty dataframe")
df = pd.DataFrame()
print(df)
print("Dataframe creation using List")
# list of strings
lst = ['Geeks', 'For', 'Geeks', 'is', 'portal', 'for', 'Geeks']
# Calling DataFrame constructor on list
df = pd.DataFrame(lst)
print(df)
# initialise data of lists.
Data = {'Name':['Tom', 'nick', 'krish', 'jack'], 'Age':[20, 21, 19, 18]}
# Create dataframe
df = pd.DataFrame(Data)
# Print the output.
print(df)
print("Create dataframe from dictionary of lists")
# dictionary of lists
dict = {'name':["aparna", "pankaj", "sudhir", "Geeku"],
        'Degree': ["MBA", "BCA", "M.Tech", "MBA"],
        'Score':[90, 40, 80, 98]}
# creating a dataframe from a dictionary
df = pd.DataFrame(dict)
print(df)
# iterating over rows using iterrows() function
for i, j in df.iterrows():
    print(i, j)
    print()
OUTPUT
Empty dataframe
Empty DataFrame
Columns: []
Index: []
Dataframe creation using List

OUTPUT:

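The theory above also mentions itertuples(); a short sketch of how it walks the same illustrative frame row by row:

```python
import pandas as pd

# Same illustrative frame as in the program above
df = pd.DataFrame({'Name': ['Tom', 'nick', 'krish', 'jack'],
                   'Age': [20, 21, 19, 18]})

# itertuples() yields one namedtuple per row; it is usually faster than
# iterrows(), which constructs a Series object for every row
for row in df.itertuples(index=False):
    print(row.Name, row.Age)
```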
INFERENCE:

RESULT:
Thus the Python Program for working with data frames using pandas has been executed successfully.

EX.NO:4 READING DATA FROM TEXT FILES, EXCEL AND THE WEB AND EXPLORING
VARIOUS COMMANDS FOR DOING DESCRIPTIVE ANALYTICS ON THE IRIS DATA
SET

AIM:
Reading data from text files and exploring various commands for doing descriptive analytics on the Iris data set.

What is Exploratory Data Analysis?

Exploratory Data Analysis (EDA) is a technique to analyze data using some visual techniques. With this
technique, we can get detailed information about the statistical summary of the data. We will also be able
to deal with duplicate values and outliers, and also see some trends or patterns present in the dataset.

Iris Dataset
If you are from a data science background you must all be familiar with the Iris Dataset. If you are not,
don't worry, we will discuss it here.
The Iris Dataset is considered the "Hello World" of data science. It contains five columns, namely Petal
Length, Petal Width, Sepal Length, Sepal Width, and Species Type. Iris is a flowering plant; researchers
have measured various features of different iris flowers and recorded them
digitally. https://www.geeksforgeeks.org/exploratory-data-analysis-on-iris-dataset/

Program 1 :
To read a CSV file
import pandas as pd
# Reading the CSV file
df = pd.read_csv("Iris.csv")
# Printing top 5 rows
df.head()

OUTPUT:

Checking Duplicates
Let's see if our dataset contains any duplicates or not. The pandas drop_duplicates() method helps in
removing duplicates from the data frame; with subset="Species" it keeps only the first row for each species.
Program 2:
data = df.drop_duplicates(subset ="Species",)
data
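Beyond head() and drop_duplicates(), a few other descriptive commands are commonly applied to the Iris frame. The sketch below builds a tiny stand-in frame (the values are illustrative, not the real data) so it runs without Iris.csv; with the real file, the same calls apply to df:

```python
import pandas as pd

# Tiny stand-in for the Iris frame (illustrative values only)
df = pd.DataFrame({
    'SepalLengthCm': [5.1, 4.9, 7.0, 6.4, 6.3],
    'Species': ['setosa', 'setosa', 'versicolor', 'versicolor', 'virginica'],
})

print(df.describe())                                   # count, mean, std, min, quartiles, max
df.info()                                              # column dtypes and non-null counts
print(df['Species'].value_counts())                    # rows per species
print(df.groupby('Species')['SepalLengthCm'].mean())   # per-species average sepal length
```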

INFERENCE:

RESULTS:
Thus reading data from text files and exploring various commands for doing descriptive analytics on the Iris data set
was explored successfully.

EX.NO:5
BASIC PLOTS USING MATPLOTLIB

AIM:
To Write a Python Program for working with Basic Plots Using Matplotlib

ALGORITHM:

Step 1: Start
Step 2: Import the pyplot in matplotlib modules as plt
Step 3: Declare the array in row and column
Step 4: Give the necessary x and y plot values
Step 5: Print the basic plots
Step 6: Stop

PROGRAM:
Program:3a
# importing the required module
import matplotlib.pyplot as plt
# x axis values
x = [1,2,3]
# corresponding y axis values
y = [2,4,1]
# plotting the points
plt.plot(x, y)
# naming the x axis
plt.xlabel('x - axis')
# naming the y axis
plt.ylabel('y - axis')
# giving a title to my graph
plt.title('My first graph!')
# function to show the plot
plt.show()
Output:

Program:3b
import matplotlib.pyplot as plt
a = [1, 2, 3, 4, 5]
b = [0, 0.6, 0.2, 15, 10, 8, 16, 21]
plt.plot(a)
# o is for circles and r is
# for red
plt.plot(b, "or")
plt.plot(list(range(0, 22, 3)))
# naming the x-axis
plt.xlabel('Day ->')
# naming the y-axis
plt.ylabel('Temp ->')
c = [4, 2, 6, 8, 3, 20, 13, 15]
plt.plot(c, label = '4th Rep')
# get current axes command
ax = plt.gca()
# get command over the individual
# boundary line of the graph body
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
# set the range or the bounds of
# the left boundary line to fixed range
ax.spines['left'].set_bounds(-3, 40)
# set the interval by which
# the x-axis set the marks
plt.xticks(list(range(-3, 10)))
# set the intervals by which y-axis
# set the marks
plt.yticks(list(range(-3, 20, 3)))
# legend denotes that what color
# signifies what
ax.legend(['1st Rep', '2nd Rep', '3rd Rep', '4th Rep'])
# annotate command helps to write
# ON THE GRAPH any text xy denotes
# the position on the graph
plt.annotate('Temperature V / s Days', xy = (1.01, -2.15))
# gives a title to the Graph
plt.title('All Features Discussed')
plt.show()

Output:

Program:4c
import matplotlib.pyplot as plt
a = [1, 2, 3, 4, 5]
b = [0, 0.6, 0.2, 15, 10, 8, 16, 21]
c = [4, 2, 6, 8, 3, 20, 13, 15]
# use fig whenever u want the
# output in a new window also
# specify the window size you
# want ans to be displayed
fig = plt.figure(figsize =(10, 10))
# creating multiple plots in a
# single plot
sub1 = plt.subplot(2, 2, 1)
sub2 = plt.subplot(2, 2, 2)
sub3 = plt.subplot(2, 2, 3)
sub4 = plt.subplot(2, 2, 4)
sub1.plot(a, 'sb')
# sets how the display subplot
# x axis values advances by 1
# within the specified range
sub1.set_xticks(list(range(0, 10, 1)))
sub1.set_title('1st Rep')
sub2.plot(b, 'or')
# sets how the display subplot x axis
# values advances by 2 within the
# specified range
sub2.set_xticks(list(range(0, 10, 2)))
sub2.set_title('2nd Rep')
# can directly pass a list in the plot
# function instead adding the reference
sub3.plot(list(range(0, 22, 3)), 'vg')
sub3.set_xticks(list(range(0, 10, 1)))
sub3.set_title('3rd Rep')
sub4.plot(c, 'Dm')
# similarly we can set the ticks for
# the y-axis range(start(inclusive),
# end(exclusive), step)
sub4.set_yticks(list(range(0, 24, 2)))
sub4.set_title('4th Rep')
# without writing plt.show() no plot
# will be visible
plt.show()
OUTPUT:

INFERENCE:

RESULT:

Thus a Python Program for working with Basic Plots Using Matplotlib was executed successfully.

Ex.No.6 FREQUENCY DISTRIBUTIONS, AVERAGES, VARIABILITY

AIM:

To Write a Python Program for working with Frequency Distributions, Averages, Variability

ALGORITHM:

Step 1: Import the necessary libraries such as numpy.
Step 2: Calculate the average for the given data using the function np.average()
Step 3: Calculate the standard deviation for the data using the function np.std()
Step 4: Calculate the variance for the data using np.var()
Step 5: Display the averages, variances and standard deviation
Step 6: Stop

PROGRAM:

FREQUENCY

from nltk.tokenize import word_tokenize
from nltk.corpus import gutenberg

sample = gutenberg.raw("blake-poems.txt")
token = word_tokenize(sample)
wlist = []
for i in range(50):
    wlist.append(token[i])
wordfreq = [wlist.count(w) for w in wlist]
print("Pairs\n" + str(list(zip(wlist, wordfreq))))

OUTPUT:

AVERAGES
# Method using NumPy's average() function
# (assumes a DataFrame df with columns 'salary_p_year' and 'employees_number')
import numpy as np
weighted_avg_m3 = round(np.average(df['salary_p_year'],
                                   weights = df['employees_number']), 2)
weighted_avg_m3

OUTPUT:

VARIABILITY:
# Python code to demonstrate variance()
# function on varying range of data-types
# importing statistics module
from statistics import variance
# importing fractions as parameter values
from fractions import Fraction as fr
# tuple of a set of positive integers
# numbers are spread apart but not very much
sample1 = (1, 2, 5, 4, 8, 9, 12)
# tuple of a set of negative integers
sample2 = (-2, -4, -3, -1, -5, -6)
# tuple of a set of positive and negative numbers
# data-points are spread apart considerably
sample3 = (-9, -1, -0, 2, 1, 3, 4, 19)
# tuple of a set of fractional numbers
sample4 = (fr(1, 2), fr(2, 3), fr(3, 4),
           fr(5, 6), fr(7, 8))
# tuple of a set of floating point values
sample5 = (1.23, 1.45, 2.1, 2.2, 1.9)
# printing the variance of each sample
print("Variance of Sample1 is %s" % variance(sample1))
print("Variance of Sample2 is %s" % variance(sample2))
print("Variance of Sample3 is %s" % variance(sample3))
print("Variance of Sample4 is %s" % variance(sample4))
print("Variance of Sample5 is %s" % variance(sample5))
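The algorithm above names np.average(), np.std() and np.var(); a minimal sketch of all three on one small sample:

```python
import numpy as np

# The three functions named in the algorithm, applied to one small sample
data = [2, 4, 4, 4, 5, 5, 7, 9]
print("average :", np.average(data))   # 5.0
print("std dev :", np.std(data))       # population standard deviation -> 2.0
print("variance:", np.var(data))       # population variance -> 4.0
```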

OUTPUT:

INFERENCE:

RESULT:
Thus the Python Program for working with for working with Frequency Distributions, Averages,
Variability has been executed successfully

EX.NO.7
NORMAL CURVES, CORRELATION AND SCATTER PLOTS,
CORRELATION COEFFICIENT

AIM:

To Write a Python Program for Normal Curves, Correlation and Scatter Plots, Correlation Coefficient

ALGORITHM:

Step 1: Start the Program


Step 2: Import packages scipy and call function scipy.stats
Step 3: Import packages numpy, matplotlib
Step 4: Create the distribution
Step 5: Visualizing the distribution
Step 6: Stop the process

PROGRAM:
NORMAL CURVE
# import required libraries
from scipy.stats import norm
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sb

# Creating the distribution


data = np.arange(1,10,0.01)
pdf = norm.pdf(data , loc = 5.3 , scale = 1 )

#Visualizing the distribution


sb.set_style('whitegrid')
sb.lineplot(x = data, y = pdf, color = 'black')
plt.xlabel('Heights')
plt.ylabel('Probability Density')
plt.show()

OUTPUT:

SCATTERPLOT AND CORRELATIONS

import numpy as np
import matplotlib.pyplot as plt
# Data
x = np.random.randn(100)
y1 = x*5 + 9
y2 = -5*x
y3 = np.random.randn(100)
# Plot
plt.rcParams.update({'figure.figsize': (10, 8), 'figure.dpi': 100})
plt.scatter(x, y1, label=f'y1, Correlation = {np.round(np.corrcoef(x,y1)[0,1], 2)}')
plt.scatter(x, y2, label=f'y2, Correlation = {np.round(np.corrcoef(x,y2)[0,1], 2)}')
plt.scatter(x, y3, label=f'y3, Correlation = {np.round(np.corrcoef(x,y3)[0,1], 2)}')
plt.title('Scatterplot and Correlations')
plt.legend()
plt.show()

OUTPUT:


CORRELATION COEFFICIENT
# Python Program to find correlation coefficient.
import math

# function that returns correlation coefficient.


def correlationCoefficient(X, Y, n) :
    sum_X = 0
    sum_Y = 0
    sum_XY = 0
    squareSum_X = 0
    squareSum_Y = 0

    i = 0
    while i < n :
        # sum of elements of array X.
        sum_X = sum_X + X[i]
        # sum of elements of array Y.
        sum_Y = sum_Y + Y[i]
        # sum of X[i] * Y[i].
        sum_XY = sum_XY + X[i] * Y[i]
        # sum of squares of array elements.
        squareSum_X = squareSum_X + X[i] * X[i]
        squareSum_Y = squareSum_Y + Y[i] * Y[i]
        i = i + 1

    # use formula for calculating correlation coefficient.
    corr = (float)(n * sum_XY - sum_X * sum_Y) / \
           (float)(math.sqrt((n * squareSum_X - sum_X * sum_X) *
                             (n * squareSum_Y - sum_Y * sum_Y)))
    return corr

# Driver function
X = [15, 18, 21, 24, 27]
Y = [25, 25, 27, 31, 32]

# Find the size of array.
n = len(X)

# Function call to correlationCoefficient.
print ('{0:.6f}'.format(correlationCoefficient(X, Y, n)))

OUTPUT:

INFERENCE:

RESULT:
Thus, the Python Program for Normal Curves, Correlation And Scatter Plots, Correlation
Coefficient has been executed successfully.

EX.NO.8
REGRESSION

AIM:

To Write a Python Program for Regression concept.

ALGORITHM

Step 1: Start the Program


Step 2: Import numpy and matplotlib package
Step 3: Define coefficient function
Step 4: Calculate cross-deviation and deviation about x
Step 5: Calculate regression coefficients
Step 6: Plot the Linear regression and define main function
Step 7: Print the result
Step 8: Stop the process

PROGRAM:
Simple Linear Regression
import numpy as np
import matplotlib.pyplot as plt

def estimate_coef(x, y):
    # number of observations/points
    n = np.size(x)
    # mean of x and y vector
    m_x = np.mean(x)
    m_y = np.mean(y)
    # calculating cross-deviation and deviation about x
    SS_xy = np.sum(y*x) - n*m_y*m_x
    SS_xx = np.sum(x*x) - n*m_x*m_x
    # calculating regression coefficients
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1*m_x
    return (b_0, b_1)

def plot_regression_line(x, y, b):
    # plotting the actual points as scatter plot
    plt.scatter(x, y, color = "m", marker = "o", s = 30)
    # predicted response vector
    y_pred = b[0] + b[1]*x
    # plotting the regression line
    plt.plot(x, y_pred, color = "g")
    # putting labels
    plt.xlabel('x')
    plt.ylabel('y')
    # function to show plot
    plt.show()

def main():
    # observations / data
    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])
    # estimating coefficients
    b = estimate_coef(x, y)
    print("Estimated coefficients:\nb_0 = {}\nb_1 = {}".format(b[0], b[1]))
    # plotting regression line
    plot_regression_line(x, y, b)

if __name__ == "__main__":
    main()
OUTPUT:

INFERENCE:

RESULT:
Thus the Python Program for Regression has been executed successfully.


EX.NO.9
Z-TEST

AIM:
To Write a Python Program for z-test concept.

ALGORITHM:
Step 1: Evaluate the data distribution.
Step 2: Formulate Hypothesis statement symbolically
Step 3: Define the level of significance (alpha)
Step 4: Calculate Z test statistic or Z score.
Step 5: Derive P-value for the Z score calculated.
Step 6: Make decision:
Step 6.1: P-Value <= alpha, then we reject H0.
Step 6.2: If P-Value > alpha, Fail to reject H0

PROGRAM:

# imports
import math
import numpy as np
from numpy.random import randn
from statsmodels.stats.weightstats import ztest

# Generate a random array of 50 numbers having mean 110 and sd 15,
# similar to the IQ scores data we assumed above
mean_iq = 110
sd_iq = 15/math.sqrt(50)
alpha = 0.05
null_mean = 100
data = sd_iq*randn(50) + mean_iq

# print mean and sd
print('mean=%.2f stdv=%.2f' % (np.mean(data), np.std(data)))

# now we perform the test. In this function, we passed data in the
# value parameter we passed the mean value of the null hypothesis;
# in the alternative hypothesis we check whether the mean is larger
ztest_Score, p_value = ztest(data, value = null_mean, alternative = 'larger')
# the function outputs a p_value and z-score corresponding to that
# value; we compare the p-value with alpha: if it is greater than
# alpha then we do not reject the null hypothesis, else we reject it
if(p_value < alpha):
    print("Reject Null Hypothesis")
else:
    print("Fail to Reject Null Hypothesis")

OUTPUT:

INFERENCE:

RESULT:
Thus, the Python Program for Z TEST has been executed successfully.

EX.NO.10
T-TEST

AIM:
To Write a Python Program for T-test concept.

ALGORITHM:
Step 1: Create some dummy age data for the population of voters in the entire country
Step 2: Create a sample of voters in Minnesota and test whether the average age of voters in
Minnesota differs from the population
Step 3: Conduct a t-test at a 95% confidence level and see if it correctly rejects the null hypothesis
that the sample comes from the same distribution as the population.
Step 4: If the t-statistic lies outside the quantiles of the t-distribution corresponding to our confidence
level and degrees of freedom, we reject the null hypothesis.
Step 5: Calculate the chances of seeing a result as extreme as the one being observed (known as the
p-value) by passing the t-statistic in as the quantile to the stats.t.cdf() function

PROGRAM:

# Importing the required libraries and packages
import numpy as np
from scipy import stats
# Defining two random distributions
# Sample Size
N = 10
# Gaussian distributed data with mean = 2 and var = 1
x = np.random.randn(N) + 2
# Gaussian distributed data with mean = 0 and var = 1
y = np.random.randn(N)
# Calculating the Standard Deviation
# Calculating the variance to get the standard deviation
var_x = x.var(ddof = 1)
var_y = y.var(ddof = 1)
# Standard Deviation
SD = np.sqrt((var_x + var_y) / 2)
print("Standard Deviation =", SD)
# Calculating the T-Statistics
tval = (x.mean() - y.mean()) / (SD * np.sqrt(2 / N))
# Comparing with the critical T-Value
# Degrees of freedom
dof = 2 * N - 2
# p-value after comparison with the T-Statistics
pval = 1 - stats.t.cdf( tval, df = dof)
print("t = " + str(tval))
print("p = " + str(2 * pval))
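The hand-computed statistic above follows the pooled-variance formula, so it can be cross-checked against SciPy's built-in `ttest_ind`, which applies the same formula for equal sample sizes (a sketch with seeded data; the seed is assumed for reproducibility):

```python
# Cross-check the manual pooled t-test against scipy.stats.ttest_ind
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
N = 10
x = rng.standard_normal(N) + 2   # mean ~2, var ~1
y = rng.standard_normal(N)       # mean ~0, var ~1

# Manual pooled-variance t-statistic (equal sample sizes)
sd = np.sqrt((x.var(ddof=1) + y.var(ddof=1)) / 2)
t_manual = (x.mean() - y.mean()) / (sd * np.sqrt(2 / N))

# SciPy's version (two-sided p-value by default)
t_scipy, p_scipy = stats.ttest_ind(x, y)
print("manual t =", t_manual, " scipy t =", t_scipy, " p =", p_scipy)
```

The two t values agree to floating-point precision, which confirms the manual formula.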

OUTPUT:

INFERENCE:

RESULT:
Thus the Python Program for T TEST has been executed successfully

EX.NO.11
ANOVA

AIM:
To Write a Python Program for ANOVA.

ALGORITHM:

Step 1: Input the values.
Step 2: Determine whether the null hypothesis or the alternate hypothesis is acceptable.
Step 3: Rows are grouped according to their value in the category column.
Step 4: The total mean value of the value column is computed.
Step 5: The mean within each group is computed.
Step 6: The difference between each value and the mean value for its group is calculated and squared.
Step 7: Calculate the F critical value and decide whether the hypothesis is accepted.
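The steps above can be sketched directly in NumPy: group the values, compute the grand mean and the group means, form the between-group and within-group sums of squares, and compare the ratio of mean squares (F) against the F distribution. The group values below are hypothetical:

```python
# Manual one-way ANOVA following the algorithm steps, with a scipy cross-check
import numpy as np
from scipy import stats

# Hypothetical groups of values, one array per category
groups = [
    np.array([6.0, 8.0, 4.0, 5.0, 3.0, 4.0]),
    np.array([8.0, 12.0, 9.0, 11.0, 6.0, 8.0]),
    np.array([13.0, 9.0, 11.0, 8.0, 7.0, 12.0]),
]

all_values = np.concatenate(groups)
grand_mean = all_values.mean()                       # Step 4: total mean

# Steps 5-6: between-group and within-group sums of squares
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)
F = (ss_between / df_between) / (ss_within / df_within)

# Step 7: p-value from the F distribution; cross-check with scipy
p = stats.f.sf(F, df_between, df_within)
F_scipy, p_scipy = stats.f_oneway(*groups)
print("F =", F, "p =", p)
```

The hand-computed F and p agree with `scipy.stats.f_oneway`, which performs exactly this decomposition.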

PROGRAM:
# Importing the required libraries
import matplotlib.pyplot as plt
from scipy import stats

# Displacement (disp) values grouped by gear count, in the spirit of
# R's mtcars data set (the values here are illustrative)
disp_by_gear = {
    3: [275.8, 360.0, 472.0, 460.0, 440.0, 318.0, 304.0, 350.0],
    4: [160.0, 160.0, 108.0, 258.0, 146.7, 140.8, 167.6, 121.0],
    5: [120.3, 95.1, 351.0, 145.0, 301.0],
}

# Variance in mean within groups and between groups
plt.boxplot(list(disp_by_gear.values()), labels=list(disp_by_gear))
plt.xlabel("gear")
plt.ylabel("disp")
plt.show()

# Step 1: Set up null hypothesis and alternate hypothesis
# H0: mu1 = mu2 = mu3 (there is no difference between average
#     displacement for different gears)
# H1: not all means are equal

# Step 2: Calculate test statistics with one-way ANOVA
f_stat, p_value = stats.f_oneway(*disp_by_gear.values())
print("F =", f_stat, " p =", p_value)

# Steps 3-4: Compare the p-value with the critical value alpha = 0.05
# and conclude the test: p < alpha => Reject Null Hypothesis
alpha = 0.05
if p_value < alpha:
    print("Reject Null Hypothesis")
else:
    print("Fail to Reject Null Hypothesis")

OUTPUT:

INFERENCE:

RESULT:
Thus the Python Program for ANOVA has been executed successfully

EX.NO.12
BUILDING AND VALIDATING LINEAR MODELS

AIM:
To Write a Python Program to build and validate linear models

ALGORITHM:
Step1: Consider a set of values x, y.
Step2: Take the linear set of equation y = a+bx.
Step3: Compute the values of a and b for the given data: b = (nΣxy − (Σx)(Σy)) / (nΣx² − (Σx)²),
a = (Σy − b(Σx)) / n.
Step4: Implement the value of a, b in the equation y = a+ bx.
Step5: Regress the value of y for any x.
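The closed-form slope and intercept from Steps 2-4 can be sketched directly in NumPy before turning to a library; the x and y values here are hypothetical:

```python
# Least-squares fit using the closed-form sums from the algorithm
import numpy as np

# Hypothetical data points
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 6.2, 7.9, 10.1])
n = len(x)

# b = (n*Sum(xy) - Sum(x)*Sum(y)) / (n*Sum(x^2) - (Sum(x))^2)
b = (n * (x * y).sum() - x.sum() * y.sum()) / (n * (x ** 2).sum() - x.sum() ** 2)
# a = (Sum(y) - b*Sum(x)) / n
a = (y.sum() - b * x.sum()) / n
print("y = %.3f + %.3f x" % (a, b))

# Step 5: regress (predict) y for any x
print("predicted y at x = 6:", a + b * 6)
```

The same coefficients come out of `np.polyfit(x, y, 1)`, which solves the identical least-squares problem.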

PROGRAM:
# Importing the necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_boston  # note: removed in scikit-learn >= 1.2

sns.set(style="ticks", color_codes=True)
plt.rcParams['figure.figsize'] = (8, 5)
plt.rcParams['figure.dpi'] = 150

# loading the data
boston = load_boston()

You can check those keys with the following code.

print(boston.keys())

The output will be as follows:

dict_keys(['data', 'target', 'feature_names', 'DESCR', 'filename'])

print(boston.DESCR)

You will find these details in output: Attribute Information (in order):

— CRIM per capita crime rate by town

— ZN proportion of residential land zoned for lots over 25,000 sq.ft.

— INDUS proportion of non-retail business acres per town

— CHAS Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)

— NOX nitric oxides concentration (parts per 10 million)

— RM average number of rooms per dwelling

— AGE proportion of owner-occupied units built prior to 1940

— DIS weighted distances to five Boston employment centres

— RAD index of accessibility to radial highways


lOMoAR cPSD| 44257564

— TAX full-value property-tax rate per $10,000


— PTRATIO pupil-teacher ratio by town
— B 1000 (Bk — 0.63)² where Bk is the proportion of blacks by town
— LSTAT % lower status of the population

— MEDV Median value of owner-occupied homes in $1000's

— Missing Attribute Values: None

df = pd.DataFrame(boston.data, columns=boston.feature_names)

# print the columns present in the dataset
print(df.columns)

# print the top 5 rows in the dataset
print(df.head())
OUTPUT:

First five records from data set


# plotting a heatmap for the overall data set
sns.heatmap(df.corr(), square=True, cmap='RdYlGn')

Heat map of overall data set


So let's plot a regression plot to see the correlation between RM and MEDV.

sns.lmplot(x='RM', y='MEDV', data=df)

Regression plot with RM and MEDV
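The AIM also asks for validating the model. A minimal sketch of validation with a train/test split and R² scoring; synthetic data stands in for the Boston set here, since `load_boston` was removed from recent scikit-learn releases:

```python
# Train/test validation of a simple linear model on synthetic data
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))                   # stand-in for a predictor such as RM
y = 9.1 * X[:, 0] + 5.0 + rng.normal(scale=1.0, size=200)  # noisy linear target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

model = LinearRegression().fit(X_train, y_train)
r2 = r2_score(y_test, model.predict(X_test))
print("R^2 on held-out data:", r2)
```

Scoring on held-out data rather than the training set is what validates the model: a high test R² indicates the fitted line generalizes beyond the points it was trained on.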



INFERENCE:

RESULT:
Thus the Python Program for to building and validating linear models has been
executed successfully

EX.NO.13
BUILDING AND VALIDATING LOGISTIC MODELS

AIM:
To Write a Python Program to build and validate logistic models

ALGORITHM:
Step1: Initialize the variables
Step2: Set the Data frame
Step3: Spilt data set into training and testing.
Step4: Fit the data into logistic regression function.
Step5: Predict the test data set.
Step6: Print the results.

PROGRAM:
Building the Logistic Regression model:

# importing libraries
import statsmodels.api as sm
import pandas as pd

# loading the training dataset
df = pd.read_csv('logit_train1.csv', index_col=0)

# defining the dependent and independent variables
Xtrain = df[['gmat', 'gpa', 'work_experience']]
ytrain = df[['admitted']]

# building the model and fitting the data
log_reg = sm.Logit(ytrain, Xtrain).fit()

OUTPUT:


Predicting on New Data:

# loading the testing dataset
df = pd.read_csv('logit_test1.csv', index_col=0)

# defining the dependent and independent variables
Xtest = df[['gmat', 'gpa', 'work_experience']]
ytest = df['admitted']

# performing predictions on the test dataset
yhat = log_reg.predict(Xtest)
prediction = list(map(round, yhat))

# comparing original and predicted values of y
print('Actual values :', list(ytest.values))
print('Predictions :', prediction)

OUTPUT:

Testing the accuracy of the model:

from sklearn.metrics import confusion_matrix, accuracy_score

# confusion matrix
cm = confusion_matrix(ytest, prediction)
print("Confusion Matrix : \n", cm)

# accuracy score of the model
print('Test accuracy =', accuracy_score(ytest, prediction))

OUTPUT:


INFERENCE:

RESULT:
Thus the Python Program for building and validating logistic models has been executed successfully


EX.NO.14
TIME SERIES ANALYSIS

AIM:
To Write a Python Program for Time Series Analysis

ALGORITHM:
Step1: Load the time series dataset correctly in Pandas
Step2: Index the time-series data
Step3: Resample the time series using Pandas
Step4: Compute rolling statistics over the series
Step5: Plot the time-series data using Pandas

PROGRAM:

import warnings
import itertools
import numpy as np
import matplotlib.pyplot as plt
warnings.filterwarnings("ignore")
plt.style.use('fivethirtyeight')
import pandas as pd
import statsmodels.api as sm
import matplotlib

matplotlib.rcParams['axes.labelsize'] = 14
matplotlib.rcParams['xtick.labelsize'] = 12
matplotlib.rcParams['ytick.labelsize'] = 12
matplotlib.rcParams['text.color'] = 'k'

We start with time series analysis and forecasting for furniture sales, a good four years of data.

df = pd.read_excel("Superstore.xls")
furniture = df.loc[df['Category'] == 'Furniture']
furniture['Order Date'].min(), furniture['Order Date'].max()

(Timestamp('2014-01-06 00:00:00'), Timestamp('2017-12-30 00:00:00'))

Data Preprocessing
This step includes removing columns we do not need, check missing values,
aggregate sales by date and so on.
cols = ['Row ID', 'Order ID', 'Ship Date', 'Ship Mode', 'Customer ID',
        'Customer Name', 'Segment', 'Country', 'City', 'State', 'Postal Code',
        'Region', 'Product ID', 'Category', 'Sub-Category', 'Product Name',
        'Quantity', 'Discount', 'Profit']
furniture.drop(cols, axis=1, inplace=True)
furniture = furniture.sort_values('Order Date')
furniture.isnull().sum()
furniture = furniture.groupby('Order Date')['Sales'].sum().reset_index()

Order Date    0
Sales         0
dtype: int64

Figure 1

Indexing with Time Series Data


furniture = furniture.set_index('Order Date')
furniture.index

Figure 2

We will use the average daily sales value for each month instead, and we are using the start of each month as the timestamp.

y = furniture['Sales'].resample('MS').mean()

Have a quick peek at the 2017 furniture sales data.

y['2017':]


Figure 3

Visualizing Furniture Sales Time Series Data


y.plot(figsize=(15, 6))
plt.show()
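The algorithm also lists rolling statistics, which the program above does not show. A short sketch on a synthetic monthly series (the dates, trend, and seed are assumed for illustration):

```python
# Rolling-mean smoothing of a monthly time series
import numpy as np
import pandas as pd

# Synthetic monthly sales series: upward trend plus noise
idx = pd.date_range("2014-01-01", periods=48, freq="MS")
rng = np.random.default_rng(7)
sales = pd.Series(500 + 10 * np.arange(48) + rng.normal(0, 50, 48), index=idx)

# 12-month rolling mean smooths short-term fluctuations;
# the first 11 values are NaN until a full window is available
rolling = sales.rolling(window=12).mean()
print(rolling.tail())
```

The same `rolling` call applied to the resampled furniture series `y` would smooth its month-to-month noise before plotting.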


INFERENCE:

RESULT:
Thus the Python Program for Time Series Analysis has been executed successfully

