Dsa Lab Manual
9 Z-TEST CO3
10 T-TEST CO4
11 ANOVA CO4
EX.NO.1 DOWNLOAD, INSTALL AND EXPLORE THE FEATURES OF NUMPY,
SCIPY, JUPYTER, STATS MODELS AND PANDAS PACKAGES.
Aim:
To download, install and explore the features of the NumPy, SciPy, Jupyter, statsmodels and pandas
packages.
Problem Description
Python is an open-source, object-oriented language. One of its strengths is the wide
range of external packages that can be installed to extend its functionality.
Each package is a collection of reusable Python functions; NumPy, for example, is a
package that eases array computations. To install these packages we use pip, the Python
package installer. Pip is installed automatically along with Python. We can then use pip
on the command line to install packages from PyPI.
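After installing packages with pip, it is useful to confirm that Python can actually find them. The sketch below is only illustrative: the helper name `is_installed` is an assumption made for this example (it is not part of pip), and it relies on the standard-library function `importlib.util.find_spec`, which looks a module up without importing it.

```python
from importlib.util import find_spec

def is_installed(module_name):
    """Return True if a top-level module with this name can be found."""
    return find_spec(module_name) is not None

# Import names of the packages explored in this exercise
for name in ["numpy", "scipy", "pandas", "statsmodels"]:
    if is_installed(name):
        print(name, "-> installed")
    else:
        print(name, "-> missing (try: pip install " + name + ")")
```

Running the script before and after `pip install` shows which packages still need to be installed.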
NumPy
NumPy (Numerical Python) is an open-source library for the Python programming
language. It is used for scientific computing and working with arrays.
Apart from its multidimensional array object, it also provides high-level functioning tools for
working with arrays.
Prerequisites
● Access to a terminal window/command line
● A user account with sudo privileges
● Python installed on your system
Downloading and installing Numpy:
Python NumPy is a general-purpose array-processing package that provides tools
for handling n-dimensional arrays. It provides various computing tools such as
comprehensive mathematical functions and linear algebra routines. Use the command
below to install NumPy:
pip install numpy
OUTPUT:
Data Science:
Data science combines math and statistics, specialized programming, advanced analytics, artificial
intelligence (AI), and machine learning with specific subject matter expertise to uncover
actionable insights hidden in an organization’s data.
Jupyter:
Jupyter Notebook is an open-source web application that allows you to create and share documents
that contain live code, equations, visualizations, and narrative text. Uses include data cleaning and
transformation, numerical simulation, statistical modeling, data visualization, machine learning,
and much more.
Jupyter has support for over 40 different programming languages and Python is one of them.
Python is a requirement (Python 3.3 or greater, or Python 2.7) for installing the Jupyter Notebook
itself.
Procedure:
To install Jupyter using pip, we first need to make sure that pip is up to date on our system.
Use the following command to update pip:
python -m pip install --upgrade pip
After updating pip, use the following command to install Jupyter:
pip install jupyter
Finished Installation:
jupyter notebook
Launching Jupyter Notebook
Click New, select Python 3 (ipykernel), and type the following program.
Click run to execute the program.
Running the Python program:
Program to find the area of a triangle
# Python program to find the area of a triangle
a = 5
b = 6
c = 7
# calculate the semi-perimeter
s = (a + b + c) / 2
# calculate the area
area = (s*(s-a)*(s-b)*(s-c)) ** 0.5
print('The area of the triangle is %0.2f' % area)
Output:
Problem Description
SciPy is a Python library that is useful for solving many kinds of mathematical
problems. It is built on top of the NumPy library and extends it with scientific
routines for tasks such as matrix rank, matrix inverse, polynomial equations and LU
decomposition. Using its high-level functions significantly reduces the complexity of the
code.
OUTPUT:
Sample python code using Scipy:
Type the program in Jupyter notebook
from scipy import special
a = special.exp10(3)
print(a)
b = special.exp2(3)
print(b)
c = special.sindg(90)
print(c)
d = special.cosdg(45)
print(d)
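The matrix operations named in the problem description (rank, inverse, LU decomposition) can be sketched in the same way. The matrix `A` below is made up for illustration; `matrix_rank` comes from NumPy, while `inv` and `lu` come from `scipy.linalg`.

```python
import numpy as np
from scipy import linalg

# A small, made-up invertible matrix used only for illustration
A = np.array([[4.0, 3.0], [6.0, 3.0]])

rank = np.linalg.matrix_rank(A)   # number of linearly independent rows
A_inv = linalg.inv(A)             # matrix inverse
P, L, U = linalg.lu(A)            # LU decomposition with A = P @ L @ U

print("Rank:", rank)
print("Inverse:\n", A_inv)
print("P @ L @ U reproduces A:", np.allclose(P @ L @ U, A))
```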
OUTPUT:
Problem Description
Pandas is one of the most popular open-source libraries available for Python. It is fast and
easy to use for data analysis and manipulation, and its DataFrame is one of the most
useful data structures available in any library. It is used in every data-intensive field, including but not
limited to scientific computing, data science, and machine learning. The library is not included
with a regular install of Python. To use it, you must install pandas separately.
pip is a package installation manager that makes installing Python libraries and frameworks
straightforward.
As long as you have Python 3.4 or later installed, pip will be installed on
your computer along with Python by default. However, if you are using an older version of
Python, you will need to install pip on your computer before installing pandas.
Press the Windows key on your keyboard or click on the Start button to open the start menu.
Type cmd, and the Command Prompt app should appear as a listing in the start menu.
After you launch the command prompt, type the following command on the terminal to install
pandas with pip:
pip install pandas
This launches the pip installer. The required files will be downloaded, and
pandas will be ready to run on your computer.
The pandas package is successfully installed.
Sample program
import pandas as pd
data = pd.DataFrame({"x1":["y", "x", "y", "x", "x", "y"],
"x2":range(16, 22),
"x3":range(1, 7),
"x4":["a", "b", "c", "d", "e", "f"],
"x5":range(30, 24, - 1)})
print(data)
s1 = pd.Series([1, 3, 4, 5, 6, 2, 9])
s2 = pd.Series([1.1, 3.5, 4.7, 5.8, 2.9, 9.3])
s3 = pd.Series(['a', 'b', 'c', 'd', 'e'])
Data ={'first':s1, 'second':s2, 'third':s3}
dfseries = pd.DataFrame(Data)
print(dfseries)
Problem Description:
Statsmodels is a popular library in Python that enables us to estimate and analyze
various statistical models. It is built on numeric and scientific libraries like NumPy and SciPy.
Some of the essential features of this package are-
1. It includes various linear regression models such as ordinary least squares, generalized least
squares, weighted least squares, etc.
2. It provides functions for time series analysis.
3. It also has some datasets for examples and testing.
4. Models based on survival analysis are also available.
5. It provides a wide range of statistical tests.
Installing Statsmodels
To install statsmodels, open the Command Prompt, type the
following command and press Enter.
pip install statsmodels
Output
Here, we will perform OLS(Ordinary Least Squares) regression, in this technique we will try to
minimize the net sum of squares of difference between the calculated value and observed value.
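The least-squares idea can be sketched with NumPy alone, before turning to statsmodels. The data below are made up and lie exactly on y = 1 + 2x, so the fitted coefficients are easy to check; `np.linalg.lstsq` returns the coefficients that minimise the sum of squared differences between observed and calculated values.

```python
import numpy as np

# Made-up observations that lie exactly on y = 1 + 2x
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 3.0, 5.0, 7.0, 9.0])

# Design matrix with a column of ones for the intercept
X = np.column_stack([np.ones_like(x), x])

# lstsq returns the beta that minimises ||y - X @ beta||^2
beta, residuals, rank, _ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = beta

print("intercept =", round(float(intercept), 6))
print("slope =", round(float(slope), 6))
```

With statsmodels the same fit is usually written as `sm.OLS(y, sm.add_constant(x)).fit()`, which also reports standard errors and test statistics.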
Program
import statsmodels.api as sm
import pandas
df[vars]
df[-5:]
OUTPUT:
INFERENCE:
RESULT:
Thus the NumPy, SciPy, Jupyter, statsmodels and pandas packages were downloaded and installed,
and their features were explored successfully.
EX.NO:2
WORKING WITH NUMPY ARRAYS
AIM:
Write a python program to show the working of NumPy Arrays in Python.
2a) Use Numpy array to demonstrate basic array characteristics
b) Create Numpy array using list and tuple
c) Apply basic operations (+, -, *, /) and find the transpose of the matrix
d) Perform sorting operation with Numpy arrays
ALGORITHM:
Step 1: Start
Step 2: Create array using numpy
Step 3: Access the element in the array
Step 4: Retrieve element using slice operation
Step 5: Compute calculation in the array
Step 6:Stop
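Steps 3 to 5 (element access, slicing and computation) can be sketched as follows. The array values are made up for illustration.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

element = arr[1, 2]        # row 1, column 2 (zero-based indices)
first_row = arr[0]         # the whole first row
last_column = arr[:, -1]   # every row, last column
block = arr[0:2, 1:3]      # rows 0-1, columns 1-2
total = arr.sum()          # a simple computation over all elements

print(element)
print(first_row)
print(last_column)
print(block)
print(total)
```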
Program 1:
Write a python program to demonstrate the basic NumPy array characteristics
import numpy as np
# Creating array object
arr = np.array( [[ 1, 2, 3], [ 4, 2, 5]] )
# Printing type of arr object
print("Array is of type: ", type(arr))
# Printing array dimensions (axes)
print("No. of dimensions: ", arr.ndim)
# Printing shape of array
print("Shape of array: ", arr.shape)
# Printing size (total number of elements) of array
print("Size of array: ", arr.size)
# Printing type of elements in array
print("Array stores elements of type: ", arr.dtype)
OUTPUT:
B.Array creation:
Program 2:
import numpy as np
# Creating array from list with type float
a = np.array([[1, 2, 4], [5, 8, 7]], dtype = 'float')
print ("Array created using passed list:\n", a)
# Creating array from tuple
b = np.array((1 , 3, 2))
print ("\nArray created using passed tuple:\n", b)
# Creating a 3X4 array with all zeros
c = np.zeros((3, 4))
print ("\nAn array initialized with all zeros:\n", c)
# Create a constant value array of complex type
d = np.full((3, 3), 6, dtype = 'complex')
print ("\nAn array initialized with all 6s.")
print( "Array type is complex:\n", d)
# Create an array with random values
e = np.random.random((2, 2))
print ("\nA random array:\n", e)
# Create a sequence of integers
# from 0 to 30 with steps of 5
f = np.arange(0, 30, 5)
print ("\nA sequential array with steps of 5:\n", f)
# Create a sequence of 10 values in range 0 to 5
g = np.linspace(0, 5, 10)
print ("\nA sequential array with 10 values between 0 and 5:\n", g)
# Reshaping 3X4 array to 2X2X3 array
arr = np.array([[1, 2, 3, 4], [5, 2, 4, 2], [1, 2, 0, 1]])
newarr = arr.reshape(2, 2, 3)
print ("\nOriginal array:\n", arr)
print ("Reshaped array:\n", newarr)
# Flatten array
arr = np.array([[1, 2, 3], [4, 5, 6]])
flarr = arr.flatten()
print ("\nOriginal array:\n", arr)
print ("Flattened array:\n", flarr)
OUTPUT:
C.Basic operations:
Program 3:
import numpy as np
a = np.array([1, 2, 5, 3])
# add 1 to every element
print ("Adding 1 to every element:", a+1)
# subtract 3 from each element
print ("Subtracting 3 from each element:", a-3)
# multiply each element by 10
print ("Multiplying each element by 10:", a*10)
# square each element
print ("Squaring each element:", a**2)
# modify existing array
a *= 2
print ("Doubled each element of original array:", a)
# transpose of array
a = np.array([[1, 2, 3], [3, 4, 5], [9, 6, 0]])
print ("\nOriginal array:\n", a)
print ("Transpose of array:\n", a.T)
OUTPUT:
D.Sorting array:
NumPy provides the np.sort function for sorting arrays.
import numpy as np
a = np.array([[1, 4, 2], [3, 4, 6], [0, -1, 5]])
# sorted array
print ("Array elements in sorted order:\n", np.sort(a, axis = None))
# sort array row-wise
print ("Row-wise sorted array:\n", np.sort(a, axis = 1))
# specify sort algorithm
print ("Column wise sort by applying merge-sort:\n", np.sort(a, axis = 0, kind = 'mergesort'))
# Example to show sorting of structured array
# set alias names for dtypes
dtypes = [('name', 'S10'), ('grad_year', int), ('cgpa', float)]
# Values to be put in array
values = [('Hrithik', 2009, 8.5), ('Ajay', 2008, 8.7), ('Pankaj', 2008, 7.9), ('Aakash', 2009, 9.0)]
# Creating array
arr = np.array(values, dtype = dtypes)
print ("\nArray sorted by names:\n", np.sort(arr, order = 'name'))
print ("Array sorted by graduation year and then cgpa:\n", np.sort(arr, order = ['grad_year', 'cgpa']))
OUTPUT:
INFERENCE:
RESULTS:
Thus the Python Program to show the working of NumPy arrays has been executed successfully.
EX.NO:3
WORKING WITH DATA FRAMES USING PANDAS
AIM:
To Write a Python Program for working with data frames using pandas.
ALGORITHM:
Step 1: Start
Step 2: Import the pandas modules as pd
Step 3: Declare the array in row and column
Step 4: Call the function inside the data frame
Step 5: Print the data frames
Step 6: Stop
PROGRAM:
import pandas as pd
# Calling DataFrame constructor
print("Empty dataframe")
df = pd.DataFrame()
print(df)
print("Dataframe creation using List")
# list of strings
lst = ['Geeks', 'For', 'Geeks', 'is', 'portal', 'for', 'Geeks']
# Calling DataFrame constructor on list
df = pd.DataFrame(lst)
print(df)
# initialise data of lists.
Data = {'Name':['Tom', 'nick', 'krish', 'jack'], 'Age':[20, 21, 19, 18]}
# Create dataframe
df = pd.DataFrame(Data)
# Print the output.
print(df)
print("Create dataframe from dictionary of lists")
# dictionary of lists (named data_dict to avoid shadowing the built-in dict)
data_dict = {'name':["aparna", "pankaj", "sudhir", "Geeku"],
             'Degree': ["MBA", "BCA", "M.Tech", "MBA"],
             'Score':[90, 40, 80, 98]}
# creating a dataframe from a dictionary
df = pd.DataFrame(data_dict)
print(df)
# iterating over rows using iterrows() function
for i, j in df.iterrows():
    print(i, j)
    print()
OUTPUT
Empty dataframe
Empty DataFrame
Columns: []
Index: []
Dataframe creation using List
OUTPUT:
INFERENCE:
RESULT:
Thus the Python Program for working with data frames using pandas has been executed successfully.
EX.NO:4 READING DATA FROM TEXT FILES, EXCEL AND THE WEB AND EXPLORING
VARIOUS COMMANDS FOR DOING DESCRIPTIVE ANALYTICS ON THE IRIS DATA
SET
AIM:
To read data from text files and explore various commands for doing descriptive analytics on the Iris data set.
Exploratory Data Analysis (EDA) is a technique for analyzing data, often with the help of visual methods. With this
technique, we can obtain a statistical summary of the data, deal with duplicate values and outliers, and
spot trends or patterns present in the dataset.
Iris Dataset
The Iris dataset is considered the "Hello World" of data science. It contains five columns: Petal Length,
Petal Width, Sepal Length, Sepal Width, and Species Type. Iris is a flowering plant; researchers measured
various features of different iris flowers and recorded them digitally.
Source: https://www.geeksforgeeks.org/exploratory-data-analysis-on-iris-dataset/
Program 1 :
To read a CSV file
import pandas as pd
# Reading the CSV file
df = pd.read_csv("Iris.csv")
# Printing top 5 rows
df.head()
OUTPUT:
Checking Duplicates
Let’s see if our dataset contains any duplicates or not. Pandas drop_duplicates() method helps in removing
duplicates from the data frame.
Program 2:
data = df.drop_duplicates(subset="Species")
data
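Other common descriptive commands can be sketched on a small in-memory table, so that the example runs without Iris.csv. The rows and the column names `SepalLengthCm` and `PetalLengthCm` below are assumptions made for illustration, not values taken from the real data set.

```python
import pandas as pd

# Tiny made-up sample with Iris-like columns (not the real data set)
df = pd.DataFrame({
    "SepalLengthCm": [5.1, 4.9, 7.0, 6.4, 6.3, 5.1],
    "PetalLengthCm": [1.4, 1.4, 4.7, 4.5, 6.0, 1.4],
    "Species": ["setosa", "setosa", "versicolor",
                "versicolor", "virginica", "setosa"],
})

print(df.shape)                       # (rows, columns)
print(df.describe())                  # count, mean, std, min, quartiles, max
species_counts = df["Species"].value_counts()
print(species_counts)                 # rows per species
n_duplicates = int(df.duplicated().sum())
print(n_duplicates)                   # number of fully duplicated rows

# Keeps only the first row seen for each species
one_per_species = df.drop_duplicates(subset="Species")
print(one_per_species)
```

Note that `drop_duplicates(subset="Species")` keeps one row per species, so it is a quick way to see which species occur, not a way to remove repeated measurements.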
INFERENCE:
RESULTS:
Thus reading data from text files and exploring various commands for doing descriptive analytics on the Iris data set
was explored successfully.
EX.NO:5
BASIC PLOTS USING MATPLOTLIB
AIM:
To Write a Python Program for working with Basic Plots Using Matplotlib
ALGORITHM:
Step 1: Start
Step 2: Import the pyplot in matplotlib modules as plt
Step 3: Declare the array in row and column
Step 4: Give the necessary x and y plot values
Step 5: Print the basic plots
Step 6: Stop
PROGRAM:
Program:3a
# importing the required module
import matplotlib.pyplot as plt
# x axis values
x = [1,2,3]
# corresponding y axis values
y = [2,4,1]
# plotting the points
plt.plot(x, y)
# naming the x axis
plt.xlabel('x - axis')
# naming the y axis
plt.ylabel('y - axis')
# giving a title to my graph
plt.title('My first graph!')
# function to show the plot
plt.show()
Output:
Program:3b
import matplotlib.pyplot as plt
a = [1, 2, 3, 4, 5]
b = [0, 0.6, 0.2, 15, 10, 8, 16, 21]
plt.plot(a)
# o is for circles and r is
# for red
plt.plot(b, "or")
plt.plot(list(range(0, 22, 3)))
# naming the x-axis
plt.xlabel('Day ->')
# naming the y-axis
plt.ylabel('Temp ->')
c = [4, 2, 6, 8, 3, 20, 13, 15]
plt.plot(c, label = '4th Rep')
# get current axes command
ax = plt.gca()
# get command over the individual
# boundary line of the graph body
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
# set the range or the bounds of
# the left boundary line to fixed range
ax.spines['left'].set_bounds(-3, 40)
# set the interval by which
# the x-axis set the marks
plt.xticks(list(range(-3, 10)))
# set the intervals by which y-axis
# set the marks
plt.yticks(list(range(-3, 20, 3)))
# legend denotes that what color
# signifies what
ax.legend(['1st Rep', '2nd Rep', '3rd Rep', '4th Rep'])
# annotate command helps to write
# ON THE GRAPH any text xy denotes
# the position on the graph
plt.annotate('Temperature V / s Days', xy = (1.01, -2.15))
# gives a title to the Graph
plt.title('All Features Discussed')
plt.show()
Output:
Program:3c
import matplotlib.pyplot as plt
a = [1, 2, 3, 4, 5]
b = [0, 0.6, 0.2, 15, 10, 8, 16, 21]
c = [4, 2, 6, 8, 3, 20, 13, 15]
# use fig whenever you want the
# output in a new window; also
# specify the window size in which
# the result should be displayed
fig = plt.figure(figsize =(10, 10))
# creating multiple plots in a
# single plot
sub1 = plt.subplot(2, 2, 1)
sub2 = plt.subplot(2, 2, 2)
sub3 = plt.subplot(2, 2, 3)
sub4 = plt.subplot(2, 2, 4)
sub1.plot(a, 'sb')
# sets how the display subplot
# x axis values advances by 1
# within the specified range
sub1.set_xticks(list(range(0, 10, 1)))
sub1.set_title('1st Rep')
sub2.plot(b, 'or')
# sets how the display subplot x axis
# values advances by 2 within the
# specified range
sub2.set_xticks(list(range(0, 10, 2)))
sub2.set_title('2nd Rep')
# can directly pass a list in the plot
# function instead adding the reference
sub3.plot(list(range(0, 22, 3)), 'vg')
sub3.set_xticks(list(range(0, 10, 1)))
sub3.set_title('3rd Rep')
sub4.plot(c, 'Dm')
# similarly we can set the ticks for
# the y-axis range(start(inclusive),
# end(exclusive), step)
sub4.set_yticks(list(range(0, 24, 2)))
sub4.set_title('4th Rep')
# without writing plt.show() no plot
# will be visible
plt.show()
OUTPUT:
INFERENCE:
RESULT:
Thus a Python Program for working with Basic Plots Using Matplotlib was executed successfully.
Ex.No.6 FREQUENCY DISTRIBUTIONS, AVERAGES,VARIABILITY
AIM:
To Write a Python Program for working with Frequency Distributions, Averages, Variability
ALGORITHM:
PROGRAM:
FREQUENCY
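A frequency distribution counts how often each value occurs in the data. A minimal sketch, assuming a small made-up list of marks and the standard-library `collections.Counter`:

```python
from collections import Counter

# Made-up marks for ten students, for illustration only
marks = [7, 8, 7, 9, 10, 8, 7, 6, 9, 8]

frequency = Counter(marks)

# Print the table in ascending order of the observed value
for value in sorted(frequency):
    print(value, frequency[value])
```

For data held in a pandas Series, `value_counts()` produces the same kind of table.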
OUTPUT:
AVERAGES
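# Method using NumPy's average() function
# (df is assumed to be a DataFrame with 'salary_p_year' and 'employees_number' columns)
from numpy import average
weighted_avg_m3 = round(average(df['salary_p_year'], weights=df['employees_number']), 2)
weighted_avg_m3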
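The weighted average can be tried without a DataFrame. The sketch below uses made-up salary and head-count figures stored in plain NumPy arrays that reuse the column names from the listing, and compares the weighted mean with the simple mean.

```python
import numpy as np

# Made-up figures: yearly salary of each group and its number of employees
salary_p_year = np.array([30000, 50000, 80000])
employees_number = np.array([10, 5, 1])

# Each salary counts in proportion to the number of employees earning it
weighted_avg = round(float(np.average(salary_p_year, weights=employees_number)), 2)
simple_avg = round(float(np.mean(salary_p_year)), 2)

print("Weighted average:", weighted_avg)
print("Simple average:", simple_avg)
```

The weighted figure is lower because most employees are in the lowest-paid group.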
OUTPUT:
VARIABILITY:
# Python code to demonstrate variance()
# function on varying range of data-types
# importing statistics module
from statistics import variance
# importing fractions as parameter values
from fractions import Fraction as fr
# tuple of a set of positive integers
# numbers are spread apart but not very much
sample1 = (1, 2, 5, 4, 8, 9, 12)
# tuple of a set of negative integers
sample2 = (-2, -4, -3, -1, -5, -6)
# tuple of a set of positive and negative numbers
# data-points are spread apart considerably
sample3 = (-9, -1, -0, 2, 1, 3, 4, 19)
# tuple of a set of fractional numbers
sample4 = (fr(1, 2), fr(2, 3), fr(3, 4),
fr(5, 6), fr(7, 8))
# tuple of a set of floating point values
sample5 = (1.23, 1.45, 2.1, 2.2, 1.9)
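The samples defined above can then be passed to `variance()`. A minimal, self-contained sketch using two of them, with `pvariance()` added for comparison (sample variance divides by n - 1, population variance by n):

```python
from statistics import variance, pvariance
from fractions import Fraction as fr

sample1 = (1, 2, 5, 4, 8, 9, 12)
sample4 = (fr(1, 2), fr(2, 3), fr(3, 4), fr(5, 6), fr(7, 8))

var1 = variance(sample1)      # sample variance of integers
pvar1 = pvariance(sample1)    # population variance of the same data
var4 = variance(sample4)      # fractions give an exact Fraction result

print("Sample variance of sample1:", var1)
print("Population variance of sample1:", pvar1)
print("Sample variance of sample4:", var4)
```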
OUTPUT:
INFERENCE:
RESULT:
Thus the Python Program for working with Frequency Distributions, Averages and
Variability has been executed successfully.
EX.NO.7
NORMAL CURVES, CORRELATION AND SCATTER PLOTS,
CORRELATION COEFFICIENT
AIM:
To Write a Python Program for Normal Curves, Correlation and Scatter Plots, Correlation Coefficient
ALGORITHM:
PROGRAM:
NORMAL CURVE
# import required libraries
from scipy.stats import norm
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sb
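The values behind a normal curve can be computed without any plotting library. The sketch below is an illustration using the standard-library `statistics.NormalDist` (not the `scipy.stats.norm` object imported above) and assumes a standard normal distribution with mean 0 and standard deviation 1.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1
dist = NormalDist(mu=0.0, sigma=1.0)

# Density values on a coarse grid from -3 to 3
xs = [x / 2 for x in range(-6, 7)]
density = [dist.pdf(x) for x in xs]

for x, d in zip(xs, density):
    print(f"{x:+.1f}  {d:.4f}")

# Probability of falling within one standard deviation of the mean
within_one_sd = dist.cdf(1.0) - dist.cdf(-1.0)
print("P(-1 < X < 1) =", round(within_one_sd, 4))
```

Passing `xs` and `density` to `plt.plot` draws the familiar bell shape.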
OUTPUT:
OUTPUT:
OUTPUT:
CORRELATION COEFFICIENT
# Python Program to find correlation coefficient.
import math
squareSum_Y = 0
i=0
while i < n :
# sum of elements of array X.
sum_X = sum_X + X[i]
i=i+1
# Driver function
X = [15, 18, 21, 24, 27]
Y = [25, 25, 27, 31, 32]
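A complete, self-contained version of the calculation can be sketched as follows. The function name `correlation_coefficient` is chosen for this example; it applies the standard Pearson formula to the X and Y lists given above.

```python
import math

def correlation_coefficient(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) *
                            (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

X = [15, 18, 21, 24, 27]
Y = [25, 25, 27, 31, 32]
r = correlation_coefficient(X, Y)
print(f"{r:.6f}")
```

For these lists the coefficient is about 0.953, a strong positive linear relationship.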
OUTPUT:
INFERENCE:
RESULT:
Thus, the Python Program for Normal Curves, Correlation And Scatter Plots, Correlation
Coefficient has been executed successfully.
EX.NO.8
REGRESSION
Aim:
To write a python program for Simple Linear Regression
ALGORITHM
Step 1: Start the Program
Step 2: Import numpy and matplotlib package
Step 3: Define coefficient function
Step 4: Calculate cross-deviation and deviation about x
Step 5: Calculate regression coefficients
Step 6: Plot the Linear regression and define main function
Step 7: Print the result
Step 8: Stop the process
Program:
import numpy as np
import matplotlib.pyplot as plt
def estimate_coef(x, y):
    # number of observations/points
    n = np.size(x)
    # mean of x and y vector
    m_x = np.mean(x)
    m_y = np.mean(y)
    # calculating cross-deviation and deviation about x
    SS_xy = np.sum(y*x) - n*m_y*m_x
    SS_xx = np.sum(x*x) - n*m_x*m_x
    # calculating regression coefficients
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1*m_x
    return (b_0, b_1)

def plot_regression_line(x, y, b):
    # plotting the actual points as scatter plot
    plt.scatter(x, y, color = "m",
                marker = "o", s = 30)
    # predicted response vector
    y_pred = b[0] + b[1]*x
    # plotting the regression line
    plt.plot(x, y_pred, color = "g")
    # putting labels
    plt.xlabel('x')
    plt.ylabel('y')
    # function to show plot
    plt.show()

def main():
    # observations / data
    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])
    # estimating coefficients
    b = estimate_coef(x, y)
    print("Estimated coefficients:\nb_0 = {} \nb_1 = {}".format(b[0], b[1]))
    # plotting regression line
    plot_regression_line(x, y, b)

if __name__ == "__main__":
    main()
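The estimated coefficients can be cross-checked against NumPy's built-in polynomial fit. The sketch below reuses the same observations; the R-squared calculation is an extra added here as a simple goodness-of-fit measure and is not part of the program above.

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

# polyfit with degree 1 returns [slope, intercept]
slope, intercept = np.polyfit(x, y, 1)

# Coefficient of determination (R^2)
y_pred = intercept + slope * x
ss_res = np.sum((y - y_pred) ** 2)
ss_tot = np.sum((y - np.mean(y)) ** 2)
r_squared = 1 - ss_res / ss_tot

print("b_0 =", intercept)
print("b_1 =", slope)
print("R^2 =", r_squared)
```

The slope and intercept should agree with b_1 and b_0 from `estimate_coef` up to floating-point rounding.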
OUTPUT:
INFERENCE:
RESULT:
Thus the Python Program for Regression has been executed successfully.
EX.NO.9
Z-TEST
AIM:
To Write a Python Program for z-test concept.
ALGORITHM:
Step 1: Evaluate the data distribution.
Step 2: Formulate Hypothesis statement symbolically
Step 3: Define the level of significance (alpha)
Step 4: Calculate Z test statistic or Z score.
Step 5: Derive P-value for the Z score calculated.
Step 6: Make decision:
Step 6.1: If P-Value <= alpha, reject H0.
Step 6.2: If P-Value > alpha, fail to reject H0.
PROGRAM:
# imports
import math
import numpy as np
from numpy.random import randn
from statsmodels.stats.weightstats import ztest
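Steps 3 to 6 of the algorithm can be sketched with the standard library alone. The sample values, the hypothesised mean `mu0` and the known population standard deviation `sigma` are all made up for illustration, and the helper name `one_sample_z_test` is chosen for this example; it is not the statsmodels `ztest` function imported above.

```python
import math
from statistics import NormalDist, mean

def one_sample_z_test(sample, mu0, sigma):
    """Two-sided one-sample z-test with a known population sigma."""
    n = len(sample)
    z = (mean(sample) - mu0) / (sigma / math.sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Made-up measurements; H0: population mean equals 100
sample = [102, 104, 98, 110, 106, 101, 99, 105]
alpha = 0.05

z, p_value = one_sample_z_test(sample, mu0=100, sigma=5)
decision = "reject H0" if p_value <= alpha else "fail to reject H0"

print("z =", round(z, 4))
print("p-value =", round(p_value, 4))
print("Decision:", decision)
```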
OUTPUT:
INFERENCE:
RESULT:
Thus, the Python Program for Z TEST has been executed successfully.
EX.NO.10
T-TEST
AIM:
To Write a Python Program for T-test concept.
ALGORITHM:
Step 1: Create some dummy age data for the population of voters in the entire country
Step 2: Create a sample of voters in Minnesota and test whether the average age of voters in
Minnesota differs from the population
Step 3: Conduct a t-test at a 95% confidence level and see if it correctly rejects the null hypothesis
that the sample comes from the same distribution as the population.
Step 4: If the t-statistic lies outside the quantiles of the t-distribution corresponding to our confidence
level and degrees of freedom, we reject the null hypothesis.
Step 5: Calculate the chances of seeing a result as extreme as the one being observed (known as the
p-value) by passing the t-statistic in as the quantile to the stats.t.cdf() function
PROGRAM:
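The steps above can be sketched as follows. All ages are randomly generated dummy data (the means, spread, sample sizes and seed are assumptions made for this illustration), and the test uses `scipy.stats.ttest_1samp` together with the t-distribution functions `stats.t.ppf` and `stats.t.cdf`.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Dummy ages: a large "population" and a smaller sample with a lower mean
population_ages = rng.normal(loc=43, scale=12, size=10_000)
sample_ages = rng.normal(loc=39, scale=12, size=50)

pop_mean = population_ages.mean()
result = stats.ttest_1samp(sample_ages, popmean=pop_mean)
t_stat, p_value = result.statistic, result.pvalue

# Critical value for a two-sided test at the 95% confidence level
dof = len(sample_ages) - 1
t_crit = stats.t.ppf(0.975, df=dof)

# p-value recomputed from the t-distribution CDF, as in Step 5
p_from_cdf = 2 * stats.t.cdf(-abs(t_stat), df=dof)

reject = abs(t_stat) > t_crit
print("t-statistic:", t_stat)
print("critical value:", t_crit)
print("p-value:", p_value)
print("Reject H0:", reject)
```

Comparing |t| with the critical value and comparing the p-value with 0.05 are two views of the same decision, so they always agree.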
OUTPUT:
INFERENCE:
RESULT:
Thus the Python Program for T TEST has been executed successfully.
EX.NO.11
ANOVA
AIM:
To Write a Python Program for ANOVA.
ALGORITHM:
PROGRAM:
# Note: this listing is written in R
# Installing the package
install.packages("dplyr")
# Loading the package
library(dplyr)
# Variance in mean within group and between group
boxplot(mtcars$disp~factor(mtcars$gear),
xlab = "gear", ylab = "disp")
# Step 1: Setup Null Hypothesis and Alternate Hypothesis
# H0 = mu = mu01 = mu02 (There is no difference
# between average displacement for different gear)
# H1 = Not all means are equal
# Step 2: Calculate test statistics using aov function
mtcars_aov <- aov(mtcars$disp~factor(mtcars$gear))
summary(mtcars_aov)
# Step 3: Calculate F-Critical Value
# For 0.05 Significant value, critical value = alpha = 0.05
# Step 4: Compare test statistics with F-Critical value
# and conclude test p < alpha, Reject Null Hypothesis
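The F statistic behind a one-way ANOVA can also be computed in Python. The sketch below uses three small made-up groups rather than the mtcars data, and the helper name `one_way_anova_f` is chosen for this example; it compares the variation between group means with the variation within groups.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Made-up measurements for three groups
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_b, df_w = one_way_anova_f(groups)
print("F =", f_stat, "with", df_b, "and", df_w, "degrees of freedom")
```

Here F is about 13, well above the 5% critical value of roughly 5.14 for (2, 6) degrees of freedom, so the null hypothesis of equal means would be rejected. `scipy.stats.f_oneway` returns the same statistic together with its p-value.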
OUTPUT:
INFERENCE:
RESULT:
Thus the Python Program for ANOVA has been executed successfully.
EX.NO.12
BUILDING AND VALIDATING LINEAR MODELS
AIM:
To Write a Python Program to build and validate linear models
ALGORITHM:
Step1: Consider a set of values x, y.
Step2: Take the linear set of equation y = a+bx.
Step3: Compute the values of a and b from the given values: b = (nΣxy − (Σx)(Σy)) / (nΣx² − (Σx)²),
a = (Σy − b(Σx)) / n.
Step4: Implement the value of a, b in the equation y = a+ bx.
Step5: Regress the value of y for any x.
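The algorithm can be sketched directly from the summation formulas for a and b in Step 3. The function names `fit_line` and `predict` and the data points are assumptions made for this example; the points lie exactly on y = 1 + 2x so that the result is easy to verify.

```python
def fit_line(xs, ys):
    """Return (a, b) for y = a + b*x using the summation formulas."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b

def predict(a, b, x):
    """Regress the value of y for a given x."""
    return a + b * x

# Made-up points that lie exactly on y = 1 + 2x
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

a, b = fit_line(xs, ys)
print("a =", a, "b =", b)
print("Predicted y at x = 6:", predict(a, b, 6))
```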
PROGRAM:
# Importing the necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Note: load_boston was removed in scikit-learn 1.2, so this import needs an older version
from sklearn.datasets import load_boston
sns.set(style="ticks", color_codes=True)
plt.rcParams['figure.figsize'] = (8,5)
plt.rcParams['figure.dpi'] = 150
# loading the data
boston = load_boston()
You can check those keys with the following code.
print(boston.keys())
The output will be as follows:
dict_keys(['data', 'target', 'feature_names', 'DESCR', 'filename'])
print(boston.DESCR)
You will find these details in output: Attribute Information (in order):
INFERENCE:
RESULT:
Thus the Python Program for building and validating linear models has been
executed successfully.
EX.NO.13
BUILDING AND VALIDATING LOGISTIC MODELS
AIM:
To Write a Python Program to build and validate logistic models
ALGORITHM:
Step1: Initialize the variables
Step2: Set the Data frame
Step3: Split data set into training and testing.
Step4: Fit the data into logistic regression function.
Step5: Predict the test data set.
Step6: Print the results.
PROGRAM:
Building the Logistic Regression model:
# importing libraries
import statsmodels.api as sm
import pandas as pd
# loading the training dataset
df = pd.read_csv('logit_train1.csv', index_col = 0)
# defining the dependent and independent variables
Xtrain = df[['gmat', 'gpa', 'work_experience']]
ytrain = df[['admitted']]
# building the model and fitting the data
log_reg = sm.Logit(ytrain, Xtrain).fit()
OUTPUT:
OUTPUT:
from sklearn.metrics import (confusion_matrix, accuracy_score)
# confusion matrix
cm = confusion_matrix(ytest, prediction)
print ("Confusion Matrix : \n", cm)
# accuracy score of the model
print('Test accuracy = ', accuracy_score(ytest, prediction))
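What the two metrics measure can be sketched in plain Python. The label lists below are made up for illustration, and the helper names `confusion_counts` and `accuracy` are chosen for this example; the matrix layout (rows are actual classes, columns are predicted classes) follows the convention used by scikit-learn's `confusion_matrix`.

```python
def confusion_counts(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]] for binary labels 0 and 1."""
    matrix = [[0, 0], [0, 0]]
    for actual, predicted in zip(y_true, y_pred):
        matrix[actual][predicted] += 1
    return matrix

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the actual labels."""
    correct = sum(1 for a, p in zip(y_true, y_pred) if a == p)
    return correct / len(y_true)

# Made-up actual labels and model predictions
ytest = [1, 0, 1, 1, 0, 0, 1, 0]
prediction = [1, 0, 0, 1, 0, 1, 1, 0]

cm = confusion_counts(ytest, prediction)
acc = accuracy(ytest, prediction)
print("Confusion Matrix :", cm)
print("Test accuracy =", acc)
```

Accuracy is the sum of the diagonal entries (correct predictions) divided by the total number of test cases.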
OUTPUT:
INFERENCE:
RESULT:
Thus the Python Program to build and validate logistic models has been
executed successfully.
EX.NO.14
TIME SERIES ANALYSIS
AIM:
To Write a Python Program for Time Series Analysis
ALGORITHM:
Step1: Loading time series dataset correctly in Pandas
Step2: Indexing in Time-Series Data
Step4: Time-Resampling using Pandas
Step5: Rolling Time Series
Step6: Plotting Time-series Data using Pandas
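The indexing, resampling and rolling steps can be sketched on a small synthetic series, so that the example runs without an external file. The dates and values below are made up for illustration.

```python
import pandas as pd

# Made-up daily series: 90 days from 1 January 2017 with values 0..89
dates = pd.date_range("2017-01-01", periods=90, freq="D")
sales = pd.Series(range(90), index=dates, dtype="float64")

# Indexing: a partial date string selects the whole month
january = sales.loc["2017-01"]

# Resampling: average value per month, stamped at the month start
monthly_mean = sales.resample("MS").mean()

# Rolling: 7-day moving average (the first 6 values are NaN)
rolling_7 = sales.rolling(window=7).mean()

print(january.head())
print(monthly_mean)
print(rolling_7.head(10))
```

Calling `monthly_mean.plot()` with matplotlib installed draws the resampled series.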
PROGRAM:
import warnings
import itertools
import numpy as np
import matplotlib.pyplot as plt
warnings.filterwarnings("ignore")
plt.style.use('fivethirtyeight')
import pandas as pd
import statsmodels.api as sm
import matplotlib
matplotlib.rcParams['axes.labelsize'] = 14
matplotlib.rcParams['xtick.labelsize'] = 12
matplotlib.rcParams['ytick.labelsize'] = 12
matplotlib.rcParams['text.color'] = 'k'
We start from time series analysis and forecasting for furniture sales.
df = pd.read_excel("Superstore.xls")
furniture = df.loc[df['Category'] == 'Furniture']
A good 4-year furniture sales data.
furniture['Order Date'].min(), furniture['Order Date'].max()
(Timestamp('2014-01-06 00:00:00'), Timestamp('2017-12-30 00:00:00'))
Data Preprocessing
This step includes removing columns we do not need, checking for missing values,
aggregating sales by date and so on.
cols = ['Row ID', 'Order ID', 'Ship Date', 'Ship Mode', 'Customer ID', 'Customer Name',
        'Segment', 'Country', 'City', 'State', 'Postal Code', 'Region', 'Product ID',
        'Category', 'Sub-Category', 'Product Name', 'Quantity', 'Discount', 'Profit']
furniture.drop(cols, axis=1, inplace=True)
furniture = furniture.sort_values('Order Date')
furniture.isnull().sum()
Order Date    0
Sales         0
dtype: int64
furniture = furniture.groupby('Order Date')['Sales'].sum().reset_index()
Figure 1
Figure 2
We will use the average daily sales value for each month instead, and we are
using the start of each month as the timestamp.
y = furniture['Sales'].resample('MS').mean()
Have a quick peek at the 2017 furniture sales data.
y['2017':]
Figure 3
INFERENCE:
RESULT:
Thus the Python Program for Time Series Analysis has been executed successfully