DATA SCIENCE LAB MANUAL

EX. NO – 1
DATE:
DOWNLOAD, INSTALL AND EXPLORE THE FEATURES OF NUMPY, SCIPY,
JUPYTER, STATSMODELS AND PANDAS PACKAGES

AIM:
To download, install and explore the features of NumPy, SciPy, Jupyter, Statsmodels and Pandas
packages.
PROCEDURE:
Jupyter Notebook is an interactive browser-based platform for scientific computing and is widely
used in data science. In addition to providing an interactive coding platform, Jupyter Notebook
supports both code and text cells. The text cells allow Markdown formatting, so plain text, images,
and LaTeX math equations can be used to explain a project's workflow.

For example, the following image shows how both Markdown and code can be written by
specifying the cell type.

Markdown and Code Cells in Jupyter Notebook

To run a cell, press the Run [▶] button or press Shift + Enter. The
headings and images are rendered after the cells are run.
Jupyter Notebook Cells

Installation Using the Anaconda Distribution

It’s recommended to use the Anaconda distribution of Python. In addition to Python, it comes
with several useful data science packages pre-installed. The installation also includes Jupyter
tools like Jupyter Notebook and JupyterLab.

The installation involves the following steps.

Step 1: Head over to the official Anaconda website at anaconda.com/products/individual and
download the installer corresponding to your operating system.
Installing the Anaconda Distribution

Step 2: Now, run the installer. Follow the prompts on your screen to complete the installation.
The installation will typically take a few minutes. ⏳

Step 3: Once the installation is complete, launch Anaconda Navigator. From the Navigator, click
the Launch button in the Jupyter Notebook tile, as shown below:

Launching Jupyter Notebook from Anaconda Navigator

Alternatively, the Jupyter Notebook shortcut can be used to launch it, as illustrated below:


Jupyter Notebook can also be launched from the Anaconda Command Prompt.

NUMPY:
▪ Introduces objects for multidimensional arrays and matrices, as well as functions
that make it easy to perform advanced mathematical and statistical operations on
those objects
▪ Provides vectorization of mathematical operations on arrays and matrices, which
significantly improves performance (a small timing sketch follows this list)
▪ Many other Python libraries are built on NumPy
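
As a quick illustration of the vectorization point above, here is a minimal timing sketch (not
part of the original manual; exact timings vary by machine) that squares one million numbers
with a plain Python loop and then with a single vectorized NumPy operation:

import time
import numpy as np

data = list(range(1_000_000))
arr = np.arange(1_000_000)

t0 = time.perf_counter()
squares_loop = [x * x for x in data]   # element-by-element Python loop
t1 = time.perf_counter()
squares_vec = arr * arr                # one vectorized NumPy operation
t2 = time.perf_counter()

print("Loop time:       %.4f s" % (t1 - t0))
print("Vectorized time: %.4f s" % (t2 - t1))

The vectorized version typically runs one to two orders of magnitude faster, because the loop
happens in compiled C code rather than in the Python interpreter.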
SciPy:
▪ Collection of algorithms for linear algebra, differential equations, numerical
integration, optimization, statistics and more (a small linear-algebra sketch
follows this list)
▪ Part of the SciPy Stack
▪ Built on NumPy
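
For instance, the following minimal sketch (the 2x2 system is a made-up example) uses
scipy.linalg to solve a linear system:

import numpy as np
from scipy import linalg

A = np.array([[3., 1.],
              [1., 2.]])
b = np.array([9., 8.])
x = linalg.solve(A, b)   # solves A @ x = b
print(x)                 # expected output: [2. 3.]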
Pandas:
▪ Adds data structures and tools designed to work with table-like data (similar to
Series and Data Frames in R)
▪ Provides tools for data manipulation: reshaping, merging, sorting, slicing,
aggregation, etc.
▪ Allows handling of missing data (a small sketch follows this list)
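
The following minimal sketch (made-up values) illustrates the missing-data handling
mentioned above:

import numpy as np
import pandas as pd

df = pd.DataFrame({"name": ["A", "B", "C"],
                   "score": [85.0, np.nan, 92.0]})
print(df.isna().sum())                                 # count missing values per column
df["score"] = df["score"].fillna(df["score"].mean())   # fill the gap with the column mean
print(df)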
Statsmodels:
statsmodels is a Python package that complements SciPy for statistical computations, including
descriptive statistics and estimation and inference for statistical models (a minimal sketch
follows below).
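
A minimal statsmodels sketch (the toy x/y values below are assumptions for illustration only)
that fits an ordinary least squares regression:

import numpy as np
import statsmodels.api as sm

x = np.array([1., 2., 3., 4., 5.])
y = np.array([2.1, 4.0, 6.2, 7.9, 10.1])
X = sm.add_constant(x)        # adds the intercept column
model = sm.OLS(y, X).fit()
print(model.params)           # intercept and slope estimates
print(model.summary())        # full inference table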

RESULT:
Thus, the NumPy, SciPy, Jupyter, Statsmodels and Pandas packages were downloaded,
installed and their features explored successfully.
EX. NO – 1.a
DATE:
PERFORM BASIC DATA ANALYTICS COMMUNICATION
PROCESS WITH NUMPY
AIM

To write a program to construct and perform the basic data analytics communication
process with numpy and other packages.

PROCEDURE
STEP 1: Start
STEP 2: Import the numpy package for our program.
STEP 3: Perform the array operations using the numpy package.
STEP 4: Perform the array creation operations using the numpy package.
STEP 5: Perform the array indexing operations using the numpy package.
STEP 6: Perform the operations on a single array using the numpy package.
STEP 7: Perform the unary operations using the numpy package.
STEP 8: Perform the binary operations using the numpy package.
STEP 9: Perform the universal functions (ufunc) using the numpy package.
STEP 10: Perform the array sorting using the numpy package.
STEP 11: End
SOURCE CODE

A) ARRAY IN NUMPY
import numpy as np
# Creating array object
arr = np.array( [[ 1, 2, 3], [ 4, 2, 5]] )
print(arr)
# Printing type of arr object
print("Array is of type: ", type(arr))
# Printing array dimensions (axes)
print("No. of dimensions: ", arr.ndim)
# Printing shape of array
print("Shape of array: ", arr.shape)
# Printing size (total number of elements) of array
print("Size of array: ", arr.size)
# Printing type of elements in array
print("Array stores elements of type: ", arr.dtype)

OUTPUT

B) ARRAY CREATION
# array creation techniques

import numpy as np
# Creating array from list with type float
a = np.array([[1, 2, 4], [5, 8, 7]], dtype = 'float')
print ("Array created using passed list:\n", a)
# Creating array from tuple
b = np.array((1 , 3, 2))
print ("\nArray created using passed tuple:\n", b)
# Creating a 3X4 array with all zeros
c = np.zeros((3, 4))
print ("\nAn array initialized with all zeros:\n", c)
# Create a constant value array of complex type
d = np.full((3, 3), 6, dtype = 'complex')
print ("\nAn array initialized with all 6s.""Array type is complex:\n", d)
# Create an array with random values
e = np.random.random((2, 2))
print ("\nA random array:\n", e)
# Create a sequence of integers
# from 0 to 30 with steps of 5
f = np.arange(0, 30, 5)
print ("\nA sequential array with steps of 5:\n", f)
# Create a sequence of 10 values in range 0 to 5
g = np.linspace(0, 5, 10)
print ("\nA sequential array with 10 values between""0 and 5:\n", g)
# Reshaping 3X4 array to 2X2X3 array
arr = np.array([[1, 2, 3, 4], [5, 2, 4, 2], [1, 2, 0, 1]])
newarr = arr.reshape(2, 2, 3)
print ("\nOriginal array:\n", arr)
print ("Reshaped array:\n", newarr)
# Flatten array
arr = np.array([[1, 2, 3], [4, 5, 6]])
flarr = arr.flatten()
print ("\nOriginal array:\n", arr)
print ("Fattened array:\n", flarr)

OUTPUT
C) ARRAY INDEXING
# indexing in numpy
import numpy as np

# An exemplar array
arr = np.array([[-1, 2, 0, 4], [4, -0.5, 6, 0], [2.6, 0, 7, 8], [3, -7, 4, 2.0]])
# Slicing array
temp = arr[:2, ::2]
print ("Array with first 2 rows and alternate columns(0 and 2):\n", temp)
# Integer array indexing example
temp = arr[[0, 1, 2, 3], [3, 2, 1, 0]]
print ("\nElements at indices (0, 3), (1, 2), (2, 1),""(3, 0):\n", temp)
# boolean array indexing example
cond = arr > 0
# cond is a boolean array
temp = arr[cond]
print ("\nElements greater than 0:\n", temp)

OUTPUT
D) OPERATIONS ON SINGLE ARRAY
# basic operations on single array

import numpy as np
a = np.array([1, 2, 5, 3])
# add 1 to every element
print ("Adding 1 to every element:", a+1)
# subtract 3 from each element
print ("Subtracting 3 from each element:", a-3)
# multiply each element by 10
print ("Multiplying each element by 10:", a*10)
# square each element
print ("Squaring each element:", a**2)
# modify existing array
a *= 2
print ("Doubled each element of original array:", a)
# transpose of array
a = np.array([[1, 2, 3], [3, 4, 5], [9, 6, 0]])
print ("\nOriginal array:\n", a)
print ("Transpose of array:\n", a.T)

OUTPUT

E) UNARY OPERATORS
# unary operators in numpy
import numpy as np
arr = np.array([[1, 5, 6], [4, 7, 2], [3, 1, 9]])
# maximum element of array
print ("Largest element is:", arr.max())
print ("Row-wise maximum elements:", arr.max(axis = 1))
# minimum element of array
print ("Column-wise minimum elements:", arr.min(axis = 0))
# sum of array elements
print ("Sum of all array elements:", arr.sum())
# cumulative sum along each row
print ("Cumulative sum along each row:\n", arr.cumsum(axis = 1))

OUTPUT

F) BINARY OPERATORS
# binary operators in Numpy
import numpy as np
a = np.array([[1, 2], [3, 4]])
b = np.array([[4, 3], [2, 1]])
# add arrays
print ("Array sum:\n", a + b)
# multiply arrays (elementwise multiplication)
print ("Array multiplication:\n", a*b)
# matrix multiplication
print ("Matrix multiplication:\n", a.dot(b))

OUTPUT
G) UNIVERSAL FUNCTIONS (ufunc)
# universal functions in numpy
import numpy as np
# create an array of sine values
a = np.array([0, np.pi/2, np.pi])
print ("Sine values of array elements:", np.sin(a))
# exponential values
a = np.array([0, 1, 2, 3])
print ("Exponent of array elements:", np.exp(a))
# square root of array values
print ("Square root of array elements:", np.sqrt(a))

OUTPUT

H) SORTING ARRAY
# Python program to demonstrate sorting in numpy

import numpy as np
a = np.array([[1, 4, 2], [3, 4, 6], [0, -1, 5]])
# sorted array
print ("Array elements in sorted order:\n", np.sort(a, axis = None))
# sort array row-wise
print ("Row-wise sorted array:\n", np.sort(a, axis = 1))
# specify sort algorithm
print ("Column wise sort by applying merge- sort:\n", np.sort(a, axis = 0, kind =
'mergesort'))
# Example to show sorting of structured array
# set alias names for dtypes
dtypes = [('name', 'S10'), ('grad_year', int), ('cgpa', float)]
# Values to be put in array
values = [('Hrithik', 2009, 8.5), ('Ajay', 2008, 8.7), ('Pankaj', 2008, 7.9), ('Aakash', 2009, 9.0)]
# Creating array
arr = np.array(values, dtype = dtypes)
print ("\nArray sorted by names:\n", np.sort(arr, order = 'name'))
print ("Array sorted by graduation year and then cgpa:\n", np.sort(arr, order =
['grad_year', 'cgpa']))

OUTPUT

RESULT
Thus, the basic data analytics operations with numpy were performed successfully.

EX. NO – 1.b

DATE:

PERFORM INTEGRATION USING SCIPY SUB PACKAGE

AIM:
To write a program to construct and demonstrate integration operations using SciPy
sub-packages.
PROCEDURE
STEP 1: Start
STEP 2: Import the scipy package file for our program.
STEP 3: Perform the Integration operations
STEP 4: Perform the Single Integration operation
STEP 5: Perform the double integration operations
STEP 6: End
SOURCE CODE:
Single Integration:

from scipy import integrate

# take f(x) = x**2 / 2 as the integrand
f = lambda x : (x**2)/2
# single integration with limits a = 0 and b = 1
integration = integrate.quad(f, 0, 1)
print(integration)
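
Note that integrate.quad returns a pair (value, estimated absolute error). The exact value of
the integral of x**2/2 from 0 to 1 is 1/6 ≈ 0.1667, so the first element of the printed tuple
should be very close to 0.1667 and the second should be a tiny error estimate.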

OUTPUT:

Double Integration:

from scipy import integrate
import numpy as np
# import square root function from the math library
from math import sqrt
# set the function f
# (note: dblquad integrates over the FIRST argument of f in the inner
# integral, so the limits p and q below bound that variable)
f = lambda x, y : 64*x*y
# lower limit of the inner integral
p = lambda x : 0
# upper limit of the inner integral, as a function of the outer variable
q = lambda y : sqrt(1 - 2*y**2)
# perform double integration; the outer variable runs from 0 to 2/4
integration = integrate.dblquad(f, 0, 2/4, p, q)
print(integration)

OUTPUT:

RESULT:
Thus, the integration operations using SciPy sub-packages were demonstrated successfully.

EX. NO – 1.c

DATE:
APPLY QUANTITATIVE TECHNIQUE USING
APPROPRIATE PACKAGES IN PYTHON

AIM:
To write a program to construct and demonstrate quantitative techniques using appropriate
packages in Python.
PROCEDURE

STEP 1: Start
STEP 2: Import the pandas package file for our program.
STEP 3: Download iris_csv.csv file and save it in system.
STEP 4: Read the data(.csv) file
STEP 5: Perform the shape function
STEP 6: Perform the info() function
STEP 7: Perform the head() function
STEP 8: Perform the tail() function
STEP 9: Perform the mean() function
STEP 10: Perform the median() function
STEP 11: Perform the min() function
STEP 12: Perform the max() function
STEP 13: Perform the count() function
STEP 14: Perform the std() function
STEP 15: Perform the corr() function
STEP 16: Perform the describe() function
STEP 17: End

SOURCE CODE:

import numpy as np
import pandas as pd
df = pd.read_csv("C:/Users/mani4/Documents/Python Scripts/iris_csv.csv")
# Prints number of rows and columns in the dataframe
df.shape
# Index, datatype and memory information
df.info()
# Prints the first n rows of the DataFrame
df.head()
# Prints the last n rows of the DataFrame
df.tail()
# Returns the mean of all columns
# (on pandas >= 2.0, pass numeric_only=True to the statistics below,
# since the iris class column is non-numeric)
df.mean()
# Returns the median of each column
df.median()
# Returns the lowest value in each column
df.min()
# Returns the highest value in each column
df.max()
# Returns the number of non-null values in each DataFrame column
df.count()
# Returns the standard deviation of each column
df.std()
# Returns the correlation between columns in a DataFrame
df.corr()
# Summary statistics for numerical columns
df.describe()
RESULT:
Thus, the quantitative techniques using appropriate packages in Python were demonstrated
successfully.
Ex 2. Working with NumPy arrays
Aim:
To work with the different instructions used on NumPy arrays.
Procedure
✓ Do the following functions:
✓ To print the array: print(a)
✓ To print the shape of the array: print(a.shape)
✓ The ndim attribute returns the number of dimensions of an array: print(a.ndim)
✓ The data type object (dtype) informs us about the layout of the array: print(a.dtype.name)
✓ itemsize returns the size (in bytes) of each element of a NumPy array: print(a.itemsize)
✓ size is the total number of elements in the ndarray: print(a.size)

Program
import numpy as np
a = np.arange(15).reshape(3, 5)
# To print the array
print(a)
# To print the shape of the array
print(a.shape)
# The ndim attribute returns the number of dimensions of an array
print(a.ndim)
# The data type object (dtype) informs us about the layout of the array
print(a.dtype.name)
# itemsize returns the size (in bytes) of each element of a NumPy array
print(a.itemsize)
# size is the total number of elements in the ndarray
print(a.size)

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])
# Indices of the odd elements
x = np.where(arr%2 == 1)
print(x)
# Slice elements from index 1 to index 5 from the array
print(arr[1:5])
# Get the third and fourth elements from the array and add them
print(arr[2] + arr[3])

# A NumPy program to convert values between Fahrenheit and Centigrade
# degrees; the values are stored in a NumPy array.
fvalues = [0, 12, 45.21, 34, 99.91, 32]
F = np.array(fvalues)
print("Values in Fahrenheit degrees:")
print(F)
print("Values in Centigrade degrees:")
print(np.round((5*F/9 - 5*32/9), 2))
Ex 3. Working with Pandas data frames
Aim
To work with Pandas data frames
Procedure
1. Create a dataset with name, city, age and pyscore.
2. Use .head() to show the first few items and .tail() to show the last few
items.
3. Get the DataFrame’s row labels with .index and its column labels
with .columns
4. Get the data types for each column of a Pandas DataFrame with .dtypes
5. Check the amount of memory used by each column.
6. Use the accessor .loc[] to get rows or columns by their labels; Pandas also
offers the accessor .iloc[], which retrieves a row or column by its integer index.
7. Create a new Series object that represents a new candidate.
Program
import pandas as pd
data = {
'name': ['Muthu', 'Anand', 'Ramkumar', 'Roja', 'Robin', 'Rajan', 'Joel'],
'city': ['Chennai', 'Madurai', 'Tirunelveli', 'Saidapet', 'Tambaram', 'Irukattukottai', 'Central Station'],
'age': [41, 28, 33, 34, 38, 31, 37],
'py-score': [88.0, 79.0, 81.0, 80.0, 68.0, 61.0, 84.0]
}
row_labels = [101, 102, 103, 104, 105, 106, 107]

# df is a variable that holds the reference to your Pandas DataFrame
df = pd.DataFrame(data=data, index=row_labels)
df

# Use .head() to show the first few items and .tail() to show the last few items
df.head(n=2)
df.tail(n=2)

# Get the DataFrame's row labels with .index and its column labels with .columns
df.index
df.columns

# Get the data types for each column of a Pandas DataFrame with .dtypes
df.dtypes

# Check the amount of memory used by each column with .memory_usage()
df.memory_usage()

# In addition to the accessor .loc[], which gets rows or columns by their
# labels, Pandas offers the accessor .iloc[], which retrieves a row or
# column by its integer index.
df.loc[101]
df.iloc[0]

# Create a new Series object that represents a new candidate
john = pd.Series(data=['Jovan', 'Medavakkam', 34, 79], index=df.columns, name=108)
john
# Append the new row; DataFrame.append() was removed in pandas 2.0,
# so pd.concat() is used here instead
df = pd.concat([df, john.to_frame().T])
df
Ex 4a. Reading data from text files
Aim
Write a python program to Reading data from text files
Procedure
✓ Type a text file with some data and save it as filename.txt.
✓ Open the file using the open() function in write mode.
✓ Use functions like read() and write() as needed (a short note on file modes
follows this list).
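
For reference (general background, not from the original manual): mode "w" opens a file for
writing and truncates any existing content, "r" opens it for reading only, "r+" opens it for
both reading and writing without truncating, and "a" appends to the end of the file.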
Program
# Program to show various ways to read and write data in a file.
file1 = open("myfile.txt", "w")
L = ["This is Delhi \n", "This is Paris \n", "This is London \n"]

# \n is placed to indicate EOL (End of Line)
file1.write("Hello \n")
file1.writelines(L)
file1.close()  # to change file access modes

file1 = open("myfile.txt", "r+")

print("Output of Read function is ")
print(file1.read())
print()

# seek(n) takes the file handle to the nth byte from the beginning.
file1.seek(0)
print("Output of Readline function is ")
print(file1.readline())
print()

file1.seek(0)
# To show the difference between read and readline
print("Output of Read(9) function is ")
print(file1.read(9))
print()

file1.seek(0)
print("Output of Readline(9) function is ")
print(file1.readline(9))

file1.seek(0)
# readlines function
print("Output of Readlines function is ")
print(file1.readlines())
print()
file1.close()
Ex 4b. Reading data from Excel files and exploring various commands
Aim
Write a python program to Reading data from excel files
Procedure
• First, create an Excel file with some 10 records (10 rows) and 5 columns of
numerical data, and save it in filename.xlsx format.
• Read the data from the Excel file into pandas using Python.
• Explore the data from the Excel file in pandas.
• Use functions to manipulate and reshape the data in pandas.
• head() and tail() view 5 rows from the top and from the bottom of the data.
• The shape attribute gives the number of rows and columns.
• If a column contains numerical data, it can be sorted using the sort_values()
method in pandas.
• If the data is mostly numerical, statistical information like mean, max, min,
etc. about the data frame can be obtained using the describe() method.

Program
import pandas as pd
data = pd.read_excel (r'd:\mark.xlsx')
data
df = pd.DataFrame(data, columns= ['Name','CGPA'])
print (df)
data.head()
data.tail()
data.shape
sorted_column = data.sort_values(['Name'], ascending = False)
sorted_column['Name'].head(5)
data.describe()
Ex 4c. Exploring various commands for doing descriptive analytics on the
Iris data set.

Aim

To explore various commands for doing descriptive analytics on the Iris data
set.

Procedure
✓ To understand the idea behind descriptive statistics.
✓ Load the packages we will need and also the `iris` dataset.
✓ load_iris() loads an object containing the iris dataset, which is stored in
`iris_obj`.
✓ Basic statistics.
✓ The number of rows in the dataset can be obtained via `count()`.
✓ Mean for every numeric column.
✓ Median for every numeric column.
✓ Variance is a measure of dispersion, roughly the "average" squared distance of
a data point from the mean.
✓ The standard deviation is the square root of the variance and is interpreted as
the "average" distance of a data point from the mean.
✓ The maximum and minimum values.

Program Code
import pandas as pd
from pandas import DataFrame
from sklearn.datasets import load_iris
# sklearn.datasets includes common example datasets
# A function to load in the iris dataset
iris_obj = load_iris()
# Dataset preview
iris_obj.data
iris = DataFrame(iris_obj.data, columns=iris_obj.feature_names,
                 index=pd.Index([i for i in range(iris_obj.data.shape[0])])
       ).join(DataFrame(iris_obj.target, columns=pd.Index(["species"]),
                        index=pd.Index([i for i in range(iris_obj.target.shape[0])])))
iris  # prints the iris data

Commands
iris_obj.feature_names
iris.count()
iris.mean()
iris.median()
iris.var()
iris.std()
iris.max()
iris.min()
iris.describe()

Result
The various commands for doing descriptive analytics on the Iris data set were
executed successfully.
5. a. Use the diabetes data set from UCI and Pima Indians
Diabetes data set for performing the following:
Univariate analysis: Frequency, Mean, Median, Mode, Variance,
Standard Deviation, Skewness and Kurtosis.

Aim:
To analyse various univariate functions like frequency, mean, median, mode,
variance, standard deviation, skewness and kurtosis on a dataset such as the
Pima Indians diabetes dataset.

Procedure
• Download a dataset such as the Pima Indians diabetes dataset, save it in
any drive, and load it for processing.
• The mean() function calculates the mean/average of a given list of numbers.
• The median() method calculates the median (middle value) of the given data set.
• The mode of a set of data values is the value that appears most often.
• The var() method calculates the variance for each column.
• The standard deviation, std(), is a number that describes how spread out
the values are.
• The skew() method calculates the skew for each column. Skewness refers to
a distortion or asymmetry that deviates from the symmetrical bell curve, or
normal distribution, in a set of data.
• Kurtosis is also a statistical term and an important characteristic of a
frequency distribution. It determines whether a distribution is heavy-tailed
with respect to the normal distribution, and it provides information about the
shape of a frequency distribution. (A tiny skewness sketch follows this list.)
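
As a quick illustration of skewness, consider this tiny sketch (the values are made up for
the example, not taken from the dataset):

import pandas as pd
sym = pd.Series([1, 2, 3, 4, 5])        # symmetric sample
skewed = pd.Series([1, 1, 1, 2, 10])    # long right tail
print(sym.skew())      # 0.0 for a perfectly symmetric sample
print(skewed.skew())   # clearly positive for a right-skewed sample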
Program:
import pandas as pd
from scipy.stats import kurtosis
import pylab as p
df = pd.read_csv(r'd:\diabetes.csv')
print(df)
df1 = pd.DataFrame(df, columns=['Age', 'Glucose'])
print(df1)
df1.mean()
df1.median()
df1.mode()
print(df1.var())
df1.std()
print(df1.skew())
print(kurtosis(df, axis=0, bias=True))

Dataset download link:
https://github.com/npradaschnor/Pima-Indians-Diabetes-Dataset/blob/master/Pima%20Indians%20Diabetes%20Dataset.ipynb

Result:
The various univariate functions like frequency, mean, median, mode, variance,
standard deviation, skewness and kurtosis on the Pima Indians diabetes dataset
were executed successfully.
5 b. Linear Regression and Logistic Regression with the Diabetes Dataset
Using Python Machine Learning
Aim
To use the diabetes dataset from sklearn and implement Linear Regression and
Logistic Regression over it.

Procedure
Load the sklearn libraries.
Load the diabetes dataset.
Split the dataset into training and testing sets.
Create the Linear Regression and Logistic Regression models.
Make predictions using the testing set.
Find the coefficients and the mean squared error.

Program
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
# To calculate accuracy measures and confusion matrix
from sklearn import metrics

diabetes_X, diabetes_y = datasets.load_diabetes(return_X_y=True)

# Use only one feature
diabetes_X = diabetes_X[:, np.newaxis, 2]

# Split the data into training/testing sets
diabetes_X_train = diabetes_X[:-20]
diabetes_X_test = diabetes_X[-20:]

# Split the targets into training/testing sets
diabetes_y_train = diabetes_y[:-20]
diabetes_y_test = diabetes_y[-20:]

# Create linear regression object
regr = linear_model.LinearRegression()
# Train the model using the training sets
regr.fit(diabetes_X_train, diabetes_y_train)

# Make predictions using the testing set
diabetes_y_pred = regr.predict(diabetes_X_test)

# Create logistic regression object
Logistic_model = LogisticRegression()
Logistic_model.fit(diabetes_X_train, diabetes_y_train)

# The coefficients
print('Coefficients: \n', regr.coef_)
# The mean squared error
print('Mean squared error: %.2f' % mean_squared_error(diabetes_y_test, diabetes_y_pred))
# The coefficient of determination: 1 is perfect prediction
print('Coefficient of determination: %.2f' % r2_score(diabetes_y_test, diabetes_y_pred))

y_predict = Logistic_model.predict(diabetes_X_train)
# print("Y predict/hat ", y_predict)
y_predict

Output
Coefficients:
[938.23786125]
Mean squared error: 2548.07
Coefficient of determination: 0.47
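
One caveat worth noting: load_diabetes() has a continuous target, so the LogisticRegression
above treats every distinct target value as its own class. For a genuine classification
exercise, a common workaround (this sketch and its median threshold are an assumption, not
part of the original program) is to binarize the target first:

import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LogisticRegression

X, y = load_diabetes(return_X_y=True)
y_binary = (y > np.median(y)).astype(int)   # 1 = above-median disease progression
clf = LogisticRegression(max_iter=1000).fit(X, y_binary)
print("Training accuracy: %.3f" % clf.score(X, y_binary))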
5 c. Use the diabetes data set from UCI and Pima Indians Diabetes data set
for performing the following: Multiple Regression

Aim

Multiple regression is like linear regression, but with more than one independent
value, meaning that we try to predict a value based on two or more variables.

Procedure
The pandas module allows us to read CSV files and return a DataFrame object.

Then make a list of the independent values and call this variable X.

Put the dependent values in a variable called y.

From the sklearn module we use the LinearRegression() method to create a
linear regression object.

This object has a method called fit() that takes the independent and dependent
values as parameters and fills the regression object with data that describes the
relationship.

We then have a regression object that is ready to predict Age values based on a
person's Glucose and BloodPressure.

Program
import pandas as pd
from sklearn import linear_model
df = pd.read_csv(r'd:\diabetes.csv')
print(df)
X = df[['Glucose', 'BloodPressure']]
y = df['Age']
regr = linear_model.LinearRegression()
regr.fit(X, y)
predictedage = regr.predict([[150, 13]])
print(predictedage)
Output

[28.77214401]
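
As a follow-up (using the same regr object fitted above), the learned relationship can be
inspected directly, since LinearRegression exposes its fitted parameters:

print(regr.coef_)       # one coefficient per independent variable (Glucose, BloodPressure)
print(regr.intercept_)  # the constant term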
5 d. Also compare the results of the above analysis for the two data sets.

Aim

In this program, we can compare the results of the two different data sets.

Procedure
Step 1: Prepare the datasets to be compared. Let's say that you have the data
below stored in a CSV file called car1.csv, and a second set stored in a CSV
file called car2.csv.

Step 2: Create the two DataFrames. Based on the above data, you can then create
the following two DataFrames.

Step 3: Compare the values between the two Pandas DataFrames. In this step,
you'll need to import the NumPy package.

Program
import pandas as pd
import numpy as np
data_1 = pd.read_csv(r'd:\car1.csv')
df1 = pd.DataFrame(data_1)
data_2 = pd.read_csv(r'd:\car2.csv')
df2 = pd.DataFrame(data_2)
df1['amount1'] = df2['amount1']
df1['prices_match'] = np.where(df1['amount'] == df2['amount1'], 'True', 'False')
df1['price_diff'] = np.where(df1['amount'] == df2['amount1'], 0, df1['amount'] - df2['amount1'])
print(df1)

Output
Model City Year amount amount1 prices_match price_diff
0 Maruti Chennai 2022 600000 600000 True 0
1 Hyndai Chennai 2022 700000 700000 True 0
2 Ford Chennai 2022 800000 850000 False -50000
3 Kia Chennai 2022 900000 900000 True 0
4 XL6 Chennai 2022 1000000 1000000 True 0
5 Tata Chennai 2022 1100000 1150000 False -50000
6 Audi Chennai 2022 1200000 1200000 True 0
7 Ertiga Chennai 2022 1300000 1300000 True 0
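
As an aside, pandas 1.1 and newer also offers DataFrame.compare() for a built-in element-wise
diff; the sketch below (assuming the same car1.csv/car2.csv data loaded above) aligns the
column names and shows only the rows that differ:

diffs = df1[['amount']].compare(df2[['amount1']].rename(columns={'amount1': 'amount'}))
print(diffs)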

Datasets:
Dataset 1: car1.csv
Dataset 2: car2.csv
6 a). Apply and explore various plotting functions on UCI data sets. Density
and contour plots

Aim

To apply and explore various plotting functions like Density and contour
plots on datasets.

Procedure

There are three Matplotlib functions that can be helpful for this task: plt.contour
for contour plots, plt.contourf for filled contour plots, and plt.imshow for
showing images.

A contour plot can be created with the plt.contour function. It takes three
arguments: a grid of x values, a grid of y values, and a grid of z values. The x
and y values represent positions on the plot, and the z values will be represented
by the contour levels.

Perhaps the most straightforward way to prepare such data is to use the np.meshgrid
function, which builds two-dimensional grids from one-dimensional arrays (a small
meshgrid sketch follows this procedure).

Next comes a standard line-only contour plot, and the lines can be color-coded by
specifying a colormap with the cmap argument.

Additionally, we'll add a plt.colorbar() command, which automatically creates an
additional axis with labeled color information for the plot.
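
Before the program, here is a tiny np.meshgrid sketch (toy inputs assumed) showing how the
two-dimensional grids are built from one-dimensional arrays:

import numpy as np
x = np.array([0, 1, 2])
y = np.array([10, 20])
X, Y = np.meshgrid(x, y)
print(X)   # [[ 0  1  2] [ 0  1  2]]       - x repeated along the rows
print(Y)   # [[10 10 10] [20 20 20]]       - y repeated down the columns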

Program

%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('seaborn-white')   # on matplotlib >= 3.6 this style is named 'seaborn-v0_8-white'
import numpy as np

def f(x, y):
    return np.sin(x) ** 10 + np.cos(10 + y * x) * np.cos(x)

x = np.linspace(0, 5, 50)
y = np.linspace(0, 5, 40)
X, Y = np.meshgrid(x, y)
Z = f(X, Y)
plt.contour(X, Y, Z, colors='black');

Output

plt.contour(X, Y, Z, 20, cmap='RdGy');

Output
plt.contourf(X, Y, Z, 20, cmap='RdGy')

plt.colorbar();

Output

Result

Various plotting functions like density and contour plots on datasets were
executed successfully.
6 b). Apply and explore various plotting functions like Correlation and
scatter plots on UCI data sets

Aim

To apply and explore various plotting functions like correlation and scatter
plots on datasets.

Procedure

✓ Load the diabetes CSV file into a pandas DataFrame and list its columns.
✓ Draw a scatter plot of two columns with seaborn's scatterplot(), and add a
fitted regression line with lmplot().
✓ Compute the Pearson correlation between two columns with scipy.stats.pearsonr().
✓ Compute the full correlation matrix with DataFrame.corr() and visualize it with
seaborn's heatmap().

Program

import pandas as pd
import seaborn as sns
con = pd.read_csv('D:/diabetes.csv')
con
list(con.columns)
sns.scatterplot(x="Pregnancies", y="Age", data=con);

Output

sns.lmplot(x="Pregnancies", y="Age", data=con);


Output

sns.lmplot(x="Pregnancies", y="Age", hue="Outcome", data=con);

Output

from scipy import stats

stats.pearsonr(con['Age'], con['Outcome'])

Output
(0.23835598302719774, 2.209975460664566e-11)
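
Here the first value is the Pearson correlation coefficient, r ≈ 0.24, which indicates a weak
positive association between Age and Outcome; the second value is the two-sided p-value for
testing that the true correlation is zero.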
cormat = con.corr()
round(cormat, 2)
sns.heatmap(cormat);

Output

Result

Various plotting functions like correlation and scatter plots on datasets were
executed successfully.


6 c. Apply and explore histograms and three dimensional plotting functions on
UCI data sets
Aim
To apply and explore histograms and three dimensional plotting functions
on UCI data sets
Procedure
✓ Download the CSV file and load it to explore.
✓ A histogram represents data grouped into ranges (bins).
✓ To create a histogram, the first step is to create bins for the ranges, then
distribute the whole range of the values into a series of intervals, and count the
values which fall into each of the intervals.
✓ Bins are clearly identified as consecutive, non-overlapping intervals of
variables. The matplotlib.pyplot.hist() function is used to compute and create a
histogram of x. (A small binning sketch follows this list.)
✓ The first import is a standard import statement for plotting using matplotlib,
which you would see for 2D plotting as well.
✓ The second import, of the Axes3D class, is required for enabling 3D
projections. It is otherwise not used anywhere else.
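
A small binning sketch (made-up values and hypothetical bin edges) to make the bin idea
concrete:

import matplotlib.pyplot as plt
values = [5, 7, 8, 12, 15, 18, 21, 22, 25, 29]
plt.hist(values, bins=[0, 10, 20, 30], edgecolor='black')   # three bins: 0-10, 10-20, 20-30
plt.xlabel("Value range")
plt.ylabel("Count")
plt.show()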

Program
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt # To visualize
from mpl_toolkits.mplot3d import Axes3D
data = pd.read_csv('d:\\diabetes.csv')
data
data['Glucose'].plot(kind='hist')
Output
fig = plt.figure(figsize=(4,4))
ax = fig.add_subplot(111, projection='3d')
Output

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
x = data['Age'].values
y = data['Glucose'].values
z = data['Outcome'].values
ax.set_xlabel("Age (Year)")
ax.set_ylabel("Glucose (Reading)")
ax.set_zlabel("Outcome (0 or 1)")
ax.scatter(x, y, z, c='r', marker='o')
plt.show()
Output

Result
The histograms and three-dimensional plotting functions on UCI data sets were
executed successfully.
