EX.
NO – 1
DATE:
DOWNLOAD, INSTALL AND EXPLORE THE FEATURES OF NUMPY, SCIPY,
JUPYTER, STATSMODELS AND PANDAS PACKAGES
AIM:
To download, install and explore the features of NumPy, SciPy, Jupyter, Statsmodels and Pandas
packages.
PROCEDURE:
Jupyter Notebook is an interactive browser-based platform for scientific computing and is widely
used in data science. In addition to providing an interactive coding platform, Jupyter Notebook
supports both code and text cells. The text cells allow Markdown formatting, so plain text, images
and LaTeX math equations can be used to explain a project's workflow.
For example, the following image shows how both Markdown and code can be written by specifying
the cell type.
Markdown and Code Cells in Jupyter Notebook
To run a cell, press the Run [▶] button or press Shift + Enter. The headings and images are
rendered after the cells are run.
Jupyter Notebook Cells
Installation Using the Anaconda Distribution
It’s recommended to use the Anaconda distribution of Python. In addition to Python, it comes
with several useful data science packages pre-installed. The installation also includes Jupyter
tools like Jupyter Notebook and JupyterLab.
The installation involves the following steps.
Step 1: Head over to the official website of Anaconda, navigate
to anaconda.com/products/individual and download the installer corresponding to your
operating system.
Installing the Anaconda Distribution
Step 2: Now, run the installer and follow the prompts on your screen to complete the installation.
The installation will typically take a few minutes.
Step 3: Once installation is completed, launch Anaconda Navigator. From the navigator, click
on the Launch option in the Jupyter Notebook tab, as shown below:
Launching Jupyter Notebook from Anaconda Navigator
Alternatively, the Jupyter Notebook shortcut created by the installer can be used to launch it, as illustrated below.
Jupyter Notebook can also be launched from the Anaconda Prompt by running the command jupyter notebook.
NUMPY:
▪ introduces objects for multidimensional arrays and matrices, as well as functions
that make it easy to perform advanced mathematical and statistical operations on
those objects
▪ provides vectorization of mathematical operations on arrays and matrices, which
significantly improves performance (see the sketch below)
▪ many other Python libraries are built on NumPy
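As a quick illustration of vectorization (a sketch added for illustration, not part of the prescribed exercise), the following compares an explicit Python loop with the equivalent vectorized NumPy expression; both compute the element-wise square of an array, but the vectorized form runs in optimized compiled code:
import numpy as np
a = np.arange(100_000)
# explicit Python loop: interpreted, one element at a time
squares_loop = np.array([x * x for x in a])
# vectorized operation: the whole array at once, in compiled code
squares_vec = a * a
print(np.array_equal(squares_loop, squares_vec))  # True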
SciPy:
▪ collection of algorithms for linear algebra, differential equations, numerical
integration, optimization, statistics and more
▪ part of SciPy Stack
▪ built on NumPy
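A minimal sketch of one such algorithm, solving a small linear system with scipy.linalg (the matrix and right-hand side are arbitrary illustrative values):
import numpy as np
from scipy import linalg
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
# solve the linear system A x = b
x = linalg.solve(A, b)
print(x)  # [2. 3.]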
Pandas:
▪ adds data structures and tools designed to work with table-like data (the Series
and DataFrame objects, similar to data frames in R)
▪ provides tools for data manipulation: reshaping, merging, sorting, slicing,
aggregation etc.
▪ allows handling missing data
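A small sketch of these tools on a toy DataFrame (the column names and values are made up for illustration):
import numpy as np
import pandas as pd
# table-like data with one missing value
df = pd.DataFrame({'name': ['A', 'B', 'C'], 'score': [85.0, np.nan, 72.0]})
print(df.dropna())                               # drop rows with missing data
print(df.fillna({'score': df['score'].mean()}))  # or fill them in
print(df.sort_values('score'))                   # sort rows by a column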
Statsmodels:
statsmodels is a Python package that provides a complement to SciPy for statistical computations,
including descriptive statistics and estimation and inference for statistical models.
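A minimal sketch of estimation and inference with statsmodels, fitting an ordinary least squares model on synthetic data (the data is made up for illustration):
import numpy as np
import statsmodels.api as sm
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=100)
# add an intercept column and fit ordinary least squares
X = sm.add_constant(x)
model = sm.OLS(y, X).fit()
print(model.params)     # estimated intercept and slope
print(model.summary())  # full inference table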
RESULT:
Thus the NumPy, SciPy, Jupyter, Statsmodels and Pandas packages were downloaded, installed
and their features explored successfully.
EX. NO – 1.a
DATE:
PERFORM BASIC DATA ANALYTICS COMMUNICATION
PROCESS WITH NUMPY
AIM
To write a program to construct and perform the basic data analytics communication
process with NumPy and other packages.
PROCEDURE
STEP 1: Start
STEP 2: Import the NumPy package for the program.
STEP 3: Perform the array operations using the NumPy package.
STEP 4: Perform the array creation operations using the NumPy package.
STEP 5: Perform the array indexing operations using the NumPy package.
STEP 6: Perform the operations on a single array using the NumPy package.
STEP 7: Perform the unary operations using the NumPy package.
STEP 8: Perform the binary operations using the NumPy package.
STEP 9: Perform the universal functions (ufunc) using the NumPy package.
STEP 10: Perform the array sorting using the NumPy package.
STEP 11: End
SOURCE CODE
A) ARRAY IN NUMPY
import numpy as np
# Creating array object
arr = np.array( [[ 1, 2, 3], [ 4, 2, 5]] )
print(arr)
# Printing type of arr object
print("Array is of type: ", type(arr))
# Printing array dimensions (axes)
print("No. of dimensions: ", arr.ndim)
# Printing shape of array
print("Shape of array: ", arr.shape)
# Printing size (total number of elements) of array
print("Size of array: ", arr.size)
# Printing type of elements in array
print("Array stores elements of type: ", arr.dtype)
OUTPUT
B) ARRAY CREATION
# array creation techniques
import numpy as np
# Creating array from list with type float
a = np.array([[1, 2, 4], [5, 8, 7]], dtype = 'float')
print ("Array created using passed list:\n", a)
# Creating array from tuple
b = np.array((1 , 3, 2))
print ("\nArray created using passed tuple:\n", b)
# Creating a 3X4 array with all zeros
c = np.zeros((3, 4))
print ("\nAn array initialized with all zeros:\n", c)
# Create a constant value array of complex type
d = np.full((3, 3), 6, dtype = 'complex')
print ("\nAn array initialized with all 6s.""Array type is complex:\n", d)
# Create an array with random values
e = np.random.random((2, 2))
print ("\nA random array:\n", e)
# Create a sequence of integers
# from 0 to 30 with steps of 5
f = np.arange(0, 30, 5)
print ("\nA sequential array with steps of 5:\n", f)
# Create a sequence of 10 values in range 0 to 5
g = np.linspace(0, 5, 10)
print ("\nA sequential array with 10 values between""0 and 5:\n", g)
# Reshaping 3X4 array to 2X2X3 array
arr = np.array([[1, 2, 3, 4], [5, 2, 4, 2], [1, 2, 0, 1]])
newarr = arr.reshape(2, 2, 3)
print ("\nOriginal array:\n", arr)
print ("Reshaped array:\n", newarr)
# Flatten array
arr = np.array([[1, 2, 3], [4, 5, 6]])
flarr = arr.flatten()
print ("\nOriginal array:\n", arr)
print ("Fattened array:\n", flarr)
OUTPUT
C) ARRAY INDEXING
# indexing in numpy
import numpy as np
# An exemplar array
arr = np.array([[-1, 2, 0, 4], [4, -0.5, 6, 0], [2.6, 0, 7, 8], [3, -7, 4, 2.0]])
# Slicing array
temp = arr[:2, ::2]
print ("Array with first 2 rows and alternate columns(0 and 2):\n", temp)
# Integer array indexing example
temp = arr[[0, 1, 2, 3], [3, 2, 1, 0]]
print ("\nElements at indices (0, 3), (1, 2), (2, 1),""(3, 0):\n", temp)
# boolean array indexing example
cond = arr > 0
# cond is a boolean array
temp = arr[cond]
print ("\nElements greater than 0:\n", temp)
OUTPUT
D) OPERATIONS ON SINGLE ARRAY
# basic operations on single array
import numpy as np
a = np.array([1, 2, 5, 3])
# add 1 to every element
print ("Adding 1 to every element:", a+1)
# subtract 3 from each element
print ("Subtracting 3 from each element:", a-3)
# multiply each element by 10
print ("Multiplying each element by 10:", a*10)
# square each element
print ("Squaring each element:", a**2)
# modify existing array
a *= 2
print ("Doubled each element of original array:", a)
# transpose of array
a = np.array([[1, 2, 3], [3, 4, 5], [9, 6, 0]])
print ("\nOriginal array:\n", a)
print ("Transpose of array:\n", a.T)
OUTPUT
E) UNARY OPERATORS
# unary operators in numpy
import numpy as np
arr = np.array([[1, 5, 6], [4, 7, 2], [3, 1, 9]])
# maximum element of array
print ("Largest element is:", arr.max())
print ("Row-wise maximum elements:", arr.max(axis = 1))
# minimum element of array
print ("Column-wise minimum elements:", arr.min(axis = 0))
# sum of array elements
print ("Sum of all array elements:", arr.sum())
# cumulative sum along each row
print ("Cumulative sum along each row:\n", arr.cumsum(axis = 1))
OUTPUT
F) BINARY OPERATORS
# binary operators in Numpy
import numpy as np
a = np.array([[1, 2], [3, 4]])
b = np.array([[4, 3], [2, 1]])
# add arrays
print ("Array sum:\n", a + b)
# multiply arrays (elementwise multiplication)
print ("Array multiplication:\n", a*b)
# matrix multiplication
print ("Matrix multiplication:\n", a.dot(b))
OUTPUT
G) UNIVERSAL FUNCTIONS (ufunc)
# universal functions in numpy
import numpy as np
# create an array of sine values
a = np.array([0, np.pi/2, np.pi])
print ("Sine values of array elements:", np.sin(a))
# exponential values
a = np.array([0, 1, 2, 3])
print ("Exponent of array elements:", np.exp(a))
# square root of array values
print ("Square root of array elements:", np.sqrt(a))
OUTPUT
H) SORTING ARRAY
# Python program to demonstrate sorting in numpy
import numpy as np
a = np.array([[1, 4, 2], [3, 4, 6], [0, -1, 5]])
# sorted array
print ("Array elements in sorted order:\n", np.sort(a, axis = None))
# sort array row-wise
print ("Row-wise sorted array:\n", np.sort(a, axis = 1))
# specify sort algorithm
print ("Column wise sort by applying merge- sort:\n", np.sort(a, axis = 0, kind =
'mergesort'))
# Example to show sorting of structured array
# set alias names for dtypes
dtypes = [('name', 'S10'), ('grad_year', int), ('cgpa', float)]
# Values to be put in array
values = [('Hrithik', 2009, 8.5), ('Ajay', 2008, 8.7), ('Pankaj', 2008, 7.9), ('Aakash', 2009, 9.0)]
# Creating array
arr = np.array(values, dtype = dtypes)
print ("\nArray sorted by names:\n", np.sort(arr, order = 'name'))
print ("Array sorted by graduation year and then cgpa:\n", np.sort(arr, order =
['grad_year', 'cgpa']))
OUTPUT
RESULT
Thus the basic data analytics communication process with NumPy was constructed and
performed successfully.
EX. NO – 1.b
DATE:
PERFORM INTEGRATION USING SCIPY SUB PACKAGE
Aim:
To write a program to construct and demonstrate integration
operations using SciPy subpackages.
PROCEDURE
STEP 1: Start
STEP 2: Import the SciPy package for the program.
STEP 3: Perform the integration operations.
STEP 4: Perform the single integration operation.
STEP 5: Perform the double integration operation.
STEP 6: End
SOURCE CODE:
Single Integration:
from scipy import integrate
# take f(x) function as f
f = lambda x : (x**2)/2
#single integration with a = 0 & b = 1
integration = integrate.quad(f, 0 , 1)
print(integration)
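Note: integrate.quad returns a tuple (value, estimated absolute error). For f(x) = x²/2 the exact
integral over [0, 1] is 1/6 ≈ 0.1667, so the printed value should be very close to that.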
OUTPUT:
Double Integration:
from scipy import integrate
import numpy as np
#import square root function from math lib
from math import sqrt
# set function f(x, y)
f = lambda x, y : 64 * x * y
# lower limit of the inner integral
p = lambda x : 0
# upper limit of the inner integral
q = lambda y : sqrt(1 - 2*y**2)
# perform double integration
integration = integrate.dblquad(f , 0 , 2/4, p, q)
print(integration)
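Note: integrate.dblquad(f, a, b, p, q) integrates f over a region where the outer variable runs
from a to b and the inner variable runs from p to q (the limit arguments may be functions of the
outer variable); the first argument of f is treated as the inner integration variable. The result is
again a (value, error estimate) tuple.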
OUTPUT:
RESULT:
Thus the integration operations using SciPy subpackages were constructed and demonstrated
successfully.
EX. NO – 1.c
DATE:
APPLY QUANTITATIVE TECHNIQUE USING
APPROPRIATE PACKAGES IN PYTHON
AIM:
To write a program to construct and demonstrate the quantitative
techniques using appropriate packages in Python.
PROCEDURE
STEP 1: Start
STEP 2: Import the pandas package file for our program.
STEP 3: Download the iris_csv.csv file and save it in the system.
STEP 4: Read the data (.csv) file
STEP 5: Perform the shape function
STEP 6: Perform the info() function
STEP 7: Perform the head() function
STEP 8: Perform the tail() function
STEP 9: Perform the mean() function
STEP 10: Perform the median() function
STEP 11: Perform the min() function
STEP 12: Perform the max() function
STEP 13: Perform the count() function
STEP 14: Perform the std() function
STEP 15: Perform the corr() function
STEP 16: Perform the describe() function
STEP 17: End
SOURCE CODE:
import numpy as np
import pandas as pd
df = pd.read_csv("C:/Users/mani4/Documents/Python
Scripts/iris_csv.csv")
# Prints number of rows and columns in
dataframe
df.shape
# Index, Datatype and Memory
information
df.info()
# Prints first n rows of the
DataFrame df.head()
# Prints last n rows of the
DataFrame
df.tail()
# Returns the mean of all
columns df.mean()
# Returns the median of each column
df.median()
# Returns the lowest value in each
column df.min()
# Returns the highest value in each column
df.max()
# Returns the number of non-null values in each DataFrame column
df.count()
# Returns the standard deviation of each
column df.std()
# Returns the correlation between columns in a DataFrame
df.corr()
# Summary statistics for numerical
columns df.describe()
RESULT:
Thus the quantitative techniques using appropriate packages in Python were constructed
and demonstrated successfully.
Ex2. Working with Numpy arrays
Aim:
Working with different instructions used in Numpy array
Procedure
✓ Perform the following functions:
✓ To print the array: print(a)
✓ To print the shape of the array: print(a.shape)
✓ ndim returns the number of dimensions of the array: print(a.ndim)
✓ The data type object (dtype) informs us about the layout of the array:
print(a.dtype.name)
✓ itemsize returns the size (in bytes) of each element of a NumPy array:
print(a.itemsize)
✓ size is the total number of elements in the ndarray: print(a.size)
Program
import numpy as np
a = np.arange(15).reshape(3, 5)
# To print the array
print(a)
# To print the shape of the array
print(a.shape)
# ndim returns the number of dimensions of the array
print(a.ndim)
# The data type object (dtype) informs us about the layout of the array
print(a.dtype.name)
# itemsize returns the size (in bytes) of each element
print(a.itemsize)
# size is the total number of elements in the ndarray
print(a.size)
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])
x = np.where(arr%2 == 1)
print(x)
# Slice elements from index 1 to index 5 (end index exclusive)
print(arr[1:5])
# Get the third and fourth elements of the array and add them
print(arr[2] + arr[3])
# Write a NumPy program to convert values between Centigrade and
# Fahrenheit degrees. Values are stored in a NumPy array.
import numpy as np
fvalues = [0, 12, 45.21, 34, 99.91, 32]
F = np.array(fvalues)
print("Values in Fahrenheit degrees:")
print(F)
print("Values in Centigrade degrees:")
print(np.round((5*F/9 - 5*32/9),2))
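The task also asks for the reverse conversion; a minimal sketch, reusing the Centigrade values computed above:
C = np.round(5 * F / 9 - 5 * 32 / 9, 2)  # Centigrade values from above
print("Back to Fahrenheit degrees:")
print(np.round(9 * C / 5 + 32, 2))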
Ex3 : Working with Pandas data frames
Aim
To work with Pandas data frames
Procedure
1. Create a dataset with name, city, age and py-score.
2. Use .head() to show the first few items and .tail() to show the last few
items.
3. Get the DataFrame’s row labels with .index and its column labels
with .columns
4. Get the data types for each column of a Pandas DataFrame with .dtypes
5. Check the amount of memory used by each column.
6. Use the accessor .loc[] to get rows or columns by their labels, and the
accessor .iloc[] to retrieve a row or column by its integer index.
7. Create a new Series object that represents a new candidate and append it
to the DataFrame.
Program
import pandas as pd
data = {
'name': ['Muthu', 'Anand', 'Ramkumar', 'Roja', 'Robin', 'Rajan', 'Joel'],
'city': ['Chennai', 'Madurai', 'Tirunelveli', 'Saidapet', 'Tambaram', 'Irukattukottai', 'Central Station'],
'age': [41, 28, 33, 34, 38, 31, 37],
'py-score': [88.0, 79.0, 81.0, 80.0, 68.0, 61.0, 84.0]
}
row_labels = [101, 102, 103, 104, 105, 106, 107]
# df is a variable that holds the reference to the Pandas DataFrame
df = pd.DataFrame(data=data, index=row_labels)
df
# We can use .head() to show the first few items and .tail() to show the last few items
df.head(n=2)
df.tail(n=2)
# You can get the DataFrame's row labels with .index and its column labels with .columns
df.index
df.columns
# We can get the data types for each column of a Pandas DataFrame with .dtypes
df.dtypes
# You can even check the amount of memory used by each column with .memory_usage()
df.memory_usage()
# In addition to the accessor .loc[], which you can use to get rows or columns by
# their labels, Pandas offers the accessor .iloc[], which retrieves a row or column
# by its integer index
df.loc[101]
df.iloc[0]
# We can start by creating a new Series object that represents the new candidate
john = pd.Series(data=['Jovan', 'Medavakkam', 34, 79], index=df.columns, name=17)
john
df = df.append(john)
df
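Note: DataFrame.append() was removed in pandas 2.0; on recent pandas versions the same row can be added with pd.concat, for example:
df = pd.concat([df, john.to_frame().T])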
Ex 4a. Reading data from text files
Aim
To write a Python program to read data from text files
Procedure
✓ Create a text file with some data and save it as filename.txt.
✓ Open the file using the open() function in the required access mode.
✓ Use functions such as read(), write() and readlines() as needed.
Program
# Program to show various ways to read and
# write data in a file.
file1 = open("myfile.txt","w")
L = ["This is Delhi \n","This is Paris \n","This is London \n"]
# \n is placed to indicate EOL (End of Line)
file1.write("Hello \n")
file1.writelines(L)
file1.close() #to change file access modes
file1 = open("myfile.txt","r+")
print("Output of Read function is ")
print(file1.read())
print()
# seek(n) takes the file handle to the nth
# byte from the beginning.
file1.seek(0)
print( "Output of Readline function is ")
print(file1.readline())
print()
file1.seek(0)
# To show difference between read and readline
print("Output of Read(9) function is ")
print(file1.read(9))
print()
file1.seek(0)
print("Output of Readline(9) function is ")
print(file1.readline(9))
file1.seek(0)
# readlines function
print("Output of Readlines function is ")
print(file1.readlines())
print()
file1.close()
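As a side note, a common idiom is to open files with a with statement, which closes the file automatically even if an error occurs; a minimal sketch equivalent to the read portion above:
with open("myfile.txt", "r") as file1:
    print(file1.read())
# file1 is closed automatically when the with block ends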
Ex 4b. Reading data from excel files and exploring various commands
Aim
To write a Python program to read data from Excel files
Procedure
• First of all, create an Excel file with some 10 records (10 rows) and 5
columns with numerical data and save it in filename.xlsx format.
• Reading data from excel files into pandas using Python.
• Exploring the data from excel files in Pandas.
• Using functions to manipulate and reshape the data in Pandas.
To view 5 rows from the top and from the bottom of the data, use head() and tail().
The shape attribute can be used to view the number of rows and columns.
If any column contains numerical data, we can sort that column using
the sort_values() method in pandas.
Suppose our data is mostly numerical. We can get the statistical information like
mean, max, min, etc. about the data frame using the describe() method.
Program
import pandas as pd
data = pd.read_excel (r'd:\mark.xlsx')
data
df = pd.DataFrame(data, columns= ['Name','CGPA'])
print (df)
data.head()
data.tail()
data.shape
sorted_column = data.sort_values(['Name'], ascending = False)
sorted_column['Name'].head(5)
data.describe()
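Note: reading .xlsx files with pd.read_excel() requires an Excel engine such as openpyxl to be
installed (pip install openpyxl).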
Ex 4c. Exploring various commands for doing descriptive analytics on the
Iris data set.
Aim
To explore various commands for doing descriptive analytics on the Iris data
set.
Procedure
✓ To understand the idea behind Descriptive Statistics.
✓ Load the packages we will need and also the `iris` dataset.
✓ load_iris() loads in an object containing the iris dataset, which I stored in
`iris_obj`.
✓ Basic statistics.
✓ The number of rows in the dataset can be obtained via count().
✓ Mean for every numeric column
✓ Median for every numeric column
✓ Variance is a measure of dispersion, roughly the “average” squared
distance of a data point from the mean (see the formulas after this list).
✓ The standard deviation is the square root of the variance and interpreted
as the “average” distance a data point is from the mean.
✓ The maximum and minimum values.
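For reference, the sample versions of the two dispersion measures above, with mean x̄ over n
observations (matching the ddof = 1 default used by pandas' var() and std()):
s² = Σ(xᵢ − x̄)² / (n − 1)
s = √s²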
Program Code
import pandas as pd
from pandas import DataFrame
from sklearn.datasets import load_iris
# sklearn.datasets includes common example datasets
# A function to load in the iris dataset
iris_obj = load_iris()
# Dataset preview
iris_obj.data
iris = DataFrame(iris_obj.data, columns=iris_obj.feature_names,
                 index=pd.Index([i for i in range(iris_obj.data.shape[0])])).join(
    DataFrame(iris_obj.target, columns=pd.Index(["species"]),
              index=pd.Index([i for i in range(iris_obj.target.shape[0])])))
iris # prints iris data
Commands
iris_obj.feature_names
iris.count()
iris.mean()
iris.median()
iris.var()
iris.std()
iris.max()
iris.min()
iris.describe()
Result
Exploring various commands for doing descriptive analytics on the Iris data set
successfully executed.
5. a. Use the diabetes data set from UCI and Pima Indians
Diabetes data set for performing the following:
Univariate analysis: Frequency, Mean, Median, Mode, Variance,
Standard Deviation, Skewness and Kurtosis.
Aim:
To analyse the various univariate measures such as Frequency, Mean, Median,
Mode, Variance, Standard Deviation, Skewness and Kurtosis on a dataset
such as the Pima Indians Diabetes dataset.
Procedure
• Download the Pima Indians Diabetes dataset, save it in any drive and
load it for processing.
• The mean() function can be used to calculate mean/average of a
given list of numbers.
• The median() method calculates the median (middle value) of the
given data set.
• The mode of a set of data values is the value that appears most
often.
• The var() method calculates the variance for each column.
• Standard deviation std() is a number that describes how spread out
the values are.
• The skew() method calculates the skew for each column. Skewness
refers to a distortion or asymmetry that deviates from the
symmetrical bell curve, or normal distribution, in a set of data.
• Kurtosis is also a statistical term and an important characteristic of a
frequency distribution. It determines whether a distribution is heavy-tailed
relative to the normal distribution and provides information about the
shape of the frequency distribution.
Program:
import pandas as pd
from scipy.stats import kurtosis
import pylab as p
df = pd.read_csv (r'd:\\diabetes.csv')
print (df)
df1 = pd.DataFrame(df, columns= ['Age','Glucose'])
print (df1)
df1.mean()
df1.median()
df1.mode()
print(df1.var())
df1.std()
print(df1.skew())
print(kurtosis(df, axis=0, bias=True))
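The aim also lists Frequency, which the listing above does not compute; a one-line sketch for a frequency table of a column (reusing df1):
print(df1['Age'].value_counts())  # frequency of each distinct Age value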
Dataset download link:
https://github.com/npradaschnor/Pima-Indians-Diabetes-Dataset/blob/master/Pima%20Indians%20Diabetes%20Dataset.ipynb
Result:
The various univariate functions like Frequency, Mean, Median, Mode,
Variance, Standard Deviation, Skewness and Kurtosis on dataset Pima
Indian diabetes dataset are successfully executed.
5 b. Linear Regression and Logistic Regression with the Diabetes Dataset
Using Python Machine Learning
Aim
In this experiment we use the diabetes dataset from sklearn and implement linear
regression and logistic regression over it.
Procedure
Load sklearn Libraries.
Load Data
Load the diabetes dataset
Split Dataset
Create the linear regression and logistic regression models
Make predictions using the testing set
Find the coefficients, mean squared error and coefficient of determination
Program
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
#To calculate accuracy measures and confusion matrix
from sklearn import metrics
diabetes_X, diabetes_y = datasets.load_diabetes(return_X_y=True)
diabetes_X = diabetes_X[:, np.newaxis, 2]
# Split the data into training/testing sets
diabetes_X_train = diabetes_X[:-20]
diabetes_X_test = diabetes_X[-20:]
# Split the targets into training/testing sets
diabetes_y_train = diabetes_y[:-20]
diabetes_y_test = diabetes_y[-20:]
# Create linear regression object
regr = linear_model.LinearRegression()
# Train the model using the training sets
regr.fit(diabetes_X_train, diabetes_y_train)
# Make predictions using the testing set
diabetes_y_pred = regr.predict(diabetes_X_test)
# Create Logistic regression object
Logistic_model = LogisticRegression()
Logistic_model.fit(diabetes_X_train, diabetes_y_train)
# The coefficients
print('Coefficients: \n', regr.coef_)
# The mean squared error
print('Mean squared error: %.2f'
% mean_squared_error(diabetes_y_test, diabetes_y_pred))
# The coefficient of determination: 1 is perfect prediction
print('Coefficient of determination: %.2f'
% r2_score(diabetes_y_test, diabetes_y_pred))
y_predict = Logistic_model.predict(diabetes_X_train)
#print("Y predict/hat ", y_predict)
y_predict
Output
Coefficients:
[938.23786125]
Mean squared error: 2548.07
Coefficient of determination: 0.47
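Note: the sklearn diabetes target is a continuous disease-progression measure rather than a class
label, so linear regression is the natural model for it; when LogisticRegression is fitted on this
target, scikit-learn treats each distinct target value as its own class.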
5 c. Use the diabetes data set from UCI and Pima Indians Diabetes data set
for performing the following: Multiple Regression
Aim
Multiple regression is like linear regression, but with more than one independent
value, meaning that we try to predict a value based on two or more variables.
Procedure
The Pandas module allows us to read csv files and return a DataFrame object.
Then make a list of the independent values and call this variable X.
Put the dependent values in a variable called y.
From the sklearn module we will use the LinearRegression() method to create a
linear regression object.
This object has a method called fit() that takes the independent and dependent
values as parameters and fills the regression object with data that describes the
relationship.
We now have a regression object that is ready to predict Age values based on a person's
Glucose and BloodPressure.
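The fitted model has the form Age ≈ b0 + b1·Glucose + b2·BloodPressure; after fit(), the
intercept b0 is available as regr.intercept_ and the coefficients b1 and b2 as regr.coef_, so the
prediction for Glucose = 150 and BloodPressure = 13 is simply b0 + b1·150 + b2·13.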
Program
import pandas as pd
from sklearn import linear_model
df = pd.read_csv (r'd:\\diabetes.csv')
print (df)
X = df[['Glucose', 'BloodPressure']]
y = df['Age']
regr = linear_model.LinearRegression()
regr.fit(X, y)
predictedage = regr.predict([[150, 13]])
print(predictedage)
Output
[28.77214401]
5 d. Also compare the results of the above analysis for the two data sets.
Aim
In this program, we can compare the results of the two different data sets.
Procedure
Step 1: Prepare the datasets to be compared
Step 2: Create the two DataFrames
Based on the above data, you can then create the following two DataFrames
Step 3: Compare the values between the two Pandas DataFrames
In this step, you’ll need to import the NumPy package.
Let’s say that you have the following data stored in a CSV file called car1.csv
While you have the data below stored in a second CSV file called car2.csv
Program
import pandas as pd
import numpy as np
data_1 = pd.read_csv(r'd:\car1.csv')
df1 = pd.DataFrame(data_1)
data_2 = pd.read_csv(r'd:\car2.csv')
df2 = pd.DataFrame(data_2)
df1['amount1'] = df2['amount1']
df1['prices_match'] = np.where(df1['amount'] == df2['amount1'], 'True', 'False')
df1['price_diff'] = np.where(df1['amount'] == df2['amount1'], 0, df1['amount'] - df2['amount1'])
print(df1)
Output
Model City Year amount amount1 prices_match price_diff
0 Maruti Chennai 2022 600000 600000 True 0
1 Hyndai Chennai 2022 700000 700000 True 0
2 Ford Chennai 2022 800000 850000 False -50000
3 Kia Chennai 2022 900000 900000 True 0
4 XL6 Chennai 2022 1000000 1000000 True 0
5 Tata Chennai 2022 1100000 1150000 False -50000
6 Audi Chennai 2022 1200000 1200000 True 0
7 Ertiga Chennai 2022 1300000 1300000 True 0
Dataset 1: car1.csv
Dataset 2: car2.csv
6 a). Apply and explore various plotting functions on UCI data sets. Density
and contour plots
Aim
To apply and explore various plotting functions like Density and contour
plots on datasets.
Procedure
There are three Matplotlib functions that can be helpful for this
task: plt.contour for contour plots, plt.contourf for filled contour plots,
and plt.imshow for showing images
A contour plot can be created with the plt.contour function. It takes three
arguments: a grid of x values, a grid of y values, and a grid of z values.
The x and y values represent positions on the plot, and the z values will be
represented by the contour levels.
Perhaps the most straightforward way to prepare such data is to use
the np.meshgrid function, which builds two-dimensional grids from one-
dimensional arrays.
Next, draw a standard line-only contour plot; the lines can be color-coded
by specifying a colormap with the cmap argument.
Additionally, we'll add a plt.colorbar() command, which automatically creates an
additional axis with labeled color information for the plot.
Program
%matplotlib inline
import matplotlib.pyplot as plt
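# Note: newer Matplotlib releases renamed this style to 'seaborn-v0_8-white'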
plt.style.use('seaborn-white')
import numpy as np
def f(x, y):
    return np.sin(x) ** 10 + np.cos(10 + y * x) * np.cos(x)
x = np.linspace(0, 5, 50)
y = np.linspace(0, 5, 40)
X, Y = np.meshgrid(x, y)
Z = f(X, Y)
plt.contour(X, Y, Z, colors='black');
Output
plt.contour(X, Y, Z, 20, cmap='RdGy');
Output
plt.contourf(X, Y, Z, 20, cmap='RdGy')
plt.colorbar();
Output
Result
Various plotting functions like Density and contour plots on datasets are
successfully executed.
6 b). Apply and explore various plotting functions like Correlation and
scatter plots on UCI data sets
Aim
To apply and explore various plotting functions like Correlation and
scatter plots on datasets.
Procedure
✓ Load the diabetes dataset with pandas and list its columns.
✓ Draw a scatter plot of two variables using seaborn's scatterplot().
✓ Fit and plot a regression line with lmplot(), optionally split by a hue variable.
✓ Compute the Pearson correlation between two columns with scipy.stats.pearsonr().
✓ Compute the correlation matrix with corr() and visualize it with a heatmap.
Program
import pandas as pd
con = pd.read_csv('D:/diabetes.csv')
con
list(con.columns)
import seaborn as sns
sns.scatterplot(x="Pregnancies", y="Age", data=con);
Output
sns.lmplot(x="Pregnancies", y="Age", data=con);
Output
sns.lmplot(x="Pregnancies", y="Age", hue="Outcome", data=con);
Output
from scipy import stats
stats.pearsonr(con['Age'], con['Outcome'])
Output
(0.23835598302719774, 2.209975460664566e-11)
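Here stats.pearsonr returns a tuple: the Pearson correlation coefficient (about 0.24, a weak
positive correlation between Age and Outcome) and the two-sided p-value for the null
hypothesis of no correlation.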
cormat = con.corr()
round(cormat,2)
sns.heatmap(cormat);
Output
Result
Various plotting functions like Correlation and scatter plots on datasets
are successfully executed.
6 c. Apply and explore histograms and three dimensional plotting functions on
UCI data sets
Aim
To apply and explore histograms and three dimensional plotting functions
on UCI data sets
Procedure
✓ Download CSV file and upload to explore.
✓ A histogram is used to represent data grouped into ranges (bins).
✓ To create a histogram, the first step is to create bins of the ranges, then distribute
the whole range of the values into a series of intervals, and count the values
which fall into each of the intervals.
✓ Bins are clearly identified as consecutive, non-overlapping intervals of
variables. The matplotlib.pyplot.hist() function is used to compute and create a
histogram of x (see the sketch after this list).
✓ The first one is a standard import statement for plotting using matplotlib,
which you would see for 2D plotting as well.
✓ The second import of the Axes3D class is required for enabling 3D
projections. It is, otherwise, not used anywhere else.
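A minimal standalone sketch of the binning idea described above (the values are synthetic, generated only for illustration):
import numpy as np
import matplotlib.pyplot as plt
values = np.random.default_rng(0).normal(120, 30, 200)  # synthetic glucose-like values
plt.hist(values, bins=10)  # 10 consecutive, non-overlapping bins
plt.show()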
Program
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt # To visualize
from mpl_toolkits.mplot3d import Axes3D
data = pd.read_csv('d:\\diabetes.csv')
data
data['Glucose'].plot(kind='hist')
Output
fig = plt.figure(figsize=(4,4))
ax = fig.add_subplot(111, projection='3d')
Output
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
x = data['Age'].values
y = data['Glucose'].values
z = data['Outcome'].values
ax.set_xlabel("Age (Year)")
ax.set_ylabel("Glucose (Reading)")
ax.set_zlabel("Outcome (0 or 1)")
ax.scatter(x, y, z, c='r', marker='o')
plt.show()
Output
Result
The histograms and three dimensional plotting functions on UCI data sets are
successfully executed.