
Python MP Report PDF

The document provides code examples for various data analytics concepts in Python, including: 1) Linear search and inserting an element into a sorted list. Code samples demonstrate searching and sorting lists. 2) Object-oriented programming concepts like inheritance, encapsulation, and method overloading. Classes and methods are defined. 3) Dataframe manipulation in Python using Pandas and NumPy. Examples load and clean datasets, manipulate frames, and split/concatenate arrays. 4) Array manipulation, searching, sorting, and splitting using NumPy. Functions create, access, modify, and split arrays. The document contains Python code snippets with explanations to demonstrate fundamental data analytics techniques.

Uploaded by

B A Siddartha
Copyright © All Rights Reserved

22MCAL36: Data Analytics Lab with Mini-Project

PART - A

Dept. of CSE (MCA), VTU CPGS,Kalaburagi. Page1


1. Write a Python program to perform linear search

# Read n integers from the user into list1
list1 = []
n = int(input("Enter the number of elements to be entered"))
for i in range(0, n):
    ele = int(input())
    list1.append(ele)
print(list1)

def linear_Search(list1, n, key):
    # Compare each element with key in turn; return the first matching index, or -1
    for i in range(0, n):
        if list1[i] == key:
            return i
    return -1

key = int(input("Enter the element to be searched"))
# Example without user input: list1 = [1, 3, 5, 4, 7, 9], key = 3
n = len(list1)
res = linear_Search(list1, n, key)
if res == -1:
    print("Element not found")
else:
    print("Element found at index:", res)

Output:
Enter the number of elements to be entered 4
5
7
3
9
[5, 7, 3, 9]
Enter the element to be searched 3
Element found at index: 2

Enter the number of elements to be entered 4
5
7
3
9
[5, 7, 3, 9]
Enter the element to be searched 2
Element not found
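The same algorithm can be written without user input, which makes it easy to test. The sketch below is an illustrative variant (the function name `linear_search` is hypothetical, not part of the lab program) that uses `enumerate` instead of passing the length separately:

```python
def linear_search(items, key):
    """Return the index of the first element equal to key, or -1 if key is absent."""
    # Check each position in order; the worst case examines all len(items) elements
    for index, value in enumerate(items):
        if value == key:
            return index
    return -1


# Same data as the sample runs above
print(linear_search([5, 7, 3, 9], 3))  # 2
print(linear_search([5, 7, 3, 9], 2))  # -1
```

Because it stops at the first match, the search returns the lowest index when the key occurs more than once.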

2. Write a Python program to insert an element into a sorted list

1st Method

import bisect

# Read n integers from the user
myList = []
n = int(input("Enter the number of elements to be entered"))
print("Enter Elements of List")
for i in range(0, n):
    ele = int(input())
    myList.append(ele)
print(myList)
print("Original list is:", myList)

# Build a sorted copy: insort places each value at its ordered position
sorted_list = []
for i in myList:
    bisect.insort(sorted_list, i)
print("Sorted List:")
print(sorted_list)

element = int(input("enter the element to be inserted"))
print("The element to be inserted is:", element)
# Insert into the sorted list (insort assumes its target list is already sorted)
bisect.insort(sorted_list, element)
print("The updated list is:", sorted_list)

Output:
Enter the number of elements to be entered 5
Enter Elements of List
1
3
5
2
6
[1, 3, 5, 2, 6]
Original list is: [1, 3, 5, 2, 6]
Sorted List:
[1, 2, 3, 5, 6]
enter the element to be inserted

2nd Method
Using the insert() Method

myList = [1, 2, 3, 5, 6, 7, 8, 9, 10]
print("Original list is:", myList)
element = 4
print("The element to be inserted is:", element)
l = len(myList)
# Default to the end of the list, for an element larger than every existing one
index = l
for i in range(l):
    # The first element greater than element marks the insertion position
    if myList[i] > element:
        index = i
        break
myList.insert(index, element)
print("The updated list is:", myList)

Output:
Original list is: [1, 2, 3, 5, 6, 7, 8, 9, 10]
The element to be inserted is: 4
The updated list is: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

3rd Method
import bisect
myList = [1, 2, 3, 5, 6, 7, 8, 9, 10]
print("Original list is:", myList)
element = 4
print("The element to be inserted is:", element)
# insort finds the position with a binary search and inserts there, keeping the list sorted
bisect.insort(myList, element)
print("The updated list is:", myList)

Output:
Original list is: [1, 2, 3, 5, 6, 7, 8, 9, 10]
The element to be inserted is: 4
The updated list is: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
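`bisect.insort` locates the insertion point with a binary search, which is why it needs an already sorted list, and then calls `list.insert`. A minimal sketch of that idea, with a hypothetical helper name `insort_sketch`, assuming ascending order and inserting after any equal elements as `bisect.insort` does:

```python
import bisect


def insort_sketch(sorted_list, value):
    """Insert value into an ascending list in place; return the index used."""
    lo, hi = 0, len(sorted_list)
    # Binary search: shrink the candidate range [lo, hi) by half each step
    while lo < hi:
        mid = (lo + hi) // 2
        if value < sorted_list[mid]:
            hi = mid          # insertion point is at or left of mid
        else:
            lo = mid + 1      # insertion point is right of mid (after equal elements)
    sorted_list.insert(lo, value)
    return lo


myList = [1, 2, 3, 5, 6, 7, 8, 9, 10]
print(insort_sketch(myList, 4))   # 3
print(myList)                     # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Cross-check against the standard library on the same input
check = [1, 2, 3, 5, 6, 7, 8, 9, 10]
bisect.insort(check, 4)
print(check == myList)            # True
```

The search needs only O(log n) comparisons, but `list.insert` still shifts the later elements, so each insertion is O(n) overall, the same as the loop in the 2nd method.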

3. Write a Python program using object-oriented programming to demonstrate encapsulation, overloading and inheritance
# Demonstration of Inheritance
print('\n')

# Parent class
class ParentClass:
    def __init__(self):
        print("Demonstration of Inheritance")
        print("================================")

    def PClassMethod(self):
        print("Its parent class method")

# Child class: inherits __init__ and PClassMethod from ParentClass
class ChildClass(ParentClass):
    def ChildClassMethod(self):
        print("Its Child class method")

Ob = ChildClass()        # runs the inherited ParentClass.__init__
Ob.ChildClassMethod()
Ob.PClassMethod()        # inherited method

# Demonstration of Encapsulation
print("Demonstration of Encapsulation")
print("================================")

class Car:
    def __init__(self):
        # The double underscore makes the attribute private by name mangling
        self.__maxprice = 1000000

    def sell(self):
        print("Selling Price: {}".format(self.__maxprice))

    def setMaxPrice(self, price):
        self.__maxprice = price

c = Car()
c.sell()
# Try to change the price directly: this creates a new, unrelated attribute,
# so the private price is unchanged and still prints 1000000
c.__maxprice = 500000
c.sell()
# Using the setter method changes the private attribute
c.setMaxPrice(200000)
c.sell()

# Demonstration of Overloading (Polymorphism)
# Python does not overload methods by signature; default arguments let one
# method accept zero, one or two arguments instead
print('\n')
print("Demonstration of Overloading")
print("================================")

class MethodOverload:
    def add(self, a=None, b=None):
        print(f'The value of A is {a}')
        print(f'The value of B is {b}')

ob = MethodOverload()    # object creation of class MethodOverload
ob.add()
ob.add(2)
ob.add(2, 3)

Output:
Demonstration of Inheritance
================================
Its Child class method
Its parent class method
Demonstration of Encapsulation
================================
Selling Price: 1000000
Selling Price: 1000000
Selling Price: 200000
Demonstration of Overloading
================================
The value of A is None
The value of B is None
The value of A is 2
The value of B is None
The value of A is 2
The value of B is 3
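The second `c.sell()` still prints 1000000 because of name mangling: inside the class body, `self.__maxprice` is stored as `_Car__maxprice`, while the assignment `c.__maxprice = 500000` outside the class creates a separate attribute. The sketch below checks this and then shows a common alternative idiom, a `property` with a validating setter; the class name `PricedCar` and its positive-price rule are illustrative assumptions, not part of the lab program.

```python
class Car:
    def __init__(self):
        self.__maxprice = 1000000       # stored under the mangled name _Car__maxprice

    def sell(self):
        print("Selling Price: {}".format(self.__maxprice))


c = Car()
c.__maxprice = 500000                   # outside the class: no mangling, new attribute
print(sorted(vars(c)))                  # ['_Car__maxprice', '__maxprice']
c.sell()                                # Selling Price: 1000000


class PricedCar:
    """Hypothetical variant: a property controls every write to the price."""

    def __init__(self, max_price=1000000):
        self.max_price = max_price      # goes through the setter below

    @property
    def max_price(self):
        return self._max_price

    @max_price.setter
    def max_price(self, price):
        if price <= 0:
            raise ValueError("price must be positive")
        self._max_price = price


p = PricedCar()
p.max_price = 200000                    # validated assignment
print(p.max_price)                      # 200000
```

Name mangling only prevents accidental clashes; `c._Car__maxprice` is still reachable, so Python encapsulation relies on convention rather than enforcement.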

4. Implement a Python program to demonstrate
1) Importing Datasets
2) Cleaning the Data
3) Data frame manipulation using NumPy
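# 1. Loading the data
# Assumes the file iris.data (UCI Iris dataset) is in the working directory.
# In Jupyter only the last bare expression in a cell is displayed; use print() in a script.
import pandas as pd
data = pd.read_csv('iris.data', header=None)
data.columns = ['sepal length', 'sepal width', 'petal length', 'petal width', 'class']
data.head()

# A second, small DataFrame built by hand (this overwrites the Iris data held in data)
data = pd.DataFrame({'Country': ['India', 'Nepal', 'Pakistan', 'Bangladesh', 'Bhutan'],
                     'Rank': [11, 40, 100, 130, 101]})
data
data.describe()
data.info()

# 2. Cleaning the data: detect missing values
data.isnull()             # element-wise True where a value is missing
data.isna()               # alias of isnull()
data.isna().any()         # per column: does it contain any missing value?
data.isna().sum()         # per column: number of missing values
data.isna().any().sum()   # number of columns with at least one missing value

# 3. Data frame manipulation using NumPy
import numpy as np
np.__version__
L = list(range(21))
L
[str(c) for c in L]
[type(item) for item in L]

# Array creation
np.zeros(10, dtype='int')
np.ones((3, 8), dtype=float)
np.full((3, 5), 1.23)
np.arange(0, 20, 2)

# Indexing (negative indexes count from the end)
x1 = np.array([4, 3, 4, 4, 8, 4])
x1
x1[0]
x1[-1]
x1[-2]

# Slicing: x[start:stop:step]
x = np.arange(20)
x
x[:5]
x[4:]
x[4:7]
x[::2]
x[1::2]
x[::-1]

# Concatenation
x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
z = [21, 21, 21]
np.concatenate([x, y, z])

# Splitting before indexes 3 and 6
x = np.arange(10)
x
x1, x2, x3 = np.split(x, [3, 6])
print(x1, x2, x3)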
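Part 2 above only detects missing values. A minimal sketch of the usual next steps, dropping or imputing them and removing duplicate rows, on a small hypothetical DataFrame (the values, including the missing rank and the duplicated row, are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical data: one missing Rank and one duplicated row
df = pd.DataFrame({
    "Country": ["India", "Nepal", "Nepal", "Bhutan"],
    "Rank": [11, 40, 40, np.nan],
})
print(df.isna().sum())                          # Country 0, Rank 1

# Option 1: drop every row that contains a missing value
dropped = df.dropna()

# Option 2: impute the missing value with the column mean (mean() skips NaN)
filled = df.fillna({"Rank": df["Rank"].mean()})

# Remove exact duplicate rows, keeping the first occurrence
cleaned = filled.drop_duplicates()
print(cleaned)
```

Dropping is simplest but loses data; imputation keeps every row at the cost of inventing a plausible value, so the choice depends on how many values are missing.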

5. Implement a Python program to demonstrate the following using NumPy
A) Array manipulation, searching, sorting and splitting
B) Broadcasting and plotting NumPy arrays

A: Array manipulation, Searching, Sorting and splitting
1. Array manipulation

import numpy as np

# Creating arrays (bare expressions display their value only in Jupyter)
print(" Array manipulation ")
print(" ======================")
np.zeros(10, dtype='int')
np.ones((3, 8), dtype=float)
np.arange(0, 20, 2)
np.full((3, 5), 1.23)
x1 = np.array([4, 3, 4, 4, 8, 4])
print("the array elements are : ", x1)
print("After accessing the value of index zero in array is : ", x1[0])
print("After accessing the last index value in array is : ", x1[-1])
print("\n")

print(" Splitting the Array ")
print(" ======================")
x = np.arange(10)
print("Before Splitting array : the elements are = ", x)
# Split before indexes 3 and 6, giving three sub-arrays
x1, x2, x3 = np.split(x, [3, 6])
print("After Splitting the arrays ")
print(x1)
print(x2)
print(x3)
print("\n")

Output:
Array manipulation
======================
the array elements are : [4 3 4 4 8 4]
After accessing the value of index zero in array is : 4
After accessing the last index value in array is : 4

Splitting the Array
======================
Before Splitting array : the elements are = [0 1 2 3 4 5 6 7 8 9]
After Splitting the arrays
[0 1 2]
[3 4 5]
[6 7 8 9]

2. Sorting the array

import numpy as np
print(" Sorting an Array ")
print(" ======================")
arr = np.array([3, 2, 0, 1])
print("Before Sorting the array, Elements are ", arr)
print("After Sorting the array, Elements are ", np.sort(arr))
# Strings are sorted alphabetically
arr = np.array(['banana', 'cherry', 'apple'])
print("Before Sorting the array of string, Elements are ", arr)
print("After Sorting the array of string Elements are ", np.sort(arr))
# For a 2-D array, np.sort sorts each row independently (default axis=-1)
arr = np.array([[3, 2, 4], [5, 0, 1]])
print("Before Sorting the array of two dimensional Elements are ", arr)
print("After Sorting the array of two dimensional Elements are ", np.sort(arr))
print("\n")

Output:
Sorting an Array
======================
Before Sorting the array, Elements are [3 2 0 1]
After Sorting the array, Elements are [0 1 2 3]
Before Sorting the array of string, Elements are ['banana' 'cherry' 'apple']
After Sorting the array of string Elements are ['apple' 'banana' 'cherry']
Before Sorting the array of two dimensional Elements are [[3 2 4]
 [5 0 1]]
After Sorting the array of two dimensional Elements are [[2 3 4]
 [0 1 5]]
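As the output shows, `np.sort` on a two-dimensional array sorts each row independently, because its default is `axis=-1` (the last axis). A short sketch of the other choices for the same array:

```python
import numpy as np

arr = np.array([[3, 2, 4], [5, 0, 1]])

by_row = np.sort(arr)              # default axis=-1: sort within each row
by_column = np.sort(arr, axis=0)   # sort within each column
flat = np.sort(arr, axis=None)     # flatten first, then sort all elements

print(by_row.tolist())     # [[2, 3, 4], [0, 1, 5]]
print(by_column.tolist())  # [[3, 0, 1], [5, 2, 4]]
print(flat.tolist())       # [0, 1, 2, 3, 4, 5]
```

`np.sort` returns a sorted copy; `arr.sort()` sorts the array in place instead.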

3. Searching array

import numpy as np
print(" Searching an Array ")
print(" ======================")
arr = np.array([1, 2, 3, 4, 5, 4, 4])
print("Before Searching the array, Elements are ", arr)

# Find the indexes where the value is 4.
# np.where(condition) returns a tuple with one array of indexes per dimension.
x = np.where(arr == 4)
print("The element after Find the indexes where the value is 4:", x)
# The value 4 is present at indexes 3, 5 and 6.

# Find the indexes where the values are even
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])
x = np.where(arr % 2 == 0)
print("The element after Find the indexes where the values are even:", x)

# Find the indexes where the values are odd
x = np.where(arr % 2 == 1)
print("The element after Find the indexes where the values are odd:", x)
print("\n")

print(" Search Sorted an Array ")
print(" ========================")
# searchsorted uses binary search, so the array must already be sorted in ascending order.
# It returns the index at which 8 would be inserted to keep the array sorted.
arr = np.array([6, 7, 9, 10])
x = np.searchsorted(arr, 8)
print(" Order of the specified value is ", x)
print("\n")

Output:
Searching an Array
======================
Before Searching the array, Elements are [1 2 3 4 5 4 4]
The element after Find the indexes where the value is 4: (array([3, 5, 6], dtype=int64),)
The element after Find the indexes where the values are even: (array([1, 3, 5, 7], dtype=int64),)
The element after Find the indexes where the values are odd: (array([0, 2, 4, 6], dtype=int64),)

Search Sorted an Array
========================
Order of the specified value is 2

B. Broadcasting and Plotting NumPy arrays

import numpy as np
print(" Broadcasting ")
print(" ===============")
# Arrays of the same shape are multiplied element by element
a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 2.0, 2.0])
print(" Broadcasting of a * b is :", a * b)
# A scalar is stretched to the shape of the array
a = np.array([1.0, 2.0, 3.0])
b = 3.0
print(" Broadcasting of a * b is :", a * b)

Output:
Broadcasting
===============
Broadcasting of a * b is : [2. 4. 6.]
Broadcasting of a * b is : [3. 6. 9.]

# General broadcasting rules: NumPy compares the two shapes element-wise, starting with the
# trailing (rightmost) dimensions. Two dimensions are compatible when
#   1. they are equal, or
#   2. one of them is 1.
# If these conditions are not met, a "ValueError: operands could not be broadcast together"
# exception is raised.

import numpy as np
import matplotlib.pyplot as plt

x = np.arange(4)          # shape (4,)
xx = x.reshape(4, 1)      # shape (4, 1)
y = np.ones(5)            # shape (5,)
z = np.ones((3, 4))       # shape (3, 4)
print(" X.shape is=", x.shape)
print("\n")
print(" Y.shape is=", y.shape)
print("\n")
print(" XX.shape is=", xx.shape)
print("\n")
print(" Y.shape is=", y.shape)
print("\n")
# y is treated as shape (1, 5); the 1s stretch, so (4, 1) and (1, 5) give (4, 5).
# x + y itself would fail: the trailing dimensions 4 and 5 differ and neither is 1.
print(" XX+Y.shape is=\n", (xx + y).shape)
print("\n")
print(" XX+Y is=\n", xx + y)
print("\n")
print(" X.shape is=", x.shape)
print("\n")
print(" Z.shape is=\n", z.shape)
print("\n")
# (4,) and (3, 4): trailing dimensions match, so x is added to each of the 3 rows
print(" X+Z .shape is=\n", (x + z).shape)
print("\n")
print(" X + Z is=\n", x + z)

# Plotting graph
print("\n")
print(" Plotting graph of an Array ")
print(" =============================")
x = np.array([5, 10, 15])
#x = np.arange(1,11)
y = 1 * x + 5
plt.title("Matplotlib demo")
plt.xlabel("x axis caption")
plt.ylabel("y axis caption")
plt.plot(x, y)
plt.show()

Output:
X.shape is= (4,)
Y.shape is= (5,)
XX.shape is= (4, 1)
Y.shape is= (5,)
XX+Y.shape is=
(4, 5)
XX+Y is=
[[1. 1. 1. 1. 1.]
 [2. 2. 2. 2. 2.]
 [3. 3. 3. 3. 3.]
 [4. 4. 4. 4. 4.]]
X.shape is= (4,)
Z.shape is=
(3, 4)
X+Z .shape is=
(3, 4)
X + Z is=
[[1. 2. 3. 4.]
 [1. 2. 3. 4.]
 [1. 2. 3. 4.]]
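The broadcasting rule stated in the comment of the program above can be turned into a small function that predicts the result shape (or the error) before any arithmetic is done. A sketch assuming shapes are plain tuples of non-negative integers; the name `broadcast_shape` is hypothetical, and NumPy's own `np.broadcast_shapes` (available in NumPy 1.20 and later) is used only to cross-check it:

```python
import numpy as np


def broadcast_shape(*shapes):
    """Return the broadcast result shape, or raise ValueError if the shapes are incompatible."""
    ndim = max(len(s) for s in shapes)
    # Left-pad each shape with 1s so that the trailing dimensions line up
    padded = [(1,) * (ndim - len(s)) + tuple(s) for s in shapes]
    result = []
    for dims in zip(*padded):
        sizes = {d for d in dims if d != 1}   # a dimension of 1 stretches to match the others
        if len(sizes) > 1:
            raise ValueError(f"operands could not be broadcast together with shapes {shapes}")
        result.append(sizes.pop() if sizes else 1)
    return tuple(result)


print(broadcast_shape((4, 1), (5,)))       # (4, 5), like xx + y
print(broadcast_shape((4,), (3, 4)))       # (3, 4), like x + z
print(np.broadcast_shapes((4, 1), (5,)))   # (4, 5)
try:
    broadcast_shape((4,), (5,))            # like x + y: 4 vs 5, neither is 1
except ValueError as err:
    print(err)
```

The left-padding step is what makes a 1-D array of shape (5,) behave like a row of shape (1, 5) when combined with a 2-D array.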

6. Implement a Python program to demonstrate data visualization with various types of graphs using NumPy
#### Data Visualization
# Library - Matplotlib
import numpy as np
import matplotlib.pyplot as plt

#### Line Plot
x1 = [1, 2, 3, 4, 5]
y1 = [2, 5, 2, 6, 8]
x2 = [1, 2, 3, 4, 5]
y2 = [4, 5, 8, 9, 10]
plt.xlabel("X Axis", fontsize=12, fontstyle='italic')
plt.ylabel("Y Axis", fontsize=12)
plt.title("Line Plot", fontsize=15, fontname='DejaVu Sans')
plt.plot(x1, y1, color="red", label="First Graph")     # line plot
plt.plot(x2, y2, color="blue", label="Second Graph")   # line plot
plt.legend(loc=2)    # loc=2 is 'upper left'
plt.grid()
#plt.axis('off')
plt.show()

#### Bar Plot
#x = [1,2,3,4,5]
x = ["A", "B", "C", "D", "E"]
y = [20, 50, 20, 60, 80]
plt.xlabel("X Axis", fontsize=12)
plt.ylabel("Y Axis", fontsize=12)
plt.title("Bar Plot", fontsize=15)
plt.bar(x, y, color="red", width=0.5)   # bar plot
plt.show()

#### Scatter Plot
x1 = [1, 2, 3, 4, 5]
y1 = [2, 5, 2, 6, 8]
x2 = [1, 2, 3, 4, 5]
y2 = [4, 5, 8, 9, 10]
plt.xlabel("X Axis", fontsize=12, fontstyle='italic')
plt.ylabel("Y Axis", fontsize=12)
plt.title("Scatter Plot", fontsize=15, fontname='Courier')
plt.scatter(x1, y1, color="red", label="First Graph")                        # scatter plot
plt.scatter(x2, y2, color="blue", s=150, marker="*", label="Second Graph")   # larger star markers
plt.plot(x2, y2, color="blue")   # joins the second series' points with a line
plt.legend(loc=2)
#plt.axis('off')
plt.show()

### Histogram
sample = np.random.randint(10, 100, 30)   # 30 random integers in [10, 100)
plt.hist(sample, rwidth=0.7)
plt.show()

### Pie Chart
plt.figure(figsize=(7, 7))
slices = [10, 20, 50, 30, 34]
act = ["A", "B", "C", "D", "E"]
cols = ["red", "blue", "green", "pink", "yellow"]
plt.pie(slices, labels=act, colors=cols,
        autopct="%1.2f%%", explode=(0, 0.2, 0, 0.1, 0))
plt.show()

7. Write a Python program that creates an m x n integer array and prints its attributes using Matplotlib

import numpy as np
import matplotlib.pyplot as plt
q1=np.array([[1,1,5],
[3,3,3],
[1,1,5]])
plt.imshow(q1)
plt.colorbar()
plt.show()
q2=np.array(range(12,24))
q2=q2.reshape(3,4)
plt.imshow(q2,cmap='rainbow')
plt.colorbar()
plt.show()
q3=np.array(range(1,201))
q3=q3.reshape(20,10)
q3[2,2]=100
q3[5,3]=9
plt.imshow(q3,cmap='jet')
plt.colorbar()
plt.show()

[[1 0 1 0 1 0 1 0]
[0 1 0 1 0 1 0 1]
[1 0 1 0 1 0 1 0]
[0 1 0 1 0 1 0 1]
[1 0 1 0 1 0 1 0]
[0 1 0 1 0 1 0 1]
[1 0 1 0 1 0 1 0]
[0 1 0 1 0 1 0 1]]
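The task title asks for the array's attributes, which the program above only visualizes. A minimal sketch that creates an m x n integer array and prints its main NumPy attributes; m = 3, n = 4 and the `int32` dtype are assumptions chosen so the printed sizes are predictable:

```python
import numpy as np

m, n = 3, 4                                         # assumed dimensions
a = np.arange(m * n, dtype=np.int32).reshape(m, n)  # 3 x 4 array of the integers 0..11

print(a)
print("ndim:", a.ndim)          # number of dimensions: 2
print("shape:", a.shape)        # (rows, columns): (3, 4)
print("size:", a.size)          # total number of elements: 12
print("dtype:", a.dtype)        # element type: int32
print("itemsize:", a.itemsize)  # bytes per element: 4
print("nbytes:", a.nbytes)      # total bytes: size * itemsize = 48
```

Passing this array to `plt.imshow(a)` followed by `plt.colorbar()`, as in the program above, would then display the same values as colours.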

8. Write a Python program to demonstrate the generation of linear regression models.

# linear regression
import numpy as np
from sklearn.linear_model import LinearRegression
from matplotlib import pyplot as plt
x=np.array([1,0,20,40,50,70,80,90,120])
y=np.array([3,20,90,110,130,170,150,200,260])
linreg=LinearRegression()
x=x.reshape(-1,1)
linreg.fit(x,y)
y_pred=linreg.predict(x)
plt.scatter(x,y)
plt.plot(x,y_pred,color='red')
plt.show()
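For a single feature, `LinearRegression` fits the ordinary least-squares line, which has a closed form: slope = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2) and intercept = mean_y - slope * mean_x. A NumPy-only sketch on the same data, cross-checked against `np.polyfit` (used here instead of scikit-learn so the check needs no extra dependency):

```python
import numpy as np

x = np.array([1, 0, 20, 40, 50, 70, 80, 90, 120], dtype=float)
y = np.array([3, 20, 90, 110, 130, 170, 150, 200, 260], dtype=float)

# Closed-form ordinary least squares for y = intercept + slope * x
x_mean, y_mean = x.mean(), y.mean()
slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
intercept = y_mean - slope * x_mean

# np.polyfit with degree 1 solves the same least-squares problem and returns [slope, intercept]
check_slope, check_intercept = np.polyfit(x, y, 1)
print(slope, intercept)
print(np.isclose(slope, check_slope), np.isclose(intercept, check_intercept))  # True True

y_pred = intercept + slope * x    # plays the role of linreg.predict(x) in the program above
```

After fitting, `linreg.coef_[0]` and `linreg.intercept_` in the program above should agree with `slope` and `intercept` up to floating-point rounding.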

9. Write a Python program to demonstrate the generation of logistic regression models using Python.

# Importing the needed Python packages
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn import metrics
import seaborn as sn
import matplotlib.pyplot as plt

# Build a dataframe of 40 applicants
candidates = {'gmat': [780, 750, 690, 710, 680, 730, 690, 720, 740, 690, 610, 690, 710, 680, 770, 610, 580,
                       650, 540, 590, 620, 600, 550, 550, 570, 670, 660, 580, 650,
                       660, 640, 620, 660, 660, 680, 650, 670, 580, 590, 690],
              'gpa': [4, 3.9, 3.3, 3.7, 3.9, 3.7, 2.3, 3.3, 3.3, 1.7, 2.7, 3.7, 3.7, 3.3, 3.3, 3, 2.7, 3.7, 2.7, 2.3,
                      3.3, 2, 2.3, 2.7, 3, 3.3, 3.7, 2.3, 3.7, 3.3, 3, 2.7, 4, 3.3, 3.3, 2.3, 2.7, 3.3, 1.7, 3.7],
              'work_experience': [3, 4, 3, 5, 4, 6, 1, 4, 5, 1, 3, 5, 6, 4, 3, 1, 4, 6, 2, 3,
                                  2, 1, 4, 1, 2, 6, 4, 2, 6, 5, 1, 2, 4, 6, 5, 1, 2, 1, 4, 5],
              'admitted': [1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0,
                           1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1]}
df = pd.DataFrame(candidates, columns=['gmat', 'gpa', 'work_experience', 'admitted'])

# Create the logistic regression in Python:
# set the independent variables (represented as X) and the dependent variable (represented as y)
X = df[['gmat', 'gpa', 'work_experience']]
y = df['admitted']

# Then apply train_test_split. With test_size=0.25, model testing is based on 25% of the
# dataset, while model training is based on the remaining 75%
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Apply the logistic regression as follows:
logistic_regression = LogisticRegression()
logistic_regression.fit(X_train, y_train)
y_pred = logistic_regression.predict(X_test)

# Get the confusion matrix
confusion_matrix = pd.crosstab(y_test, y_pred, rownames=['Actual'], colnames=['Predicted'])
sn.heatmap(confusion_matrix, annot=True)

# Print the accuracy and plot the confusion matrix
print('Accuracy: ', metrics.accuracy_score(y_test, y_pred))
plt.show()
print(X_test)   # test dataset
print(y_pred)   # predicted values
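`LogisticRegression` models the probability of class 1 as the sigmoid of a linear score, p = 1 / (1 + exp(-(w0 + w1*x1 + ...))), and predicts class 1 when p >= 0.5. The sketch below fits such a model with plain batch gradient descent on the mean log-loss. This is a swapped-in teaching technique (scikit-learn's default uses the lbfgs solver with an L2 penalty), and the six one-feature training points are hypothetical:

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def add_intercept(X):
    # Prepend a column of ones so that w[0] acts as the intercept
    return np.hstack([np.ones((X.shape[0], 1)), X])


def fit_logistic(X, y, lr=0.1, epochs=10000):
    """Minimise the mean log-loss with batch gradient descent (no regularisation)."""
    Xb = add_intercept(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = sigmoid(Xb @ w)                  # predicted probabilities
        grad = Xb.T @ (p - y) / len(y)       # gradient of the mean log-loss
        w -= lr * grad
    return w


def predict(w, X, threshold=0.5):
    return (sigmoid(add_intercept(X) @ w) >= threshold).astype(int)


# Hypothetical data: one feature, class 1 for larger values
X = np.array([[1.0], [2.0], [3.0], [6.0], [7.0], [8.0]])
y = np.array([0, 0, 0, 1, 1, 1])

w = fit_logistic(X, y)
print(predict(w, X))                              # [0 0 0 1 1 1]
print("Accuracy:", np.mean(predict(w, X) == y))   # Accuracy: 1.0
```

With several features on very different scales, such as GMAT scores and GPA, gradient descent usually needs the features standardized first or a much smaller learning rate.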

10. Write a Python program to demonstrate Timeseries analysis with Pandas.

import matplotlib.pyplot as plt


import datetime
import numpy as np
plt.rcParams["figure.figsize"] = [7.50, 3.50]
plt.rcParams["figure.autolayout"] = True
x = np.array([datetime.datetime(2021, 1, 1, i, 0) for i in range(24)])
y = np.random.randint(100, size=x.shape)
plt.plot(x, y)
plt.show()
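The program above plots hourly values with Matplotlib but does not use pandas itself. A minimal sketch of typical pandas time-series operations, assuming an hourly `DatetimeIndex` and deterministic stand-in values (0, 1, ..., 23) in place of the random ones, so that the results are predictable:

```python
import numpy as np
import pandas as pd

# 24 hourly timestamps starting at 2021-01-01 00:00, as in the program above
index = pd.date_range("2021-01-01", periods=24, freq=pd.offsets.Hour())
series = pd.Series(np.arange(24, dtype=float), index=index)

# Downsample: mean of each 6-hour block (00-05, 06-11, 12-17, 18-23)
six_hourly = series.resample(pd.offsets.Hour(6)).mean()

# Smooth: rolling mean over a 3-hour window (the first two values are NaN)
rolling = series.rolling(window=3).mean()

# Label-based slicing with date strings (both ends included)
morning = series.loc["2021-01-01 06:00":"2021-01-01 11:00"]

print(six_hourly)
print(rolling.head())
print(morning.sum())   # 51.0
```

Calling `series.plot()` on the result draws it with the timestamps on the x axis, much like `plt.plot(x, y)` in the program above.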

11. Write a Python program to demonstrate Data Visualization using Seaborn


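# Importing libraries
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
# Selecting style as white, dark, whitegrid, darkgrid or ticks
sns.set_theme(style="white")
# Generate a random univariate dataset
rs = np.random.RandomState(10)
d = rs.normal(size=50)

# Plot a simple histogram with bin size determined automatically, plus a kernel density estimate.
# distplot is deprecated in recent seaborn releases; histplot replaces it for this kind of plot.
sns.histplot(d, color="g", kde=True, stat="density")
plt.show()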

PART - B


1. INTRODUCTION
Heart disease remains one of the leading causes of mortality worldwide, making it a critical area for
research and analysis. With advancements in data analytics and machine learning techniques, there is an
opportunity to gain deeper insights into the factors influencing heart disease and develop more effective
predictive models for early detection and prevention.
This project focuses on leveraging Python and data analytics tools within a Jupyter notebook
environment to analyze a dataset related to heart disease. By examining various patient attributes such
as age, sex, blood pressure, cholesterol levels, and other medical indicators, we aim to uncover patterns
and correlations that contribute to the presence or absence of heart disease.
Through exploratory data analysis (EDA), feature engineering, and machine learning algorithms, we
seek to build predictive models capable of accurately classifying individuals at risk of heart disease.
These models can assist healthcare professionals in identifying high-risk patients and implementing
timely interventions to mitigate the progression of the disease.
Furthermore, this project emphasizes the importance of data visualization techniques in understanding
complex relationships within the dataset and communicating findings effectively. By visualizing trends,
distributions, and correlations, we can elucidate underlying patterns and facilitate informed decision-
making in the realm of cardiovascular health.
Overall, this project applies data analytics and machine learning methods to the problem of heart disease
prediction, with the aim of supporting earlier diagnosis and better patient outcomes.


2. REQUIREMENT ANALYSIS
1. Data Collection: Gather a comprehensive dataset containing relevant attributes such as age, sex, blood pressure, cholesterol levels, electrocardiogram results, and other medical indicators related to heart disease.
2. Data Pre-processing:
• Handle missing values: Identify and address any missing data in the dataset using techniques such as imputation or removal.
• Data cleaning: Remove duplicates, outliers, and irrelevant features that may hinder the analysis process.
• Data normalization or standardization: Rescale numerical features, for example to the range [0, 1] (normalization) or to zero mean and unit variance (standardization), so that features measured in different units are comparable.
• Encoding categorical variables: Convert categorical attributes into numerical representations suitable for machine learning algorithms.

3. Exploratory Data Analysis (EDA):
• Conduct descriptive statistics: Analyze the distribution, central tendency, and variability of each feature.
• Visualize data distributions: Use histograms, box plots, and scatter plots to visualize the distribution of each variable and the relationships between variables.
• Identify correlations: Calculate correlation coefficients and visualize correlation matrices to identify potential relationships between features and the target variable (presence of heart disease).

4. Feature Engineering:
• Feature selection: Identify the most relevant features using techniques such as correlation analysis, feature importance, or domain knowledge.
• Feature transformation: Apply transformations (e.g., a logarithmic transformation) to reduce skewness and improve the linearity of features.
• Feature creation: Generate new features by combining or transforming existing ones to capture additional information relevant to heart disease prediction.
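The pre-processing steps listed above (imputation, scaling, encoding) can be sketched with pandas alone. The patient values and column names below are hypothetical, not taken from the project's dataset. In a real pipeline the median, mean and standard deviation should be computed on the training split only and reused for the test split, so that no test information leaks into training:

```python
import numpy as np
import pandas as pd

# Hypothetical patient records (illustrative values and column names)
df = pd.DataFrame({
    "age": [63, 37, 41, np.nan],
    "sex": ["M", "F", "F", "M"],
    "chol": [233, 250, 204, 236],
})

# 1. Impute the missing numeric value with the column median (41 here)
df["age"] = df["age"].fillna(df["age"].median())

# 2. Standardize numeric columns to zero mean and unit (population) standard deviation
for col in ["age", "chol"]:
    df[col] = (df[col] - df[col].mean()) / df[col].std(ddof=0)

# 3. One-hot encode the categorical column into sex_F / sex_M indicator columns
df = pd.get_dummies(df, columns=["sex"], dtype=int)
print(df)
```

Median imputation is less sensitive to outliers than the mean, which is why it is often preferred for skewed medical measurements.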

5. Model Selection:
• Choose appropriate machine learning algorithms for heart disease classification, such as logistic regression, decision trees, random forests, support vector machines, or neural networks.
• Use techniques such as cross-validation and grid search to tune hyperparameters and optimize model performance.
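Cross-validation, mentioned above for model selection, rotates which part of the data is held out for testing. A sketch of plain k-fold splitting without shuffling; the helper name `k_fold_indices` is hypothetical, and it gives the first n % k folds one extra sample, which is how scikit-learn's `KFold` sizes its folds:

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds, without shuffling."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # The first n_samples % k folds get one extra sample
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# Each sample is used for testing exactly once
for train, test in k_fold_indices(10, 3):
    print(len(train), test)
# 6 [0, 1, 2, 3]
# 7 [4, 5, 6]
# 7 [7, 8, 9]
```

To score a model, fit it on the training indices of each fold, evaluate it on that fold's test indices, and average the k scores; grid search repeats this for every hyperparameter combination and keeps the best average.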

6. Model Training and Evaluation:
• Split the dataset into training and testing sets to train and evaluate the models.
• Train multiple models on the training data and evaluate them using metrics such as accuracy, precision, recall, F1-score, and the area under the ROC curve (ROC-AUC).
• Compare the models and select the one with the best predictive performance and generalization ability.
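Apart from ROC-AUC, the evaluation metrics named above all derive from the four confusion-matrix counts. A pure-Python sketch for binary labels, assuming 1 marks the positive class (heart disease present); the function name and the sample labels are hypothetical:

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for binary labels."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1          # true positive
        elif predicted == positive:
            fp += 1          # false positive: disease predicted but absent
        elif actual == positive:
            fn += 1          # false negative: disease present but missed
        else:
            tn += 1          # true negative
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(y_true)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels: 2 true positives, 2 true negatives, 1 false positive, 1 false negative
print(classification_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0]))
# all four metrics equal 2/3 for this sample
```

Precision measures how many predicted positives are real, recall how many real positives are found, and F1 is their harmonic mean.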

7. Interpretation and Visualization:
• Interpret the results of the trained models to understand the factors influencing heart disease prediction.
• Visualize model predictions, feature importances, and decision boundaries to provide intuitive insights into the model's behavior.
• Communicate findings effectively to stakeholders and domain experts through visualizations, summaries, and explanations.

8. Documentation and Reporting:
• Document the entire data analysis process, including data preprocessing steps, model selection criteria, parameter tuning, and evaluation results.
• Prepare a comprehensive report summarizing the key findings, insights, limitations, and recommendations for further research or practical applications.

3. Hardware Requirement

• Processor : Intel Core i5

• Main Memory : 8GB RAM

• Storage : 500 GB SSD

• Network Connectivity: High-Speed Internet Connection

• Monitor(s) : 15" monitor, preferably color

• Keyboard : Standard Multimedia keyboard

• Mouse : Optical Mouse


4. Software Requirement Specification

• Operating System : Windows 10

• IDE : Anaconda Jupyter Notebook

• Coding language : Python 3.1

• Data Source : GitHub

4.1 TOOLS AND TECHNOLOGY DETAILS:

JUPYTER NOTEBOOK :

Jupyter Notebook is a free, open-source application for creating documents that combine live code, equations and explanatory text. It grew out of the IPython project, where it was originally called the IPython Notebook. Its name comes from the three core languages it was designed for: Julia, Python and R, and it supports many other programming languages as well. The platform can be used to share computational information such as statistics, code and data, which makes it a useful platform for interaction between faculty and students. A notebook consists of two main components:
1. The front end, where the user enters code
2. The kernel at the back end, which executes the code

PYTHON:

Python is an object-oriented, interpreted programming language: code is processed at runtime by the interpreter and can be executed line by line. Python is simple to learn and easy to use, which makes it developer friendly, and its conventional syntax makes code easy to read and understand. Python runs on multiple platforms such as Linux, UNIX, Windows and Macintosh.
Python supports the object-oriented programming paradigm, including features such as polymorphism, operator overloading and multiple inheritance.

DATA ANALYSIS:

Data analysis is the process of analyzing raw data so that the processed data can be used in a system or a method/process. It mainly involves three steps: data acquisition, data preprocessing and exploratory data analysis. Data acquisition is collecting data from various sources, such as agencies, for further analysis; while acquiring data, it is important to collect only data that is relevant to the system or process. Data preprocessing is a data-mining methodology used to convert raw data into a meaningful and efficient format, since raw data may contain unrelated or irrelevant values that would otherwise distort the results.


5. ANALYSIS AND DESIGN


5.1 ANALYSIS:
1. Data Collection and Pre-processing:
 Gather relevant datasets containing features such as age, sex, blood pressure, cholesterol levels, etc.
 Pre-process the data by handling missing values, encoding categorical variables, and normalizing
numerical features.

2. Exploratory Data Analysis (EDA):


 Explore the dataset's characteristics through summary statistics, histograms, box plots, and correlation
matrices.
 Identify patterns, trends, and potential relationships between features and the presence of heart
disease.

3. Data Visualization:
 Create visualizations such as scatter plots, bar charts, and heatmaps to visualize relationships and
distributions within the data.
 Use tools like Matplotlib, Seaborn, or Plotly for creating interactive visualizations.

4. Feature Selection and Engineering:


 Select relevant features based on domain knowledge and statistical significance.
 Engineer new features if necessary, such as creating interaction terms or transforming variables.

5. Model Building:
• Split the data into training and testing sets.
• Experiment with various machine learning models such as logistic regression, decision trees, random
forests, support vector machines (SVM), or neural networks.
• Tune hyperparameters using techniques like grid search or random search to optimize model
performance.
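The split and grid-search steps might look like this with scikit-learn. Synthetic data from make_classification stands in for the 13 predictors, and the logistic regression model, the C grid and ROC-AUC scoring are assumed choices, not the project's tuned settings.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data with 13 features (assumption, mirroring 13 predictors + 1 target).
X, y = make_classification(n_samples=300, n_features=13, random_state=42)

# Stratified split keeps the class ratio the same in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
param_grid = {"logisticregression__C": [0.01, 0.1, 1, 10]}

# 5-fold grid search on the training set only.
search = GridSearchCV(pipe, param_grid, cv=5, scoring="roc_auc")
search.fit(X_train, y_train)
print(search.best_params_, round(search.best_score_, 3))

# score() uses the same scoring as the search, i.e. ROC-AUC here.
print("Test ROC-AUC:", round(search.score(X_test, y_test), 3))
```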


6. Model Evaluation:
• Evaluate models using appropriate metrics such as accuracy, precision, recall, F1-score, and
ROC-AUC.
• Perform cross-validation to assess model generalization on unseen data.
• Investigate model errors and potential biases.
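An evaluation sketch covering the metrics listed above plus stratified 5-fold cross-validation. The data is synthetic, and the random forest and its settings are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split

X, y = make_classification(n_samples=300, n_features=13, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
y_pred = model.predict(X_test)
y_prob = model.predict_proba(X_test)[:, 1]  # probability of the positive class

print("Accuracy:", accuracy_score(y_test, y_pred))
print("ROC-AUC:", roc_auc_score(y_test, y_prob))
# Precision, recall and F1-score per class.
print(classification_report(y_test, y_pred))

# Stratified 5-fold cross-validation on the training set.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X_train, y_train, cv=cv, scoring="f1")
print("CV F1: %.3f +/- %.3f" % (scores.mean(), scores.std()))
```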

7. Interpretation and Insights:
• Interpret model predictions and feature importance to gain insights into factors contributing to heart
disease.
• Communicate findings and recommendations effectively through reports or presentations.
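Feature importance can be read from a fitted tree ensemble or estimated by permutation, as in this sketch. The feature names f0 to f5 are hypothetical placeholders for the real columns, and the data is synthetic.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

feature_names = [f"f{i}" for i in range(6)]  # hypothetical names
X, y = make_classification(n_samples=300, n_features=6, n_informative=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Impurity-based importances come with any fitted tree ensemble.
impurity = pd.Series(model.feature_importances_, index=feature_names)

# Permutation importance: drop in test accuracy when one feature is shuffled.
perm = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
perm_imp = pd.Series(perm.importances_mean, index=feature_names)

summary = pd.DataFrame({"impurity": impurity, "permutation": perm_imp})
print(summary.sort_values("permutation", ascending=False))
```

Permutation importance is measured on held-out data, so it is usually a more reliable guide than impurity-based scores, which tend to favor high-cardinality features.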

8. Deployment and Monitoring:
• Deploy the trained model into a production environment, either as a standalone application or
integrated into existing systems.
• Implement monitoring mechanisms to track model performance over time and ensure it remains
effective.
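One common way to prepare a model for deployment is to save the fitted pipeline with joblib and load it once in the serving application. The file name heart_model.joblib and the synthetic training data are assumptions for this sketch.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, n_features=13, random_state=0)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)).fit(X, y)

# Save preprocessing and model together as one artifact (hypothetical file name).
path = os.path.join(tempfile.gettempdir(), "heart_model.joblib")
joblib.dump(model, path)

# In the serving application (e.g. a Flask or Streamlit app), load it once at startup.
loaded = joblib.load(path)
print(loaded.predict(X[:5]))
```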

5.2 Design:
1. Project Planning and Requirements Gathering:
• Define project objectives, scope, and success criteria.
• Gather requirements from stakeholders, including data sources, desired analyses, and deliverables.

2. Data Collection and Preparation:
• Identify and acquire relevant datasets containing features related to heart disease.
• Pre-process the data by handling missing values, encoding categorical variables, and normalizing
numerical features.

3. Exploratory Data Analysis (EDA):
• Explore the dataset to understand its structure and characteristics.


• Conduct statistical analysis and visualization to uncover patterns, trends, and relationships in the
data.

4. Feature Engineering and Selection:
• Engineer new features or transform existing ones to improve model performance.
• Select relevant features based on domain knowledge and statistical significance.

5. Model Development:
• Choose appropriate machine learning algorithms for classification tasks, such as logistic regression,
decision trees, random forests, SVM, or neural networks.
• Split the data into training and testing sets.
• Train multiple models and evaluate their performance using appropriate metrics.
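Training several candidate models and comparing them on the same cross-validation folds could be sketched as follows. The candidate list and its settings are assumptions, and the data is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=13, random_state=0)

# Scale-sensitive models get a StandardScaler inside their pipeline.
candidates = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "Decision Tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "SVM": make_pipeline(StandardScaler(), SVC()),
}

results = {}
for name, model in candidates.items():
    # cv=5 gives every model the same (unshuffled) stratified folds, so scores are comparable.
    results[name] = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()

for name, score in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:20s} {score:.3f}")
```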

6. Model Evaluation and Validation:
• Assess model performance using cross-validation techniques to ensure generalization to unseen data.
• Tune hyperparameters using methods like grid search or random search to optimize model
performance.


6. IMPLEMENTATION
The provided code is a Python implementation of a heart disease analysis that uses Jupyter Notebook as the
interface, Python 3 with pandas for loading the data, and Matplotlib and seaborn for plotting.

6.1 Implementation Overview:


Imports:
The necessary libraries, such as ‘numpy’, ‘pandas’, ‘matplotlib’, and ‘seaborn’, are imported.

Data Collection: Obtain a dataset containing information about attributes related to heart disease, such
as age, sex, cholesterol levels, blood pressure, etc.

Data Pre-processing: Clean the data by handling missing values, removing duplicates, and dealing
with outliers. Perform feature engineering if necessary, such as scaling numerical features or encoding
categorical variables.
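A sketch of duplicate removal and outlier handling in pandas. The rows are made up, and the 1.5 * IQR rule is one common convention assumed here; for clinical data, capping or manually reviewing extreme values may be preferable to dropping them.

```python
import pandas as pd

# Made-up rows; the third and fourth rows are exact duplicates.
df = pd.DataFrame({
    "age": [63, 37, 41, 41, 56],
    "chol": [233, 250, 204, 204, 564],
    "target": [1, 1, 0, 0, 1],
})

# Remove exact duplicate rows.
df = df.drop_duplicates().reset_index(drop=True)

# Flag outliers in chol with the 1.5 * IQR rule.
q1, q3 = df["chol"].quantile([0.25, 0.75])
iqr = q3 - q1
mask = df["chol"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)

print(df[~mask])   # rows treated as outliers
df_clean = df[mask]
```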

Exploratory Data Analysis (EDA): Analyze the data to gain insights into relationships between
variables, identify patterns, and visualize distributions using libraries like Pandas, Matplotlib, and
Seaborn.

Feature Selection: Select the most relevant features for predicting heart disease. This can be done
using techniques like correlation analysis, feature importance from machine learning models, or domain
knowledge.
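Correlation analysis can be turned into a simple filter by ranking features on their absolute correlation with the target (Pearson correlation with a 0/1 target is the point-biserial correlation). In this sketch the data is synthetic, built so that thalach and oldpeak depend on the target while chol is pure noise, and k = 2 is an arbitrary choice.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 300
target = rng.integers(0, 2, n)
df = pd.DataFrame({
    "thalach": 140 + 15 * target + rng.normal(0, 10, n),  # related to target by construction
    "oldpeak": 2 - 1.0 * target + rng.normal(0, 0.8, n),   # related to target by construction
    "chol": rng.normal(246, 50, n),                        # unrelated noise
    "target": target,
})

# Rank features by absolute correlation with the target.
ranking = (df.drop(columns="target")
             .corrwith(df["target"])
             .abs()
             .sort_values(ascending=False))
print(ranking)

k = 2
selected = ranking.index[:k].tolist()
print("Selected:", selected)
```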

Model Selection: Choose appropriate machine learning models for classification tasks, such as
Logistic Regression, Decision Trees, Random Forests, Support Vector Machines, or Gradient Boosting
Machines.

Model Training: Split the dataset into training and testing sets, and train the selected models on the
training data. Use techniques like cross-validation for hyperparameter tuning and model evaluation.


Model Evaluation: Evaluate the trained models using metrics like accuracy, precision, recall, F1-
score, and ROC-AUC score. Select the best-performing model based on these metrics.

Deployment: Deploy the selected model into a production environment using frameworks like Flask
or Django for building APIs, or packaging the model into a standalone application using libraries like
Streamlit or PyInstaller.

Monitoring and Maintenance: Continuously monitor the deployed model's performance, retrain it
periodically with new data, and update the deployment pipeline as necessary.
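A minimal monitoring check: score each newly labeled batch and flag the model for retraining when accuracy drops below a threshold. The helper needs_retraining and the 0.75 threshold are hypothetical, and the data is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Hypothetical alert level; in practice derive it from the validation score at deployment time.
ACCURACY_THRESHOLD = 0.75


def needs_retraining(model, X_batch, y_batch, threshold=ACCURACY_THRESHOLD):
    """Return the batch accuracy and whether it fell below the threshold."""
    acc = float(accuracy_score(y_batch, model.predict(X_batch)))
    return acc, acc < threshold


# Synthetic data: the first 300 rows train the model, the last 100 act as a new labeled batch.
X, y = make_classification(n_samples=400, n_features=13, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X[:300], y[:300])

acc, flag = needs_retraining(model, X[300:], y[300:])
print(f"batch accuracy = {acc:.3f}, retrain = {flag}")
```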

6.2 Implementation Process:


Understanding the data:
Data source: The dataset used here for predicting heart disease is taken from the UCI Machine Learning
Repository, a collection of databases widely used to develop and test machine learning algorithms. It is a
real dataset consisting of 300 instances, each described by the relevant 14 clinical parameters. These
parameters come from tests related to heart disease, such as blood pressure level, chest pain type, and
electrocardiographic results.
• Data Pre-Processing:
Organize the selected data by formatting, cleaning, and sampling it. The three common data preprocessing
steps are:
1. Formatting
2. Cleaning
3. Sampling
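The three steps could be sketched in pandas as follows. The synthetic frame deliberately stores age as text to show a formatting fix, and the 50% stratified sample is an assumed example of the sampling step.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "age": rng.integers(29, 78, 100).astype(str),  # numbers stored as text (formatting issue)
    "cp": rng.integers(0, 4, 100),
    "target": np.repeat([0, 1], [40, 60]),          # 40 negatives, 60 positives
})

# 1. Formatting: give each column an appropriate dtype.
df["age"] = pd.to_numeric(df["age"], errors="coerce")
df["cp"] = df["cp"].astype("category")

# 2. Cleaning: drop rows whose age could not be parsed.
df = df.dropna(subset=["age"])

# 3. Sampling: a 50% sample drawn within each class keeps the class ratio.
sample = df.groupby("target").sample(frac=0.5, random_state=0)
print(sample["target"].value_counts())
```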

• EXPLORATORY DATA ANALYSIS (EDA):

Knowing the distribution of the target values is vital for choosing appropriate accuracy metrics and,
consequently, for properly assessing different machine learning models. First, we count the values of the
explained variable, also known as the determining variable, which tells us whether a patient is affected by
heart disease or not. Second, we separate the numeric features from the categorical features. Then we show
the relations between the categorical features in various plots and observe the influence of those
categorical features on the determining variable "diagnosis" (the target column in the dataset). It is
essential that the dataset we are working on be approximately balanced. On an extremely imbalanced dataset,
a model can reach high accuracy simply by always predicting the majority class, which makes the whole model
training useless. If the dataset is imbalanced, we have to apply either undersampling or oversampling to
balance the classes.
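A sketch of the balance check and of random oversampling with sklearn.utils.resample, on made-up data with an 80/20 class split. Undersampling works the same way in the other direction, e.g. resample(majority, replace=False, n_samples=len(minority), random_state=0).

```python
import numpy as np
import pandas as pd
from sklearn.utils import resample

# Synthetic imbalanced data: 80 negatives, 20 positives (made-up numbers).
df = pd.DataFrame({
    "age": np.arange(100),
    "target": np.array([0] * 80 + [1] * 20),
})

# Class proportions: a quick balance check before choosing metrics.
print(df["target"].value_counts(normalize=True))

majority = df[df["target"] == 0]
minority = df[df["target"] == 1]

# Random oversampling: draw minority rows with replacement until both classes match.
minority_up = resample(minority, replace=True, n_samples=len(majority), random_state=0)
balanced = pd.concat([majority, minority_up]).sample(frac=1, random_state=0)  # shuffle rows
print(balanced["target"].value_counts())
```

Resampling should be applied to the training split only, so that duplicated minority rows do not leak into the test set.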


6.3 Source code:

#Importing all the libraries that we need.


import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Jupyter magic: display plots inline in the notebook
%matplotlib inline

#importing our dataset.


df = pd.read_csv('heart.csv')

#checking first five rows by calling df.head()


df.head()

df.tail()

#take a look at the column names.


df.columns.values

#checking for null values


df.isna().sum()

#concise summary of our dataset.


df.info()

#plotting histogram of all numeric values


df.hist(bins = 50, grid = False, figsize=(20,15));


#Generating descriptive statistics.


df.describe()

#counting patients in each target class (1 = heart disease, 0 = no heart disease)


df.target.value_counts()

#plotting bar chart.


df.target.value_counts().plot(kind = 'bar', color=["orchid", "salmon"])
plt.title("Heart Disease values")
plt.xlabel("1 = Heart Disease, 0 = No heart Disease")
plt.ylabel("Amount");

#plotting a pie chart


df.target.value_counts().plot(kind = 'pie', figsize = (8, 6))
plt.legend(["Disease", "No disease"]);

df.sex.value_counts()

#plotting a pie chart


df.sex.value_counts().plot(kind = 'pie', figsize = (8, 6))
plt.title('Male Female ratio')
plt.legend(['Male', 'Female']);

pd.crosstab(df.target, df.sex)

sns.countplot(x = 'target', data = df, hue = 'sex')


plt.title("Heart Disease Frequency for Sex")
plt.xlabel("0 = No heart Disease, 1 = Heart Disease");


#counting values for different chest pain


df.cp.value_counts()

#plotting a bar chart


df.cp.value_counts().plot(kind = 'bar', color = ['salmon', 'lightskyblue', 'springgreen', 'khaki'])
plt.title('Chest pain type vs count');

pd.crosstab(df.sex, df.cp)

pd.crosstab(df.sex, df.cp).plot(kind = 'bar', color = ['coral', 'lightskyblue', 'plum', 'khaki'])


plt.title('Type of chest pain for sex')
plt.xlabel('0 = Female, 1 = Male');

pd.crosstab(df.cp, df.target)

sns.countplot(x = 'cp', data = df, hue = 'target');

#create a distribution plot with a kernel density estimate (KDE) curve


sns.displot( x = 'age', data = df, bins = 30, kde = True);

sns.displot(x = 'thalach', data = df, bins = 30, kde = True, color = 'chocolate');

# Creating a figure
plt.figure(figsize=(10,6))

#plotting the values for people who have heart disease


plt.scatter(df.age[df.target==1],
            df.thalach[df.target==1],
            c="tomato") # color


#plotting the values for people who don't have heart disease
plt.scatter(df.age[df.target==0],
            df.thalach[df.target==0],
            c="lightgreen") # color

# Adding info
plt.title("Heart Disease w.r.t Age and Max Heart Rate")
plt.xlabel("Age")
plt.legend(["Disease", "No Disease"])
plt.ylabel("Max Heart Rate");

sns.kdeplot(x = 'age', y = 'thalach', data = df, color = 'darkcyan');

sns.displot(x = 'thalach', data = df[df.target == 1], kde = True, color = 'olive')
plt.title("Maximum heart rate achieved by people with heart disease")
plt.xlabel("Maximum heart rate achieved")
plt.ylabel("Number of people with heart disease");

sns.displot(x = 'thalach', data = df[df.target == 0], kde = True, color = 'slategray')
plt.title("Maximum heart rate achieved by people without heart disease")
plt.xlabel("Maximum heart rate achieved")
plt.ylabel("Number of people without heart disease");

sns.displot(x = 'chol', data = df, bins = 30, kde = True, color = 'teal');

# Creating another figure


plt.figure(figsize=(10,6))


#plotting the values for people who have heart disease


plt.scatter(df.age[df.target==1],
            df.chol[df.target==1],
            c="salmon") # color

#plotting the values for people who don't have heart disease
plt.scatter(df.age[df.target==0],
            df.chol[df.target==0],
            c="lightblue") # color

# Add some helpful info


plt.title("Heart Disease w.r.t Age and Serum Cholesterol")
plt.xlabel("Age")
plt.legend(["Disease", "No Disease"])
plt.ylabel("Serum cholesterol");

sns.kdeplot(x = 'age', y = 'chol', data = df, color = 'firebrick');

sns.displot(x = 'chol', data = df[df.target == 1], kde = True, color = 'dodgerblue')
plt.title("Serum cholesterol of people with heart disease")
plt.xlabel("Serum cholesterol")
plt.ylabel("Number of people with heart disease");

sns.displot(x = 'chol', data = df[df.target == 0], kde = True, color = 'forestgreen')
plt.title("Serum cholesterol of people without heart disease")
plt.xlabel("Serum cholesterol")
plt.ylabel("Number of people without heart disease");

sns.countplot(x = 'exang', data = df, hue = 'sex')


plt.title('Exercise induced angina for sex')


plt.xlabel('Exercise induced angina (0 = No, 1 = Yes)');

df.fbs.value_counts()

df.fbs.value_counts().plot(kind = 'pie', figsize = (8, 6))


plt.legend(['fbs<120 mg/dl', 'fbs>120 mg/dl']);

pd.crosstab(df.sex, df.fbs)

pd.crosstab(df.sex, df.fbs).plot(kind = 'bar', color = ['lightblue', 'salmon'])


plt.title("Fasting blood sugar w.r.t sex")
plt.xlabel("0 = Female, 1 = Male")
plt.ylabel("Count");


7. TESTING
Testing is a crucial part of software development to ensure that the implemented code functions correctly
and meets the requirements. The provided code can be tested as follows:

Input Validation Testing:


Verify that the application handles various types of inputs correctly:
Empty symbols.
Invalid window size (non-integer).
Invalid date range (start date = end date).

Data Retrieval Testing:


Test the functionality that loads the dataset (heart.csv) used by the analysis.
Check how the application responds to a missing file or unavailable data.
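The data retrieval and error-handling checks could be automated with a small loader that fails early with a clear message. The function load_heart_data and the expected column set are assumptions; the column names are assumed to follow the commonly distributed heart.csv file with the 14 attributes mentioned in Section 6.2.

```python
import io

import pandas as pd

# Assumed column names for heart.csv (14 attributes, including the target).
EXPECTED_COLUMNS = {"age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
                    "thalach", "exang", "oldpeak", "slope", "ca", "thal", "target"}


def load_heart_data(source):
    """Hypothetical loader: read the CSV and raise ValueError with a clear message on bad input."""
    try:
        df = pd.read_csv(source)
    except FileNotFoundError as exc:
        raise ValueError(f"Dataset not found: {source}") from exc
    missing = EXPECTED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f"Missing columns: {sorted(missing)}")
    if not set(df["target"].unique()) <= {0, 1}:
        raise ValueError("target must contain only 0 and 1")
    return df


def raises_value_error(source):
    """Return True if load_heart_data rejects the given source."""
    try:
        load_heart_data(source)
    except ValueError:
        return True
    return False


# A valid one-row file built in memory (every value set to 1).
header = ",".join(sorted(EXPECTED_COLUMNS))
valid_csv = header + "\n" + ",".join(["1"] * len(EXPECTED_COLUMNS)) + "\n"

print(len(load_heart_data(io.StringIO(valid_csv))))         # valid input loads
print(raises_value_error(io.StringIO("age,sex\n63,1\n")))  # missing columns
print(raises_value_error("no_such_file_heart.csv"))         # missing file
```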

Graph Generation Testing:


Confirm that the application generates graphs correctly with the desired attributes. Test the appearance
and behavior of the generated graph, including the x-axis labels, legend, and plotted lines.
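Graph generation can be tested without a display by rendering with Matplotlib's Agg backend and inspecting the returned Axes. The wrapper plot_target_counts is hypothetical; it repackages the target bar chart from the source code in Section 6.3 so that its title, labels and bars can be checked with assertions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the test runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def plot_target_counts(df):
    """Hypothetical wrapper around the target bar chart; returns the Axes for inspection."""
    ax = df["target"].value_counts().plot(kind="bar", color=["orchid", "salmon"])
    ax.set_title("Heart Disease values")
    ax.set_xlabel("1 = Heart Disease, 0 = No heart Disease")
    ax.set_ylabel("Amount")
    return ax


df = pd.DataFrame({"target": [1, 1, 1, 0, 0]})
ax = plot_target_counts(df)

assert ax.get_title() == "Heart Disease values"
assert ax.get_ylabel() == "Amount"
assert len(ax.patches) == 2                             # one bar per class
assert [p.get_height() for p in ax.patches] == [3, 2]   # value_counts sorts by frequency
plt.close(ax.figure)
```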

User Interface Testing:


Evaluate the usability and responsiveness of the GUI components (e.g., entry fields, buttons, dropdown
menus).
Test interactions such as selecting dates, entering stock symbols, and triggering actions like graph
generation.

Error Handling Testing:


Intentionally provide incorrect inputs or simulate errors to ensure that the application displays
appropriate error messages.
Check how the application handles unexpected errors or exceptions gracefully without crashing.


Integration Testing:
Test the interaction between different components of the application, such as the data retrieval, graph
generation, and information display functionalities.
Ensure that all components work together seamlessly to provide the intended functionality.

Boundary Testing:
Test the application with boundary values, such as minimum and maximum window size, to ensure it
behaves as expected.
Check how the application handles extreme cases, such as a large number of symbols or an extended
date range.

Usability Testing:
Gather feedback from potential users to evaluate the overall user experience, ease of use, and
intuitiveness of the application.
Identify any areas for improvement in terms of layout, design, and user interaction.

Cross-platform Testing:
Test the application on different operating systems and environments to ensure compatibility and
consistent behavior.
By systematically performing these tests and addressing any issues or bugs identified during testing, you
can ensure that the application functions reliably and delivers a satisfactory user experience.


8. CONCLUSION

In conclusion, the heart disease analysis project utilizing data analytics with Python has provided
valuable insights into the factors associated with heart disease. Through data exploration, visualization,
and machine learning techniques, we have identified significant predictors and patterns within the
dataset. Our findings underscore the importance of factors such as age, cholesterol levels, blood pressure,
and exercise-induced angina in predicting heart disease risk. Furthermore, the predictive models developed in this
project can assist healthcare professionals in early detection and prevention efforts, ultimately
contributing to better patient outcomes and public health initiatives. Moving forward, continued research
and refinement of these models will be crucial for enhancing their accuracy and applicability in real-
world clinical settings. Overall, this project demonstrates the power of data analytics in uncovering
actionable insights for combating heart disease and improving cardiovascular health outcomes.


