NOIDA INSTITUTE OF ENGINEERING & TECHNOLOGY,
GREATER NOIDA
Department of Information Technology
LAB FILE
ON
DATA ANALYTICS LAB
KIT-651
(6th Semester)
(2020 – 2021)
Submitted To:
Ms. Tanya
Dr. Vivek Kumar
Submitted by:
Name: Amit Singh
Roll: 1813313019
Affiliated to Dr. A.P.J Abdul Kalam Technical University, Uttar Pradesh, Lucknow.
INDEX
S.NO  TOPIC  DATE  GRADE  SIGNATURE
1  To get the input from the user and perform numerical operations (MAX, MIN, AVG, SUM, SQRT, ROUND) using R/Python.
2  To perform data import/export (.CSV, .XLS, .TXT) operations using data frames in R/Python.
3  To get the input matrix from the user and perform matrix addition, subtraction, multiplication, inverse, transpose and division operations using the vector concept in R/Python.
4  To perform statistical operations (Mean, Median, Mode and Standard deviation) using R/Python.
5  To perform data pre-processing operations: i) Handling missing data ii) Min-Max normalization.
6  To perform Simple Linear Regression with R/Python.
7  To perform Simple Logistic Regression with R/Python.
Aim - 1. To get the input from the user and perform numerical operations (MAX,
MIN, AVG, SUM, SQRT, ROUND) using R/Python.
import math
# Read n integers from the user into a list
list1 = []
n = int(input("Enter number of elements : "))
for i in range(0, n):
    ele = int(input())
    list1.append(ele)
print("Sum = ", sum(list1))
print("Maximum element = ", max(list1))
print("Minimum element = ", min(list1))
# Square root of the second element entered (index 1)
print("Square root =", math.sqrt(list1[1]))
# round() is applied to a fixed sample value, not to the input
print("Round =", round(5.56))
print("Average = ", sum(list1) / len(list1))
OUTPUT: -
Enter number of elements : 5
1
6
2
8
7
Sum = 24
Maximum element = 8
Minimum element = 1
Square root = 2.449489742783178
Round = 6
Average = 4.8
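The same operations can be wrapped in a function that takes a list instead of reading from input(), which makes them easy to rerun and check. This is a minimal sketch: the function name summarize is hypothetical, and unlike the program above it applies SQRT to every element and ROUND to the average rather than to one element and a fixed literal.

```python
import math


def summarize(values):
    """Return basic numerical summaries of a non-empty list of numbers."""
    if not values:
        raise ValueError("values must not be empty")
    total = sum(values)
    average = total / len(values)
    return {
        "sum": total,
        "max": max(values),
        "min": min(values),
        "avg": average,
        # Square root of every element (elements must be non-negative)
        "sqrt": [math.sqrt(v) for v in values],
        # Python's round() rounds exact halves to the nearest even integer
        "round_avg": round(average),
    }


if __name__ == "__main__":
    result = summarize([1, 6, 2, 8, 7])
    for key, value in result.items():
        print(key, "=", value)
```

With the inputs from the sample run, the average is 4.8, so round_avg is 5.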
Aim - 2. To perform data import/export (.CSV, .XLS, .TXT) operations using
data frames in R/Python.
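# Mount Google Drive (this works only inside Google Colab)
from google.colab import drive
drive.mount("/content/drive")
import pandas as pd
# Import: read the CSV file into a DataFrame
df = pd.read_csv('/content/drive/MyDrive/Da-Lab/ITUR_rain1.csv')
# Print the Frequency column of the DataFrame
print(df.Frequency)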
OUTPUT: -
0 1.0
1 1.5
2 2.0
3 2.5
4 3.0
...
99 96.0
100 97.0
101 98.0
102 99.0
103 100.0
Name: Frequency, Length: 104, dtype: float64
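The program above only imports a CSV file. A minimal sketch of the export side and of a .txt round trip, assuming a small made-up DataFrame (the Value column and the file names are hypothetical) written to a temporary folder so it runs outside Colab:

```python
import os
import tempfile

import pandas as pd

# Made-up data used only for this illustration
df = pd.DataFrame({"Frequency": [1.0, 1.5, 2.0], "Value": [0.25, 0.5, 0.75]})

with tempfile.TemporaryDirectory() as folder:
    csv_path = os.path.join(folder, "sample.csv")
    txt_path = os.path.join(folder, "sample.txt")

    # Export: comma-separated .csv and tab-separated .txt, without the index column
    df.to_csv(csv_path, index=False)
    df.to_csv(txt_path, sep="\t", index=False)

    # Import both files back into DataFrames
    from_csv = pd.read_csv(csv_path)
    from_txt = pd.read_csv(txt_path, sep="\t")

print(from_csv.equals(df), from_txt.equals(df))
```

For .xlsx files the same pattern uses df.to_excel(path, index=False) and pd.read_excel(path); pandas needs the optional openpyxl package installed for that, which is why the sketch leaves Excel out of the runnable part.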
Aim - 3. To get the input matrix from the user and perform matrix addition,
subtraction, multiplication, inverse, transpose and division operations using
the vector concept in R/Python.
import numpy
# Read matrix 1 element by element, row by row
r = int(input("Enter no of rows of matrix1 "))
c = int(input("Enter no of columns of matrix1 "))
m = []
print("Enter elements")
for i in range(r):
    a = []
    for j in range(c):
        a.append(int(input()))
    m.append(a)
# Read matrix 2 the same way
r1 = int(input("Enter the number of rows of matrix 2 "))
c1 = int(input("Enter the number of columns of matrix 2 "))
m1 = []
print("Enter elements")
for i in range(r1):
    a1 = []
    for j in range(c1):
        a1.append(int(input()))
    m1.append(a1)
# Element-wise addition: both matrices must have the same shape (r == r1, c == c1)
m2 = []
for i in range(r):
    a3 = []
    for j in range(c):
        a3.append(m[i][j] + m1[i][j])
    m2.append(a3)
print("Sum of matrix is:")
for i in range(r):
    for j in range(c):
        print(m2[i][j], end=" ")
    print()
# Matrix product: needs c == r1; the result has r rows and c1 columns
pm = []
for i in range(r):
    sm = []
    for j in range(c1):
        s = 0
        for k in range(c):
            s = s + m[i][k] * m1[k][j]
        sm.append(s)
    pm.append(sm)
print("Product of matrix:")
for i in range(r):
    for j in range(c1):
        print(pm[i][j], end=" ")
    print()
print("Transpose of multiplication matrix is :")
print(numpy.transpose(pm))
OUTPUT: -
Enter no of rows of matrix1 2
Enter no of columns of matrix1 2
Enter elements
1
2
3
4
Enter the number of rows of matrix 2 2
Enter the number of columns of matrix 2 2
Enter elements
4
5
6
7
Sum of matrix is:
5 7
9 11
Product of matrix:
16 19
36 43
Transpose of multiplication matrix is :
[[16 36]
 [19 43]]
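The aim also lists subtraction, inverse and division, which the program above does not compute. A minimal NumPy sketch using the two matrices from the sample run. Matrices have no single "division", so it shows both element-wise division and the common reading A·B⁻¹; treating division as A·B⁻¹ is an interpretation, not something the aim specifies.

```python
import numpy as np

# The two 2 x 2 matrices entered in the sample run
A = np.array([[1, 2], [3, 4]], dtype=float)
B = np.array([[4, 5], [6, 7]], dtype=float)

difference = A - B                # element-wise subtraction
inverse_A = np.linalg.inv(A)      # raises LinAlgError if A is singular
elementwise_div = A / B           # element-wise division
# A @ inv(B) without forming inv(B): X @ B = A  <=>  B.T @ X.T = A.T
right_div = np.linalg.solve(B.T, A.T).T

print(difference)
print(inverse_A)
print(elementwise_div)
print(right_div)
```

Solving the linear system instead of calling inv(B) and multiplying is the usual choice because it is numerically more stable for the same result.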
Aim - 4. To perform statistical operations (Mean, Median, Mode and Standard
deviation) using R/Python.
import statistics as st
# Read n integers from the user into a list
lst = []
n = int(input("Enter number of elements : "))
for i in range(0, n):
    ele = int(input())
    lst.append(ele)
print("Mean value is:", st.mean(lst))
print("Median is:", st.median(lst))
# From Python 3.8, mode() returns the first of several equally common values
print("Mode value is :", st.mode(lst))
# stdev() is the sample standard deviation (divides by n - 1)
print("Standard deviation is :", st.stdev(lst))
OUTPUT: -
Enter number of elements : 5
1
2
3
4
5
Mean value is: 3
Median is: 3
Mode value is : 1
Standard deviation is : 1.5811388300841898
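The statistics module offers two standard deviations, and for this input they differ: the population value (pstdev, dividing by n) is √2 ≈ 1.414, while the sample value (stdev, dividing by n − 1) is √2.5 ≈ 1.581. A short comparison, plus multimode (Python 3.8+), which is clearer than mode when no value repeats:

```python
import statistics as st

data = [1, 2, 3, 4, 5]

# Sum of squared deviations from the mean 3 is 4 + 1 + 0 + 1 + 4 = 10
sample_sd = st.stdev(data)       # sqrt(10 / 4)
population_sd = st.pstdev(data)  # sqrt(10 / 5)

# Every value occurs once, so all of them tie as most common
modes = st.multimode(data)

print(round(sample_sd, 4), round(population_sd, 4), modes)
```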
Aim - 5. To perform data pre-processing operations i) Handling Missing data
ii) Min-Max normalization.
import pandas as pd
import numpy as np
df = pd.read_csv("/content/drive/MyDrive/Da-Lab/titanic.csv")
df.head()
df.drop(['PassengerId','Name','SibSp','Parch','Ticket','Cabin','Embarked'],axis='columns',inplace=True)
df.head()
target = df.Survived
inputs = df.drop('Survived',axis='columns')
#One-hot encoding
dummies = pd.get_dummies(inputs.Sex)
dummies.head(3)
inputs = pd.concat([inputs,dummies],axis='columns')
inputs.head(3)
inputs.drop(['Sex','male'],axis='columns',inplace=True)
inputs.head(3)
inputs.columns[inputs.isna().any()]
OUTPUT: -
Index(['Age'], dtype='object')
inputs.Age = inputs.Age.fillna(inputs.Age.mean())
inputs.head()
inputs.Age[:10]
OUTPUT: -
0 22.000000
1 38.000000
2 26.000000
3 35.000000
4 35.000000
5 29.699118
6 54.000000
7 2.000000
8 27.000000
9 14.000000
Name: Age, dtype: float64
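The aim also lists min-max normalization, which the steps above do not perform. Each value is rescaled as x' = (x − min) / (max − min), so every normalized column lies in [0, 1]. A minimal sketch, assuming a few made-up rows that reuse the Titanic column names; the helper name min_max_normalize is hypothetical.

```python
import pandas as pd


def min_max_normalize(frame, columns):
    """Return a copy with each listed column rescaled by (x - min) / (max - min)."""
    result = frame.copy()
    for col in columns:
        col_min = result[col].min()
        col_max = result[col].max()
        span = col_max - col_min
        # A constant column has span 0; map it to 0.0 instead of dividing by zero
        result[col] = (result[col] - col_min) / span if span != 0 else 0.0
    return result


# Made-up rows with the same column names as the inputs above
sample = pd.DataFrame({
    "Pclass": [3, 1, 3, 2],
    "Age": [22.0, 38.0, 26.0, 30.0],
    "Fare": [7.25, 71.2833, 7.925, 13.0],
})
scaled = min_max_normalize(sample, ["Age", "Fare"])
print(scaled)
```

scikit-learn's MinMaxScaler in sklearn.preprocessing applies the same formula. When a model is trained afterwards, the min and max are usually taken from the training split only and then reused on the test split.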
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(inputs,target,test_size=0.3)
from sklearn.naive_bayes import GaussianNB
model = GaussianNB()
model.fit(X_train,y_train)
OUTPUT: -
GaussianNB(priors=None, var_smoothing=1e-09)
model.score(X_test,y_test)
OUTPUT: -
0.7574626865671642
model.predict(X_test[0:10])
OUTPUT: -
array([0, 1, 1, 1, 0, 1, 1, 0, 0, 1])
Aim - 6. To perform Simple Linear Regression with R/Python.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Upload area.csv from the local machine (this works only inside Google Colab)
from google.colab import files
uploaded = files.upload()
data = pd.read_csv("area.csv")
X = data.Area.values.astype(float)
y = data.Price.values.astype(float)
# Scatter plot of the raw data
plt.scatter(X, y)
plt.xlabel("Area")
plt.ylabel("Price")
plt.show()
# Fit Price = m * Area + c by ordinary least squares
from sklearn.linear_model import LinearRegression
reg = LinearRegression()
reg.fit(data[['Area']], data.Price)
OUTPUT: -
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)
reg.predict([[100]])
OUTPUT: -
array([9229.8328887])
reg.coef_
OUTPUT: -
array([40.46056658])
reg.intercept_
OUTPUT: -
5183.7762302371
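# Manual check of Price = m * Area + c at Area = 100, using the fitted values
reg.coef_[0] * 100 + reg.intercept_
This evaluates to about 9229.83 (40.46056658 × 100 + 5183.7762302371), the same value reg.predict([[100]]) returned.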
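For a single input feature, LinearRegression fits the ordinary least-squares line, whose slope and intercept have a closed form: m = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and c = ȳ − m·x̄. The sketch below computes them with NumPy on hypothetical points; applied to the Area and Price columns it should agree with reg.coef_ and reg.intercept_ up to floating-point error. The helper name fit_simple_linear is hypothetical.

```python
import numpy as np


def fit_simple_linear(x, y):
    """Least-squares slope m and intercept c for y = m * x + c."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    intercept = y_mean - slope * x_mean
    return slope, intercept


# Hypothetical points lying exactly on y = 2x + 1
m, c = fit_simple_linear([1, 2, 3, 4], [3, 5, 7, 9])
# Prediction at x = 100, the same check as reg.predict([[100]])
print(m, c, m * 100 + c)
```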
Aim - 7. To perform Simple Logistic Regression with R/Python.