Data Science

The document summarizes several popular Python machine learning libraries: 1. NumPy is used for multi-dimensional arrays and matrix operations. It provides mathematical functions and is used internally by TensorFlow. 2. Pandas is used for data analysis and preparation. It provides tools for grouping, filtering, and combining data. 3. Matplotlib is used for data visualization and creating plots and graphs. It allows formatting of axes. 4. TensorFlow is used for deep learning and training neural networks. It involves defining and running tensor computations. 5. Keras provides tools for building and designing neural networks and allows fast prototyping. It can run on TensorFlow, CNTK, or Theano. 6

Uploaded by

Agaga

IU2041230140 DS-B2

Practical-01
Aim:-Introduction to Jupyter Notebook.

Installation

You can install Jupyter Notebook with pip, the package installer that comes
with Python:
$ pip install jupyter

Alternatively, the Anaconda distribution of Python comes with Jupyter Notebook preinstalled.

Starting the Jupyter Notebook Server

Open your terminal application, go to a folder of your choice, and run the
following command:
$ jupyter notebook

This will start up Jupyter and your default browser should start (or open a
new tab) to the following URL: http://localhost:8888/tree

Your browser should now look something like this:

Right now you are not actually running a Notebook; you are just running
the Notebook server.

Creating a Notebook
Click on the New button (upper right) and choose Python 3.

Your web page should now look like this:

Naming

You will notice that the word Untitled appears at the top of the page. Click
on it to give the Notebook a new name.

Now try writing some code in the first cell and running it:

print('Hello Jupyter!')

Practical-02
Aim:-To Implement Python Basic Programs.

❖ Python program to print "Hello Python"

print('Hello Python')

Output: Hello Python

❖ Python program to do arithmetical operations



num1 = input('Enter first number: ')
num2 = input('Enter second number: ')
# Result names chosen so that the built-ins sum() and min() are not shadowed
total = float(num1) + float(num2)
difference = float(num1) - float(num2)
product = float(num1) * float(num2)
quotient = float(num1) / float(num2)
print('The sum of {0} and {1} is {2}'.format(num1, num2, total))
print('The subtraction of {0} and {1} is {2}'.format(num1, num2, difference))
print('The multiplication of {0} and {1} is {2}'.format(num1, num2, product))
print('The division of {0} and {1} is {2}'.format(num1, num2, quotient))

Output:
Enter first number: 10
Enter second number: 20
The sum of 10 and 20 is 30.0
The subtraction of 10 and 20 is -10.0
The multiplication of 10 and 20 is 200.0
The division of 10 and 20 is 0.5

❖ Python program to find the area of a triangle


a = float(input('Enter first side: '))
b = float(input('Enter second side: '))
c = float(input('Enter third side: '))
# Heron's formula: s is the semi-perimeter of the triangle
s = (a + b + c) / 2
area = (s*(s-a)*(s-b)*(s-c)) ** 0.5
print('The area of the triangle is %0.2f' % area)

Output:

❖ Python program to solve quadratic equation

import cmath
a = float(input('Enter a: '))
b = float(input('Enter b: '))
c = float(input('Enter c: '))
# Discriminant; cmath.sqrt also handles the negative case
d = (b**2) - (4*a*c)
sol1 = (-b-cmath.sqrt(d))/(2*a)
sol2 = (-b+cmath.sqrt(d))/(2*a)
print('The solutions are {0} and {1}'.format(sol1,sol2))

Output:
Enter a: 8
Enter b: 5
Enter c: 9
The solutions are (-0.3125-1.0135796712641785j) and (-0.3125+1.01357967126
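
With a = 8, b = 5 and c = 9 the discriminant b**2 - 4*a*c is negative, which is why the program uses cmath rather than math. As a rough sketch (the function name solve_quadratic is our own and not part of the original program), the roots can be checked by substituting them back into the equation:

```python
import cmath

def solve_quadratic(a, b, c):
    """Return both roots of a*x**2 + b*x + c = 0 (a must be non-zero)."""
    d = b**2 - 4*a*c
    root_d = cmath.sqrt(d)  # complex square root, valid for negative d
    return (-b - root_d) / (2*a), (-b + root_d) / (2*a)

a, b, c = 8.0, 5.0, 9.0
sol1, sol2 = solve_quadratic(a, b, c)
for root in (sol1, sol2):
    # A true root makes the left-hand side (almost) zero
    residual = a*root**2 + b*root + c
    print(root, abs(residual) < 1e-9)
```

The two roots form a complex-conjugate pair, as expected for real coefficients and a negative discriminant.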

❖ Python program to swap two variables

P = int(input("Please enter value for P: "))
Q = int(input("Please enter value for Q: "))
temp_1 = P
P = Q
Q = temp_1
print("The Value of P after swapping: ", P)
print("The Value of Q after swapping: ", Q)

Output:
Please enter value for P: 13
Please enter value for Q: 43
The Value of P after swapping: 43
The Value of Q after swapping: 13

❖ Python program to generate a random number

import random
n = random.random()
print(n)
Output:
0.7632870997556201
Running the code again gives a different output, for example:
0.8053503984689108

Generating a Number within a Given Range

import random
n = random.randint(0,50)
print(n)
Output:
40

❖ Python program to display calendar

import calendar
yy = int(input("Enter year: "))
mm = int(input("Enter month: "))
print(calendar.month(yy,mm))

Output:
Enter year: 2022
Enter month: 6
     June 2022
Mo Tu We Th Fr Sa Su
       1  2  3  4  5
 6  7  8  9 10 11 12
13 14 15 16 17 18 19
20 21 22 23 24 25 26
27 28 29 30

Practical-03
Aim:-Study of various Machine Learning libraries.

>Python libraries that are used in Machine Learning are:

1. NumPy: NumPy is a very popular Python library for processing large multi-dimensional
arrays and matrices, backed by a large collection of high-level mathematical
functions. It is very useful for fundamental scientific computations in Machine
Learning, particularly linear algebra, Fourier transforms, and random number
generation. High-end libraries like TensorFlow use NumPy internally for
manipulating tensors.

import numpy as np

x = np.array([[1, 2], [3, 4]])


y = np.array([[5, 6], [7, 8]])

v = np.array([9, 10])
w = np.array([11, 12])

print(np.dot(v, w), "\n")

print(np.dot(x, v), "\n")

print(np.dot(x, y))

Output:
219
[29 67]
[[19 22]
[43 50]]

2. Pandas: Pandas is a popular Python library for data analysis. It is not directly related
to Machine Learning, but a dataset must be prepared before training, and this is where
Pandas comes in handy: it was developed specifically for data extraction and
analysis. It provides many built-in methods for grouping, combining and filtering data.

import pandas as pd

data = {"country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
        "area": [8.516, 17.10, 3.286, 9.597, 1.221],
        "population": [200.4, 143.5, 1252, 1357, 52.98]}

data_table = pd.DataFrame(data)
print(data_table)

Output:

3. Matplotlib: Matplotlib is a very popular Python library for data visualization. Like
Pandas, it is not directly related to Machine Learning. It comes in particularly handy
when a programmer wants to visualize the patterns in the data. It is a 2D plotting library
used for creating 2D graphs and plots. A module named pyplot makes tasks such as
formatting axes easy. It provides various kinds of graphs and plots for data visualization,
viz., histograms, error charts, bar charts, etc.

import matplotlib.pyplot as plt


import numpy as np
x = np.linspace(0, 10, 100)

plt.plot(x, x, label ='linear')

plt.legend()

plt.show()

Output:

4. TensorFlow: TensorFlow is a very popular open-source library for high-performance
numerical computation developed by the Google Brain team at Google. As the name
suggests, TensorFlow is a framework for defining and running computations
involving tensors. It can train and run deep neural networks that can be used to develop
several AI applications. TensorFlow is widely used in the field of deep learning
research and application.

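import tensorflow as tf

x1 = tf.constant([1, 2, 3, 4])
x2 = tf.constant([5, 6, 7, 8])

# Element-wise product of the two tensors
result = tf.multiply(x1, x2)

# TensorFlow 2.x runs operations eagerly, so no tf.Session() is needed
# (tf.Session() is part of the TensorFlow 1.x API only)
print(result.numpy())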

Output:
[ 5 12 21 32]
5. Keras: Keras is a very popular Machine Learning library for Python. It is a high-level neural
networks API capable of running on top of TensorFlow, CNTK, or Theano. It can run
seamlessly on both CPU and GPU. Keras makes it really easy for ML beginners to build and
design a Neural Network. One of the best things about Keras is that it allows for easy
and fast prototyping.

6. PyTorch: PyTorch is a popular open-source Machine Learning library for Python
based on Torch, which is an open-source Machine Learning library that is implemented
in C with a wrapper in Lua. It has an extensive choice of tools and libraries that support
Computer Vision, Natural Language Processing (NLP), and many more ML programs.
It allows developers to perform computations on Tensors with GPU acceleration and
also helps in creating computational graphs.

import torch

dtype = torch.float
device = torch.device("cpu")
# N: batch size, D_in: input size, H: hidden size, D_out: output size
N, D_in, H, D_out = 64, 1000, 100, 10

# Random input and target data (torch.randn samples a standard normal distribution)
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)

# Randomly initialized weights
w1 = torch.randn(D_in, H, device=device, dtype=dtype)
w2 = torch.randn(H, D_out, device=device, dtype=dtype)

learning_rate = 1e-6
for t in range(500):
    # Forward pass: compute the predicted y
    h = x.mm(w1)
    h_relu = h.clamp(min=0)
    y_pred = h_relu.mm(w2)

    # Sum-of-squares loss
    loss = (y_pred - y).pow(2).sum().item()
    print(t, loss)

    # Backward pass: gradients of the loss with respect to w1 and w2
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h_relu = grad_y_pred.mm(w2.t())
    grad_h = grad_h_relu.clone()
    grad_h[h < 0] = 0
    grad_w1 = x.t().mm(grad_h)

    # Gradient-descent update of the weights
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2

Output:
0 47168344.0
1 46385584.0
2 43153576.0
...
...
...
497 3.987660602433607e-05

498 3.945609932998195e-05
499 3.897604619851336e-05

7. SciPy: SciPy is a very popular library among Machine Learning enthusiasts as it
contains different modules for optimization, linear algebra, integration and statistics.
There is a difference between the SciPy library and the SciPy stack: the SciPy library is one
of the core packages that make up the SciPy stack. SciPy is also very useful for image
manipulation.

from scipy.misc import imread, imsave, imresize

img = imread('D:/Programs / cat.jpg') # path of the image
print(img.dtype, img.shape)

# Tint the image by scaling the colour channels
img_tint = img * [1, 0.45, 0.3]
imsave('D:/Programs / cat_tinted.jpg', img_tint)

# Resize the tinted image to 300 x 300 pixels
img_tint_resize = imresize(img_tint, (300, 300))
imsave('D:/Programs / cat_tinted_resized.jpg', img_tint_resize)

If "from scipy.misc import imread, imsave, imresize" does not work on your system
(these functions were removed from recent SciPy releases), install imageio and
import imread and imsave from it instead:
!pip install imageio
import imageio
from imageio import imread, imsave
Original image:

Tinted image:

Resized tinted image:

8. Scikit-learn: Scikit-learn is one of the most popular ML libraries for classical ML
algorithms. It is built on top of two basic Python libraries, viz., NumPy and SciPy.
Scikit-learn supports most of the supervised and unsupervised learning algorithms.
Scikit-learn can also be used for data mining and data analysis, which makes it a great
tool for anyone who is starting out with ML.

from sklearn import datasets
from sklearn import metrics
from sklearn.tree import DecisionTreeClassifier

dataset = datasets.load_iris()
model = DecisionTreeClassifier()
model.fit(dataset.data, dataset.target)
print(model)

expected = dataset.target
predicted = model.predict(dataset.data)

print(metrics.classification_report(expected, predicted))
print(metrics.confusion_matrix(expected, predicted))

Output:

DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
                       max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0,
                       min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort=False,
                       random_state=None, splitter='best')

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        50
           1       1.00      1.00      1.00        50
           2       1.00      1.00      1.00        50

   micro avg       1.00      1.00      1.00       150
   macro avg       1.00      1.00      1.00       150
weighted avg       1.00      1.00      1.00       150

[[50  0  0]
 [ 0 50  0]
 [ 0  0 50]]

9. Theano: We all know that Machine Learning is basically mathematics and statistics.
Theano is a popular Python library that is used to define, evaluate and optimize
mathematical expressions involving multi-dimensional arrays in an efficient manner. This
is achieved by optimizing the utilization of CPU and GPU. It is extensively used for
unit-testing and self-verification to detect and diagnose different types of errors. Theano
is a very powerful library that has been used in large-scale computationally intensive
scientific projects for a long time, yet it is simple and approachable enough to be used by
individuals for their own projects.

import theano
import theano.tensor as T
x = T.dmatrix('x')
s = 1 / (1 + T.exp(-x))
logistic = theano.function([x], s)
logistic([[0, 1], [-1, -2]])

Output:
array([[0.5, 0.73105858],
[0.26894142, 0.11920292]])
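
The expression s = 1 / (1 + exp(-x)) is the logistic (sigmoid) function applied element by element. As a quick sanity check that does not require Theano, the same values can be reproduced with the standard library; the helper name logistic_plain is an assumption made for this sketch:

```python
import math

def logistic_plain(matrix):
    """Apply 1 / (1 + exp(-v)) to every element of a list of lists."""
    return [[1 / (1 + math.exp(-v)) for v in row] for row in matrix]

result = logistic_plain([[0, 1], [-1, -2]])
for row in result:
    # Round to 8 decimals to match the precision shown in the output above
    print([round(v, 8) for v in row])
```

Note that the value for -1 is one minus the value for 1, a symmetry of the logistic function.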

Practical-04
Aim:-Introduction to GitHub Repository.

What is Git about?

Git is a free and open-source distributed version control system designed to
handle everything from small to very large projects with speed and efficiency.

Git is built around distributed software development, in which more than
one developer has access to the source code of an application and
can make changes to it that other developers can see.

It was initially designed and developed by Linus Torvalds in 2005 for Linux kernel
development.

Every Git working directory is a full-fledged repository with complete history
and full version tracking capabilities, independent of network access or a central
server.

Git allows a team of people to work together, all using the same files, and it
helps the team cope with the confusion that tends to happen when multiple
people are editing the same files.

How does Git work?

A Git repository is a key-value object store where all objects are indexed by their
SHA-1 hash value.

All commits, files, tags, and filesystem tree nodes are different types of objects
living in this repository.

A Git repository is a large hash table with no provision made for hash collisions.

Git specifically works by taking “snapshots” of files.
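
To make the idea of objects indexed by their SHA-1 hash more concrete, here is a minimal sketch of how Git derives the ID of a file's contents (a blob object): it hashes a short header, "blob <size>" followed by a NUL byte, together with the contents. The function name git_blob_id is our own; for the same bytes the result should match what the git hash-object command prints.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a blob with this content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()

# The empty blob has a well-known ID
print(git_blob_id(b""))
# Identical content always maps to the same key; different content to a different key
print(git_blob_id(b"Hello Jupyter!\n"))
```

Because the key is derived from the content itself, two files with identical contents are stored only once in the repository.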

● Let us see how to host a local repository on GitHub, starting from the very
beginning (creating a GitHub account).

A. Creating a GitHub Account

Step 1: Go to github.com, enter the required user credentials,
and then click on the Sign up for GitHub button.

Step 2: Choose the plan that best suits you. The available plans are
shown below:

Step 3: Then click on Finish sign up.

The account has now been created, and you are automatically redirected to your
dashboard.

B. Creating a new Repository

● Log in to your GitHub account.
● On the dashboard, click on the green button labeled New repository.
● Make sure to verify the GitHub account by opening the email sent to the
address provided when creating the account.
● Once verification has been done, the following screen appears:

C. Start by giving a repository name and a description (optional), and select the
visibility and accessibility mode for the repository.

D. Click on Create repository.

E. The repository (in this case ITE-304) is now created. The
created repository looks like this:

And here you go…

Practical-05
Aim:-Download the data set and perform the analysis.

CODE:-

from google.colab import files
file = files.upload()

import pandas as pd
df = pd.read_csv('StudentsPerformance.csv')

# Show first 5 rows in a DataFrame
df.head()

# Show last 5 rows in a DataFrame
df.tail()

# Show last n rows in a DataFrame
n = 10
df.tail(n)

# Getting access to the shape attribute
df.shape
(1000, 8)

# Getting access to the index attribute
df.index
RangeIndex(start=0, stop=1000, step=1)

# Getting access to a column by label (loc) or by position (iloc)
df.loc[:, "gender"]
# df.iloc[:, 5]

# Data types of each column
df.dtypes

df.info()

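print(f"Count : {df.count()}")
# numeric_only=True restricts mean and standard deviation to the numeric columns
print(f"Mean : {df.mean(numeric_only=True)}")
print(f"SD : {df.std(numeric_only=True)}")
print(f"Max : {df.max()}")
print(f"Min : {df.min()}")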

df.count()

df['math score'].idxmax()
149
df['math score'].idxmin()
59
df.round()

df['math score']

df.loc[: , ["gender","math score"]]

df.loc[: , ["gender","math score"]].dtypes

import numpy as np
df['Language Score'] = np.random.randint(100,size = (1000))
df

# Per-student average of the three subject scores
df["Average Score"] = (df["math score"] + df["reading score"] + df["writing score"]) / 3
df.head()

# Sort the math score in ascending order
df['math score'].sort_values(ascending = True)

# Sort the math score in descending order
df['math score'].sort_values(ascending = False)

Practical-06
Aim:-Write a program to implement Linear Regression.

CODE:-
import numpy as np
import matplotlib.pyplot as plt

def estimate_coef(x, y):
    # number of observations/points
    n = np.size(x)

    # mean of x and y vector
    m_x = np.mean(x)
    m_y = np.mean(y)

    # calculating cross-deviation and deviation about x
    SS_xy = np.sum(y*x) - n*m_y*m_x
    SS_xx = np.sum(x*x) - n*m_x*m_x

    # calculating regression coefficients
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1*m_x

    return (b_0, b_1)

def plot_regression_line(x, y, b):
    # plotting the actual points as scatter plot
    plt.scatter(x, y, color = "m",
                marker = "o", s = 30)

    # predicted response vector
    y_pred = b[0] + b[1]*x

    # plotting the regression line
    plt.plot(x, y_pred, color = "g")

    # putting labels
    plt.xlabel('x')
    plt.ylabel('y')

    # function to show plot
    plt.show()

def main():
    # observations / data
    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

    # estimating coefficients
    b = estimate_coef(x, y)
    print("Estimated coefficients:\nb_0 = {}\nb_1 = {}".format(b[0], b[1]))

    # plotting regression line
    plot_regression_line(x, y, b)

if __name__ == "__main__":
    main()

OUTPUT:-

Practical-07
Aim:-Write a program to implement K-Nearest Neighbors.
CODE:

# Import necessary modules
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
import numpy as np
import matplotlib.pyplot as plt

irisData = load_iris()

# Create feature and target arrays
X = irisData.data
y = irisData.target

# Split into training and test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size = 0.2, random_state=42)

neighbors = np.arange(1, 9)
train_accuracy = np.empty(len(neighbors))
test_accuracy = np.empty(len(neighbors))

# Loop over K values
for i, k in enumerate(neighbors):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)

    # Compute training and test data accuracy
    train_accuracy[i] = knn.score(X_train, y_train)
    test_accuracy[i] = knn.score(X_test, y_test)

# Generate plot
plt.plot(neighbors, test_accuracy, label = 'Testing dataset Accuracy')
plt.plot(neighbors, train_accuracy, label = 'Training dataset Accuracy')

plt.legend()
plt.xlabel('n_neighbors')
plt.ylabel('Accuracy')
plt.show()

OUTPUT:
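
To show what a K-Nearest Neighbors classifier does for each prediction, here is a rough from-scratch sketch using only the standard library: find the k training points closest to the query (Euclidean distance) and take a majority vote over their labels. The function knn_predict and the toy data are hypothetical and are not part of the Iris experiment above; scikit-learn's implementation is considerably more elaborate.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest neighbors."""
    # Sort training indices by Euclidean distance to the query point
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(train_points[i], query))
    nearest_labels = [train_labels[i] for i in order[:k]]
    # Most common label among the k nearest neighbors
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical toy data: two well-separated clusters
train_points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
                (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(train_points, train_labels, (1.1, 1.0)))
print(knn_predict(train_points, train_labels, (5.1, 5.0)))
```

A small k follows the training data closely, while a larger k smooths the decision over more neighbors, which is the trade-off the accuracy plot explores.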

Practical-08
Aim:-Write a program for Automatic grouping of similar objects
into sets.

CODE:-

from itertools import groupby

test_list = ['ADITYA', 'coder_2', 'KIRTAN', 'coder_3', 'pro_3']
# groupby only merges adjacent items, so the list is sorted first
test_list.sort()
print("The original list is : " + str(test_list))
res = [list(i) for j, i in groupby(test_list,
                                   lambda a: a.split('_')[0])]
print("The grouped list is : " + str(res))

from itertools import groupby

test_list = ['ADITYA', 'coder_2', 'KIRTAN', 'coder_3', 'pro_3']
test_list.sort()
print("The original list is : " + str(test_list))
# partition('_')[0] returns the text before the first underscore
res = [list(i) for j, i in groupby(test_list,
                                   lambda a: a.partition('_')[0])]
print("The grouped list is : " + str(res))

test_list = ['geek_1', 'coder_2', 'geek_4', 'coder_3', 'pro_3']
print("The original List is : "+ str(test_list))
# collect the distinct prefixes
x = []
for i in test_list:
    x.append(i[:i.index("_")])
x = list(set(x))
res = []
for i in x:
    a = []
    for j in test_list:
        if j.find(i) != -1:
            a.append(j)
    res.append(a)
# printing result
print("The grouped list is : " + str(res))

test_list = ['geek_1', 'coder_2', 'geek_4', 'coder_3', 'pro_3']
print("The original list is : " + str(test_list))
res = [[item for item in test_list if item.startswith(prefix)]
       for prefix in set([item[:item.index("_")] for item in test_list])]
print("The grouped list is : " + str(res))

test_list = ['geek_1', 'coder_2', 'geek_4', 'coder_3', 'pro_3']
grouped = {}
for s in test_list:
    prefix = s.split('_')[0]
    if prefix not in grouped:
        grouped[prefix] = []
    grouped[prefix].append(s)

res = list(grouped.values())
print(res)

test_list = ['geek_1', 'coder_2', 'geek_4', 'coder_3', 'pro_3']
d = {}
for s in test_list:
    key = s.split('_')[0]
    if key in d:
        d[key].append(s)
    else:
        d[key] = [s]
res = list(d.values())
print("The original list is : " + str(test_list))
print("The grouped list is : " + str(res))
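
The dictionary-based grouping can also be sketched with collections.defaultdict, which removes the explicit membership test. The helper name group_by_prefix is our own. Unlike itertools.groupby, this approach does not need the list to be sorted first, and the groups appear in order of first occurrence.

```python
from collections import defaultdict

def group_by_prefix(items, sep="_"):
    """Group strings by the text before the first separator."""
    groups = defaultdict(list)
    for item in items:
        # partition returns (prefix, sep, rest); the prefix is the grouping key
        groups[item.partition(sep)[0]].append(item)
    return list(groups.values())

test_list = ['geek_1', 'coder_2', 'geek_4', 'coder_3', 'pro_3']
res = group_by_prefix(test_list)
print("The grouped list is : " + str(res))
```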
