Course Code Course Title L T P C
10212EC146 INTRODUCTION TO DATA SCIENCE 3 0 0 3
c) Course Category
Program Elective
d) Preamble
The purpose of the course is to provide a strong foundation in data science and its applications,
covering core concepts and emerging technologies.
c) Prerequisite
Nil
e) Related Courses
Tools for Data Science, Data analysis and visualization, Machine vision, Soft computing
e) Course Outcomes
Upon the successful completion of the course, students will be able to:
CO Nos. | Course Outcomes | Knowledge Level (Based on Revised Bloom’s Taxonomy)
CO1 | Explain the concepts of vector spaces, eigenvalues, eigenvectors and distance measures | K2
CO2 | Interpret the fundamentals of the processes involved in data science | K2
CO3 | Describe various data modelling techniques and their evaluation | K2
CO4 | Use visualization techniques to represent data | K3
CO5 | Discuss the ethics surrounding privacy, data sharing and algorithmic decision-making | K2
j) Correlation of COs with POs
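CO  | PO1 | PO2 | PO3 | PO4 | PO5 | PO6 | PO7 | PO8 | PO9 | PO10 | PO11 | PO12 | PSO1 | PSO2
CO1 | H   | L   | L   | -   | -   | -   | -   | -   | -   | -    | -    | -    | L    | L
CO2 | H   | M   | M   | -   | -   | -   | -   | -   | -   | -    | -    | -    | L    | L
CO3 | H   | M   | M   | -   | -   | -   | -   | -   | -   | -    | -    | -    | L    | L
CO4 | H   | H   | H   | -   | L   | -   | -   | -   | -   | -    | -    | -    | L    | L
CO5 | M   | -   | -   | -   | -   | -   | -   | -   | -   | -    | -    | -    | L    | L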
k) Course Content
UNIT I MATHEMATICS FOR DATA SCIENCE 9
Definition of Vector Spaces, Subspaces, Sums of Subspaces, Direct Sums, Eigenvalues and
Eigenvectors – Eigenvectors and Upper-Triangular Matrices – Eigenspaces and Diagonal Matrices,
Distance Measures – Euclidean Distance, Manhattan Distance, Hamming Distance, Cosine
Similarity.
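The four measures listed in Unit I can be illustrated with a minimal sketch in plain Python. The function names and sample vectors below are illustrative assumptions, not part of the prescribed course material.

```python
import math

def euclidean(a, b):
    """Straight-line (L2) distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute coordinate differences (L1 distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (undefined for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical sample data, chosen so the results are easy to check by hand.
u, v = [1.0, 2.0, 3.0], [4.0, 6.0, 3.0]
print(euclidean(u, v))                    # 5.0
print(manhattan(u, v))                    # 7.0
print(hamming("karolin", "kathrin"))      # 3
print(cosine_similarity([1, 0], [0, 1]))  # 0.0
```

Euclidean, Manhattan and Hamming are distances (0 means identical), whereas cosine similarity is a similarity (1 means the same direction).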
UNIT II DATA SCIENCE PROCESS 9
Fundamentals of Data Science, Data Preparation: The Problem Understanding Phase, Data
Preparation Phase, Adding an Index Field, Changing Misleading Field Values, Reexpression of
Categorical Data as Numeric, Standardizing the Numeric Fields, Identifying Outliers.
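Several of the data preparation steps in Unit II can be sketched with the Python standard library. The records, field names, category codes and the |z| > 3 outlier rule below are assumptions made for illustration; the course texts may use different data and thresholds.

```python
import statistics

# Hypothetical data set: 15 records with one numeric and one categorical field.
ages = [22, 25, 27, 29, 30, 31, 33, 35, 36, 38, 40, 41, 43, 45, 120]
levels = ["primary", "secondary", "tertiary"]
records = [{"age": a, "education": levels[i % 3]} for i, a in enumerate(ages)]

# 1. Add an index field so every record can be traced after sorting or filtering.
for i, rec in enumerate(records, start=1):
    rec["index"] = i

# 2. Re-express an ordinal categorical field as numeric (assumed coding).
education_codes = {"primary": 1, "secondary": 2, "tertiary": 3}
for rec in records:
    rec["education_num"] = education_codes[rec["education"]]

# 3. Standardize the numeric field: z = (x - mean) / sample standard deviation.
mean_age = statistics.mean(ages)
sd_age = statistics.stdev(ages)
for rec in records:
    rec["age_z"] = (rec["age"] - mean_age) / sd_age

# 4. Flag records whose z-score lies more than 3 standard deviations from the mean.
outliers = [rec["index"] for rec in records if abs(rec["age_z"]) > 3]
print(outliers)  # [15]
```

The numeric coding in step 2 only makes sense when the categories have a natural order; unordered categories would need a different encoding.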
UNIT III DATA MODELING AND EVALUATION 9
Data Modeling: Partitioning the Data, Validating Your Partition, Balancing the Training Data Set,
Establishing Baseline Model Performance, Model Evaluation: Classification Evaluation Measures,
Sensitivity and Specificity, Precision, Recall, and Fβ Scores, Method for Model Evaluation.
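The classification evaluation measures named in Unit III can all be computed from the four counts of a binary confusion matrix. The sketch below is illustrative only: the function name and the counts are hypothetical, and it assumes that no denominator is zero.

```python
def classification_measures(tp, fp, tn, fn, beta=1.0):
    """Compute standard binary classification measures from confusion-matrix counts.

    tp, fp, tn, fn: true positives, false positives, true negatives, false negatives.
    beta: weight of recall relative to precision in the F-beta score.
    """
    sensitivity = tp / (tp + fn)   # true positive rate, identical to recall
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)     # share of predicted positives that are correct
    recall = sensitivity
    f_beta = (1 + beta**2) * precision * recall / (beta**2 * precision + recall)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "recall": recall,
        "f_beta": f_beta,
    }

# Hypothetical counts for a test set of 100 records.
measures = classification_measures(tp=40, fp=10, tn=45, fn=5)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

With beta = 1 the score is the familiar F1 (harmonic mean of precision and recall); beta > 1 weights recall more heavily, and beta < 1 weights precision more heavily.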
UNIT IV DATA VISUALIZATION 9
Introduction to data visualization, visualization techniques: scatter plots, line graphs, pie charts, bar
charts, heat maps, area charts and histograms, case study: Survey on COVID-19 dataset.
UNIT V ETHICS IN DATA SCIENCE 9
Importance of ethics in data science, doing good data science, data privacy – degrees of privacy,
valuing different aspects of privacy, modern privacy risks, Getting informed consent, The Five Cs –
consent, clarity, consistency and trust, control and transparency, consequences, Diversity, Inclusion,
future trends.
Total 45 Hrs
Text Books:
1. Sheldon Axler, “Linear Algebra Done Right”, 3rd Edition, Springer, 2015.
2. Chantal D. Larose, Daniel T. Larose, “Data Science Using Python and R”, 1st Edition,
John Wiley & Sons, Inc., 2019.
3. D J Patil, Hilary Mason, Mike Loukides, “Ethics and Data Science”, 1st Edition,
O’Reilly Media, 2018.
References:
1. E. Davis, “Linear Algebra and Probability for Computer Science Applications”, CRC
Press, 2012.
2. Dr. Ossama Embarak, “Data Analysis and Visualization Using Python”, Apress,
2018.
3. Cathy O’Neil, Rachel Schutt, “Doing Data Science”, 1st Edition, O’Reilly Media,
2013.
Online resources:
1. https://towardsdatascience.com/intro-to-data-science-531079c38b22?gi=1fb573279fdb
2. https://www.edureka.co/blog/what-is-data-science/
3. https://www.youtube.com/watch?v=KxryzSO1Fjs
4. https://www.simplilearn.com/tutorials/data-science-tutorial/introduction-to-data-science
5. https://cognitiveclass.ai/courses/data-science-101
6. https://towardsdatascience.com/introduction-to-machine-learning-for-beginners-eed6024fdb08
7. https://www.youtube.com/watch?v=njKP3FqW3Sk
8. https://www.analyticsvidhya.com/blog/2020/03/6-data-visualization-python-libraries/