KEMBAR78
Syllabus Fundamentals of Data Science | PDF | Probability Distribution | Variance
0% found this document useful (0 votes)
19 views7 pages

Syllabus Fundamentals of Data Science

Uploaded by

vargiyaaryan2005
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
19 views7 pages

Syllabus Fundamentals of Data Science

Uploaded by

vargiyaaryan2005
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
You are on page 1/ 7

Course Code Course name L T P C

CSDS2001P Fundamentals of Data Science 4 0 0 4


Total Units to be Covered: 06 Total Contact Hours: 60
Prerequisite(s): Database Management Systems -CSEG2046 Syllabus version: 1.0

Course Objectives
1. To understand the concept of data science.

2. To understand techniques and methods related to the area of data science on real
world applications.

Course Outcomes

After the completion of the course the students will be able to

CO1: Understand the fundamentals of data processing.


CO2: Understand and apply mathematical concepts in the field of data science.
CO3: Employ the techniques and methods related to the area of data science in a
variety of applications.
CO4: Apply logical thinking to understand and solve the problem in context.
CO5: Apply the entire concept in data analysis tools.

CO-PO Mapping

Program
Outcome
s PO PO PO PO PO PO PO PO PO PO PO PO PS PS PS
Course 1 2 3 4 5 6 7 8 9 10 11 12 O1 O2 O3
Outcome
s
CO 1 - 3 2 2 - - - - - - - - - 3 3
CO 2 - 3 2 2 - - - - - - - - - 3 3
CO 3 - 3 2 3 - - - - - - - - - 3 3
CO4 - 3 2 3 - - - - - - - - - 3 3
- - - - - - - - - -
CO5 3 2 2 2 3
- - - - - - - - - -
Average 3 2 2.4 2.8 3

1 – Weakly Mapped (Low) 2 – Moderately Mapped (Medium)

3 – Strongly Mapped (High) “_” means there is no correlation

Syllabus

Unit I: Introduction to Data Science 8 Lecture Hours


Fundamentals of Data Science, Real World Applications, Data Science Challenges,
Software Engineering for Data Science (DataOps, MLOps (intro)). Data science process
roles, Stages in data science.

Defining Analytics, Types of data analytics (Descriptive, Diagnostic, Predictive,


Prescriptive)

Data Science Process: CRISP-DM Methodology, SEMMA, BIG DATA LIFE CYCLE,
SMAM.

Unit II: Probability and statistics for Data Science 12 Lecture Hours
Probability: Introduction, finite sample spaces, conditional probability, independence;
Random variables, distribution functions, probability mass and density functions,
standard univariate discrete and continuous distributions; Mathematical expectations,
moments; Random vectors, joint, marginal, and conditional distributions, independence,
covariance, correlation, standard multivariate distributions, functions of random vectors;
central limit theorem.

Statistics: Sampling distributions of the sample mean and the sample variance for a
normal population; Point and interval estimation; Sampling distributions (Chi-square,
t,F,Z), Hypothesis testing; One tailed and two-tailed tests; Analysis of variance, ANOVA,
One way and two way classifications

Unit III: Data, Data Sources and Visualization 15 Lecture Hours


Types of Data and Datasets, Data Quality, and Issues, Data Models, General
Framework of Formal modeling, Association Analyses, Prediction Analyses, Data
Pipelines and patterns, Data from files & working with relational databases, Diverse
data sources, data warehouses, data mining, cloud, and Data lake: Characteristics,
components, Data Streaming Ingestion, Batch Data Ingestion, Data Cataloging, Data
Pipeline Stages (extraction, ingestion, cleaning, exploration, wrangling, versioning, Data
transformation, Feature management). Data Visualization: Overview of visualization
techniques for Data Exploratory analysis

Unit IV: Feature Engineering and Optimization 10 Lecture Hours


Feature Extraction, Feature Construction, Feature Subset selection, Feature Learning,
Feature Reduction (Dimensionality Reduction) Case Study involving FE tasks, and
Feature Engineering techniques for text, images, audio, and video. Necessary and
sufficiency conditions for optima; Gradient descent methods; Constrained optimization;
Introduction to non-gradient techniques; Introduction to least squares optimization;
Optimization view of machine learning.

Unit V: Supervised and unsupervised learning 10 Lecture Hours


Introduction to Machine Learning, types, Supervised Learning: Overview, workflow, data
processing, Linear Regression, Logistic Regression, Decision Trees, Random Forest,
Support Vector Machines (SVM), k-Nearest Neighbors (k-NN).

Unsupervised Learning: Overview, clustering algorithms: K-Means Clustering,


Hierarchical Clustering, DBSCAN, Gaussian Mixture Models (GMM),

Dimensionality Reduction: Principal Component Analysis (PCA), t-Distributed Stochastic


Neighbor Embedding (t-SNE)

Association Rule Mining: Apriori Algorithm, FP-Growth Algorithm, Anomaly Detection,


Model Evaluation (Silhouette Score, Inertia, etc.)
Use Cases and Practical Applications

Unit VI: Data Analysis Tool 5 Lecture Hours


Reading and getting data into R, ordered and unordered factors i.e. arrays and matrices
– lists and data frames, reading data from files, probability distributions statistical
models in R - manipulating objects – data distribution.
1. Avrim Blum, John Hopcroft, and Ravindran Kannan, “Foundations of Data Science”,
2018. Available online at: https://www.cs.cornell.edu/jeh/book.pdf.
Total lecture Hours 60
Textbooks
1. G. Strang, “Introduction to Linear Algebra”, 5 th Edition, Wellesley-Cambridge Press,
USA, 2016.
2. D. C. Montgomery, and G. C. Runger, “Applied Statistics and Probability for
Engineers”, 5th Edition, John Wiley & Sons, Inc., NY, USA, 2011.
3. Nina Zumel, and John Mount, “Practical Data Science with R”, Manning
Publications, 2014.
Reference Books
1. Mark Gardener, “Beginning R - The Statistical Programming Language”, John Wiley
& Sons, Inc., 2012.
2. W. N. Venables, D. M. Smith and the R Core Team, “An Introduction to R”, 2013.
Available online at: https://cran.r-project.org/doc/manuals/R-intro.pdf.
3. S. Abiteboul, R. Hull, V. Vianu, “Foundations of Databases”, Addison Wesley, 1995.
4. J. S. Bendat, and A. G. Piersol, “Random Data: Analysis and Measurement
Procedures”, 4th Edition, John Wiley & Sons, Inc., NY, USA, 2010.
5. D. C. Montgomery, and G. C. Runger, “Applied Statistics and Probability for
Engineers”, 5th Edition, John Wiley & Sons, Inc., NY, USA, 2011.
6. Cathy O’Neil, and Rachel Schutt, “Doing Data Science”, O’Reilly Media, 2013.
Modes of Evaluation: Quiz/Assignment/ presentation/ extempore/ Written
Examination
Examination Scheme
Components IA MID SEM End Sem Total
Weightage (%) 50 20 30 100

Course Code Course name L T P C


CSDS2101P Fundamentals of Data Science Lab 0 0 2 1
Total Units to be Covered: 10 Total Contact Hours: 30
Prerequisite(s): Database Management Systems Lab - CSEG2146 Syllabus version: 1.0

Course Objectives

1. Learn to collect, clean, and preprocess data from diverse sources for analysis.
2. Understand core statistical concepts to extract valuable insights from data.
3. Gain a foundational understanding of machine learning algorithms and their
applications.
4. Develop coding skills to perform data analysis and visualization.

Course Outcomes

CO 1. Know the importance of data analytics in relation to various statistical measures.


CO 2. Employ statistical techniques to extract insights from data.
CO 3. Demonstrate proficiency in using R for data analysis.

CO-PO Mapping

Program
Outcomes
PSO1

PSO2

PSO3
PO10

PO11

PO12
PO1

PO2

PO3

PO4

PO5

PO6

PO7

PO8

PO9

Course
Outcome
s
CO 1 1 - - 1 1 - - - - - - - - - 3
CO 2 1 - - 1 1 - - - - - - - - - 3
CO 3 1 2 - 2 1 - - - - - - - - - 3
Average 1 0.67 - 1.3 1 - - - - - - - - - 3

1 – Weakly Mapped (Low) 2 – Moderately Mapped (Medium)

3 – Strongly Mapped (High) “_” means there is no correlation


List of Experiments
Experiment no 1 Conduct basic data exploration by calculating summary statistics,
creating histograms, and generating scatterplots.

Experiment no 2 Learn data cleaning techniques, including handling missing data,


outliers, and data imputation.

Experiment no 3 Perform hypothesis tests, such as t-tests or chi-squared tests, to


make inferences about data.

Experiment no 4 Implement simple linear regression to analyze relationships


between variables and make predictions.

Experiment no 5 Create a variety of visualizations, including bar charts, line graphs,


heatmaps, and box plots.

Experiment no 6 Use clustering algorithms to group similar data points together.

Experiment no 7 Build a random forest model for more advanced classification and
regression tasks.

Experiment no 8 Discover frequent item sets and association rules in transactional


data.

Experiment no 9 Project 1 (Sentiment analysis)

Experiment no 10 Project 2 (Recommendation systems)

Total Lab hours 30

Textbooks
1. G. Strang, “Introduction to Linear Algebra”, 5th Edition, Wellesley-Cambridge Press,
USA, 2016.
2. D. C. Montgomery, and G. C. Runger, “Applied Statistics and Probability for
Engineers”, 5th Edition, John Wiley & Sons, Inc., NY, USA, 2011.
Reference Books
1. Mark Gardener, “Beginning R - The Statistical Programming Language”, John Wiley
& Sons, Inc., 2013.
2. W. N. Venables, D. M. Smith, and the R Core Team, “An Introduction to R”, 2013.
Available online at: https://cran.r-project.org/doc/manuals/R-intro.pdf.

Modes of Evaluation: Quiz/Assignment/ presentation/ extempore/ Written


Examination

Examination Scheme: Continuous Assessment

Components Quiz & Viva Performance & Lab Report


Weightage (%) 50 % 50 %

You might also like