COURSE PLAN
For
Data Mining and Predictive Modelling
(CSET228)
Faculty Name : Dr. Karnika Dwivedi, Dr. Madhuri Gupta, Dr. Anshika Arora,
Dr. Dinesh Kumar, Dr. Eht e Sham
Course Type : B.Tech Specialization Core-II (Data Science)
Semester and Year : IV Semester (II Year)
L-T-P : 3-0-2
Credits : 4
School : SCSET
Course Level : UG
School of Computer Science
Engineering and Technology
Bennett University
Greater Noida, Uttar Pradesh
COURSE CONTEXT
SCHOOL : SCSET
VERSION NO. OF CURRICULUM/SYLLABUS THAT THIS COURSE IS A PART OF : V1
DEPARTMENT :
DATE THIS COURSE WILL BE EFFECTIVE FROM : Jan–Jun, 2024
DEGREE : B.Tech.
VERSION NUMBER OF THIS COURSE : 2
COURSE BRIEF
COURSE TITLE : Data Mining and Predictive Modelling
PRE-REQUISITES : NA
COURSE CODE : CSET228
TOTAL CREDITS : 4
COURSE TYPE : Specialized Core – II
L-T-P FORMAT : 3-0-2
LIST OF FACULTY MEMBERS TEACHING THE COURSE:
Name                 Designation            Email Id
Dr Karnika Dwivedi   Assistant Professor    Karnika.dwivedi@bennett.edu.in
Dr Anshika Arora     Assistant Professor
Dr Madhuri Gupta     Assistant Professor
Dr Dinesh Kumar      Associate Professor
Dr Eht e Sham        Assistant Professor
(Designation: Professor/Associate Professor/Assistant Professor/PhD Scholar/Postdoc/....)
FACULTY TIME TABLE:
Dr Karnika Dwivedi: Course Coordinator
Dr Anshika Arora
Dr Madhuri Gupta
Dr Dinesh Kumar
Dr Eht e Sham
COURSE SUMMARY
This course exposes students to multiple techniques for understanding and analyzing data
from a mathematical point of view. In addition, students will use multiple predictive
models to analyze future trends statistically.
COURSE-SPECIFIC LEARNING OUTCOMES (CO)
By the end of this program, students should have the following knowledge, skills and values:
CO1: To articulate data preparation for data mining and analysis based on
pre-processing techniques.
CO2: To examine predictive analysis in various use cases.
CO3: To make use of exploratory data analysis to gain insights and prepare data for
predictive modelling.
CO – PO /PSO Mapping
COs / POs PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12 PSO1 PSO2 PSO3
CO1 H H H M H M H M H M M
CO2 H M M H M H H M H M
CO3 H H M M H M H H H M
H: High / M: Medium / L: Low
SYLLABUS
Module 1 (11 hours)
Purpose of Data Mining, Procedures of Data Mining, Functionality of Data Mining, Knowledge
discovery from data (KDD) process, Data and attribute types, Properties of data, Discrete and
continuous attributes, Dataset types, Data quality measurement, Noise Analysis and its
importance, Techniques of Data pre-processing, Aggregation, Sampling, Curse of dimensionality,
Dimensionality reduction, Feature selection and generation, Discretization and vectorization,
Binarization, Attribute transformation, Correlation, Association rule mining, Apriori algorithm,
Rule generation, Pattern Mining in Multilevel and Multidimensional Space.
Module 2 (7 hours)
Rule-based reasoning, Memory-based reasoning, Measuring data similarity, Similarity Metrics:
Distance-based measures, Information-based measures, Set similarity measures, Jaccard Index,
Sørensen–Dice Coefficient, Model Selection Problem, Error Analysis, Case study, Startups in
Data Analysis.
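As an illustration of the set similarity measures listed in Module 2, the following is a minimal sketch of the Jaccard index, |A ∩ B| / |A ∪ B|, and the Sørensen–Dice coefficient, 2|A ∩ B| / (|A| + |B|), using plain Python sets. The function names and the sample baskets are hypothetical and are not taken from the course material; returning 1.0 for two empty sets is an assumed convention.

```python
def jaccard_index(a: set, b: set) -> float:
    """Jaccard index: size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0  # assumed convention: two empty sets are treated as identical
    return len(a & b) / len(a | b)


def dice_coefficient(a: set, b: set) -> float:
    """Sørensen–Dice coefficient: twice the intersection size over the sum of sizes."""
    if not a and not b:
        return 1.0  # assumed convention, as above
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical example data: two market baskets sharing two of their items.
basket_1 = {"bread", "milk", "eggs"}
basket_2 = {"bread", "milk", "butter"}

jaccard = jaccard_index(basket_1, basket_2)   # 2 shared items / 4 distinct items
dice = dice_coefficient(basket_1, basket_2)   # 2 * 2 shared items / (3 + 3) items
print(jaccard, dice)
```

For the same pair of sets the Dice coefficient is never smaller than the Jaccard index, since the two are related by D = 2J / (1 + J).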
Module 3 (10 hours)
Outlier analysis in classification and clustering, Probabilistic models for clustering, Clustering
high dimensional data: Subspace clustering, Projection Based clustering, Exploratory data
analysis, Data summarization and visualization, Dataset exploration, Data Exploration Tools,
Interactive Data Exploration, Predictive models, Design Principles, Parametric Models, Non-
Parametric Models, ANOVA, Regression Analysis, Frequent Pattern Mining, Mining Closed and
Max Patterns.
Module 4 (14 hours)
Linear discriminant analysis, Fisher discriminant analysis, Time series Model: ARMA, ARIMA,
ARFIMA, Factor Analysis, Uncertainty quantification, Forward uncertainty propagation, Inverse
uncertainty quantification, Non-Negative Matrix Factorization, Sequential Matrix Factorization,
Exact Matrix Factorization, Expert Lecture from Industry, Recommendation System and
Collaborative Filtering, Multidimensional Scaling, Mining Textual Data, Temporal mining,
Spatial mining, Visual and audio data mining, Ubiquitous and invisible data mining, Privacy,
Security, Social Impacts of data mining.
STUDIO WORK / LABORATORY EXPERIMENTS:
Data pre-processing and vectorization. Quality analysis of data. Feature selection and Ranking.
Association rule mining and implementation of the Apriori algorithm. Data Similarity and set
similarity. Error analysis and model selection. Frequent pattern mining and regression.
Discriminant Analysis. Factor Analysis. Matrix Factorization. Recommendation System.
TEXTBOOKS/LEARNING RESOURCES:
1. Bruce Ratner, Statistical and Machine-Learning Data Mining: Techniques for
Better Predictive Modeling and Analysis (3rd ed.), Chapman and Hall/CRC, 2017.
ISBN 978-1498797603.
2. Dursun Delen, Predictive Analytics (1st ed.), Knime, 2020. ISBN 978-0136738510.
REFERENCE BOOKS/LEARNING RESOURCES:
1. Mohammed J. Zaki and Wagner Meira, Jr., Data Mining and Machine Learning (1st ed.),
Cambridge University Press, 2020. ISBN 978-1108473989.
TEACHING-LEARNING STRATEGIES
The course will be taught using a combination of the best practices of teaching-learning.
Multiple environments will be used to enhance the outcomes, such as seminars, self-learning,
MOOCs, group discussions and ICT-based tools for class participation, along with the
classroom sessions. The teaching pedagogy includes more exposure to hands-on experiments
and practical implementations in the lab sessions. To match the latest trends in academics,
case studies, advanced topics and research-oriented topics are covered to lay the foundation
and develop students' interest, leading to further exploration of related topics. To make
students aware of industry trends, one expert lecture session will be organized to give
students a platform for understanding relevant industry needs.
EVALUATION POLICY
Components of Course Evaluation Percentage Distribution
Mid-Term 20
End-Term 40
Course Certification and Viva 10
Lab Continuous Evaluations 15
End-Term Lab Examination/ Hackathon 15
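The percentage distribution above can be read as weights for a final course score. The sketch below assumes each component is marked as a percentage (0 to 100) of that component and that the final score is the weighted sum; the function name and the sample scores are hypothetical, and the actual grading procedure may differ.

```python
# Weights copied from the evaluation table (percent of the final score).
WEIGHTS = {
    "Mid-Term": 20,
    "End-Term": 40,
    "Course Certification and Viva": 10,
    "Lab Continuous Evaluations": 15,
    "End-Term Lab Examination/ Hackathon": 15,
}

# The five components together account for the whole course score.
assert sum(WEIGHTS.values()) == 100


def final_percentage(scores: dict) -> float:
    """Weighted sum of component scores, each given as a percentage (0-100)."""
    return sum(WEIGHTS[name] * scores[name] / 100 for name in WEIGHTS)


# Hypothetical student scores, for illustration only.
sample_scores = {
    "Mid-Term": 70,
    "End-Term": 60,
    "Course Certification and Viva": 100,
    "Lab Continuous Evaluations": 80,
    "End-Term Lab Examination/ Hackathon": 90,
}
total = final_percentage(sample_scores)  # 14 + 24 + 10 + 12 + 13.5
print(total)
```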
To be Filled each Semester
Probable Case Studies:
1) Advanced Research Topics: Time Series Analysis, Prediction
2) Startups to be discussed: Uber, Fractal Analytics
3) Assessment Components Details: As given in evaluation policy
4) Software required: Anaconda, Google Colab, PyCharm, IDLE, VS Code (any one)
5) Hardware required: NA
Relevant MOOC Courses being Referred:
Specialization: https://www.coursera.org/specializations/data-mining
Note on specialization course requirements:
The specialization program offers a selection of 6 courses, each varying in duration:
• Some courses are 30 hours or more.
• Some courses are 15–16 hours.
To fulfil the specialization requirements:
1. If a student opts for a 30-hour or longer course, they are required to complete only one
certification for that course.
2. If a student opts for a course of 15–16 hours, they must complete two certifications, each of 15
or 16 hours, to meet the requirement.
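The certification rule above can be sketched as a simple check on the durations of the courses a student has completed. The function name is hypothetical, and the treatment of durations that fall outside the two stated bands (for example, 20 hours) is an assumption: the sketch does not count them, since the note does not cover that case.

```python
def meets_certification_requirement(completed_hours: list) -> bool:
    """Return True if the completed course durations satisfy the stated rule.

    Rule as stated in the note: one course of 30 hours or more, or two
    courses of 15-16 hours each. Other durations are ignored (assumption).
    """
    long_courses = [h for h in completed_hours if h >= 30]
    short_courses = [h for h in completed_hours if 15 <= h <= 16]
    return len(long_courses) >= 1 or len(short_courses) >= 2


# Hypothetical examples.
print(meets_certification_requirement([32]))      # one long course
print(meets_certification_requirement([15, 16]))  # two short courses
print(meets_certification_requirement([15]))      # a single short course
```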