MADHAV INSTITUTE OF TECHNOLOGY & SCIENCE, GWALIOR
(Deemed to be University)
(Declared Under Distinct Category by Ministry of Education, Government of India)
NAAC Accredited with A++ Grade
Department of Computer Science and Engineering
DATA SCIENCE
150511/290501
DC
COURSE OBJECTIVES
● To provide fundamental knowledge of Data Science, along with essential Python
programming skills.
● To apply data manipulation, statistical analysis, and visualization techniques using Python
libraries like NumPy and pandas.
● To develop, implement, and evaluate machine learning models while using statistical methods
to derive insights and validate results.
-------------------------------------------------------------------------------------------------------------------------------
Unit-I
Introduction to Data Science: Introduction, Definition, applications of Data Science, Impact of
Data Science, Data Analytics Life Cycle, role of Data Scientist.
Basics of Python: Essential Python libraries, Python Introduction: Features, Identifiers, Reserved
words, Indentation, Comments. Built-in Data Types and their Methods: Strings, Lists, Tuples,
Dictionaries, Sets. Type Conversion, Operators. Decision Making, Looping, Loop Control
Statements, Math and Random Number Functions. User-Defined Functions.
Vectorized Computation: The NumPy ndarray- Creating ndarrays- Data Types for ndarrays-
Arithmetic with NumPy Arrays- Basic Indexing and Slicing.
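Illustrative sketch (not part of the prescribed syllabus text): the short example below shows what creating an ndarray, checking its data type, vectorized arithmetic, and basic indexing and slicing might look like. The array values and variable names are assumptions chosen only for illustration.

```python
import numpy as np

# Create a 2-D ndarray from a nested list; the values are illustrative only.
scores = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float64)

# Every ndarray carries a data type and a shape.
print(scores.dtype, scores.shape)

# Vectorized arithmetic acts element-wise, with no explicit Python loop.
doubled = scores * 2
total = scores + doubled

# Basic indexing and slicing.
first_row = scores[0]      # first row
last_col = scores[:, -1]   # last column of every row
sub = scores[:, 1:]        # all rows, columns 1 onward
print(first_row, last_col, sub, sep="\n")
```

Basic slices such as `sub` are views on the original array, so assigning into them also changes `scores`; calling `.copy()` gives an independent array.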
Unit-II
Data Analysis (with Pandas): Series, DataFrame, Essential Functionality: Dropping Entries,
Indexing, Selection, and Filtering- Function Application and Mapping- Sorting and Ranking.
Summarizing and Computing Descriptive Statistics – Mean, Standard Deviation, Skewness and
Kurtosis. Unique Values, Value Counts, and Membership. Reading and Writing Data in Text
Format.
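Illustrative sketch (not part of the prescribed syllabus text): the example below walks through several Unit-II topics on a small DataFrame. The column names and data are hypothetical, and the CSV round trip uses an in-memory buffer rather than a real file.

```python
import io

import pandas as pd

# Hypothetical data; column names and values are assumptions for illustration.
df = pd.DataFrame({
    "city": ["Gwalior", "Indore", "Bhopal", "Indore"],
    "marks": [72.0, 85.0, 64.0, 91.0],
})

# Dropping entries, selection, and filtering.
without_first = df.drop(index=0)
high = df[df["marks"] > 70]

# Sorting and ranking (rank 1 = highest marks).
ordered = df.sort_values("marks", ascending=False)
ranks = df["marks"].rank(ascending=False)

# Descriptive statistics.
mean = df["marks"].mean()
std = df["marks"].std()
skewness = df["marks"].skew()
kurtosis = df["marks"].kurt()

# Unique values, value counts, and membership.
unique_cities = df["city"].unique()
counts = df["city"].value_counts()
is_member = df["city"].isin(["Indore", "Bhopal"])

# Writing and reading data in text (CSV) format via an in-memory buffer.
csv_text = df.to_csv(index=False)
round_trip = pd.read_csv(io.StringIO(csv_text))
print(round_trip)
```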
Unit-III
Exploratory Data Analysis and Visualisation: Handling Missing Data, Data Transformation:
Removing Duplicates, Transforming Data Using a Function or Mapping, Replacing Values,
Detecting and Filtering Outliers, Functions in pandas. Plotting with pandas: Line Plots, Bar Plots,
Histograms and Density Plots, Scatter or Point Plots.
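Illustrative sketch (not part of the prescribed syllabus text): one possible cleaning sequence covering duplicates, value replacement, missing data, mapping, and outlier filtering. The data, the -999 sentinel, and the 1.5 x IQR outlier rule are assumptions for this example; other outlier criteria are equally valid.

```python
import numpy as np
import pandas as pd

# Hypothetical sensor readings; -999.0 is an assumed "no data" sentinel.
df = pd.DataFrame({
    "sensor": ["a", "b", "b", "c", "d", "e"],
    "reading": [10.0, 11.0, 11.0, np.nan, -999.0, 12.0],
})

# Removing duplicates: the two identical "b" rows collapse to one.
deduped = df.drop_duplicates()

# Replacing values: turn the sentinel into a proper missing value.
cleaned = deduped.assign(reading=deduped["reading"].replace(-999.0, np.nan))

# Handling missing data: count the gaps, then fill them with the column mean.
n_missing = int(cleaned["reading"].isna().sum())
filled = cleaned["reading"].fillna(cleaned["reading"].mean())

# Transforming data using a function or mapping.
upper_names = cleaned["sensor"].map(str.upper)

# Detecting and filtering outliers with the 1.5 * IQR rule.
values = pd.Series([10, 11, 12, 11, 10, 95])
q1, q3 = values.quantile(0.25), values.quantile(0.75)
iqr = q3 - q1
kept = values[values.between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]
print(kept.tolist())
```

With matplotlib installed, a call such as `filled.plot(kind="hist")` draws a histogram of the cleaned readings; `kind="line"`, `kind="bar"`, and `kind="density"` select the other plot types named above (density plots also need SciPy).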
Unit-IV
Introduction to Machine Learning: Types of Learning, Linear Regression- Simple Linear
Regression, Implementation, plotting and fitting regression line, Logistic Regression, K-Nearest
Neighbors (KNN), K-Means Clustering.
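Illustrative sketch (not part of the prescribed syllabus text): simple linear regression fitted with the closed-form least-squares formulas, using NumPy only. The data points are made up so that they lie exactly on y = 2x + 1, which makes the fitted line easy to check by hand.

```python
import numpy as np

# Made-up data lying exactly on the line y = 2x + 1.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.0, 5.0, 7.0, 9.0, 11.0])

# Least-squares estimates:
#   slope     = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean)^2)
#   intercept = y_mean - slope * x_mean
x_mean, y_mean = x.mean(), y.mean()
slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
intercept = y_mean - slope * x_mean

# Points on the fitted regression line, e.g. for plotting against x.
predicted = slope * x + intercept
print(slope, intercept)
```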
Unit-V
Model Evaluation Metrics: Accuracy, Precision, Recall, F1-Score
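Illustrative sketch (not part of the prescribed syllabus text): the four metrics computed from confusion-matrix counts for a binary classifier, in plain Python. The function name `classification_metrics` and the label lists are hypothetical.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical true and predicted labels: 3 TP, 1 FP, 1 FN, 3 TN.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 1, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```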
Hypothesis Testing: Mean and Variance Tests, p-value, Errors, Z-Test, t-Test, Paired t-Test, and
F-Test, Analysis of Variance (ANOVA) and Contingency Table Analysis
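Illustrative sketch (not part of the prescribed syllabus text): a two-sided one-sample Z-test with a known population standard deviation, in plain Python. The sample, the hypothesized mean of 50, the standard deviation of 3, and the 0.05 significance level are all assumptions for this example.

```python
import math


def one_sample_z_test(sample, mu0, sigma):
    """Two-sided Z-test of H0: population mean == mu0, with known sigma."""
    n = len(sample)
    sample_mean = sum(sample) / n
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    # Standard normal CDF at |z|, expressed through the error function.
    cdf = 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0)))
    p_value = 2.0 * (1.0 - cdf)
    return z, p_value


# Hypothetical measurements.
sample = [52, 48, 51, 53, 50, 49, 54, 51, 52]
z, p_value = one_sample_z_test(sample, mu0=50, sigma=3)

alpha = 0.05  # assumed significance level
reject_null = p_value < alpha
print(z, p_value, reject_null)
```

Rejecting a true null hypothesis is a Type I error, and failing to reject a false one is a Type II error; `alpha` is the Type I error rate the analyst is willing to accept.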
-------------------------------------------------------------------------------------------------------------------------------
RECOMMENDED BOOKS
1. Cathy O’Neil and Rachel Schutt, “Doing Data Science”, O'Reilly, 2015.
2. David Dietrich, Barry Heller, and Beibei Yang, “Data Science and Big Data Analytics”, EMC, 2013.
3. Stuart J. Russell and Peter Norvig, “Artificial Intelligence: A Modern Approach”, Prentice Hall.
4. Christopher M. Bishop, “Pattern Recognition and Machine Learning”.
5. Gareth James et al., “An Introduction to Statistical Learning”, Vol. 112, Springer, New York, 2013.
COURSE OUTCOMES
After completion of this course, the students would be able to:
CO1: Analyze Data Science concepts and apply Python programming for data tasks, including
data manipulation with NumPy.
CO2: Analyze data in order to apply various statistical modeling approaches.
CO3: Develop expertise in managing missing data and assessing the impact of visualizations on
data insight communication.
CO4: Design and implement machine learning algorithms and assess model performance.
CO5: Develop statistical tests and evaluate machine learning models.
CO-PO Mapping (1 - Slightly; 2 - Moderately; 3 - Substantially)
COs  PO1  PO2  PO3  PO4  PO5  PO6  PO7  PO8  PO9  PO10  PO11  PO12  PSO1  PSO2
CO1  3    2    1    1    3    2    -    -    1    2     1     1     3     2
CO2  3    3    2    2    3    2    -    1    1    2     2     1     3     3
CO3  2    2    2    1    3    2    -    1    1    3     3     2     3     2
CO4  3    3    3    2    3    3    -    1    1    2     3     2     3     3
CO5  3    3    2    2    3    3    1    1    1    2     2     2     3     3
List of Experiments
1. Perform creation, indexing, slicing, concatenation, and repetition operations on Python
built-in data types: Strings, Lists, Tuples, Dictionaries, Sets.
2. Solve problems using decision and looping statements.
3. Apply Python built-in data types (Strings, Lists, Tuples, Dictionaries, Sets) and their
methods to solve any given problem.
4. Handle numerical operations using math and random number functions.
5. Manipulate NumPy arrays: Indexing, Slicing, Reshaping, Joining, and Splitting.
6. Perform computation on NumPy arrays using Universal Functions and Mathematical methods.
7. Import a CSV file and perform various Statistical and Comparison operations on
rows/columns.
8. Create Pandas Series and DataFrame from various inputs.
9. Import any CSV file to Pandas DataFrame and perform the following:
1. Display the first and last 10 records.
2. Get the shape, index and column details
3. Select/Delete the records(rows)/columns based on conditions.
4. Perform ranking and sorting operations.
5. Do required statistical operations on the given columns.
6. Find the count and uniqueness of the given categorical values.
7. Rename single/multiple columns.
10. Import any CSV file to Pandas DataFrame and perform the following:
1. Handle missing data by detecting and dropping/ filling missing values.
2. Transform data using different methods.
3. Detect and filter outliers.
4. Perform Vectorized String operations on Pandas Series.
5. Visualize data using Line Plots, Bar Plots, Histograms, Density Plots and Scatter Plots.
11. Use the scikit-learn package in Python to implement a regression model and its related
methods.
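Illustrative sketch for Experiment 11 (not part of the prescribed list): one way to fit and evaluate a linear regression model with scikit-learn. The synthetic data (true slope 3, intercept 2, Gaussian noise), the 80/20 split, and the fixed random seeds are assumptions made so the example is reproducible; a real experiment would load a dataset instead.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data: y = 3x + 2 plus Gaussian noise (assumed for illustration).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 0.5, size=100)

# Hold out 20% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit the model and inspect the learned parameters.
model = LinearRegression().fit(X_train, y_train)
print("slope:", model.coef_[0], "intercept:", model.intercept_)

# Evaluate on the held-out rows.
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print("MSE:", mse, "R^2:", r2)
```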
Course Outcomes (COs) for the Data Science lab:
CO1: Apply fundamental Python programming constructs such as data types, control structures,
and functions to design ethical and efficient solutions for real-life problems.
CO2: Analyze and process structured and unstructured data using Python libraries like NumPy and
Pandas to derive meaningful insights while considering societal relevance and responsible data
handling.
CO3: Develop real-world data science applications using Python.
CO-PO Mapping (1 - Slightly; 2 - Moderately; 3 - Substantially)
COs PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12 PSO1 PSO2
CO1 3 3 2 2 2 2 2 3 2
CO2 3 3 2 2 2 2 2 2 3 3
CO3 3 2 3 2 2 2 3 3 2 2 3 3
List of Skill-Based Projects (Sample List)
● Exploratory Data Analysis (EDA): Perform an in-depth analysis of a dataset, including data
cleaning, visualization, and statistical analysis to gain insights and understand the
underlying patterns and relationships.
● Predictive Modeling: Build a machine learning model to predict a specific outcome or
target variable based on a given dataset. This could include classification, regression, or
time series forecasting tasks.
● Natural Language Processing (NLP): Develop a text classification or sentiment analysis
model using techniques such as tokenization, word embeddings, and recurrent neural
networks (RNNs) to analyze and understand text data.
● Image Recognition: Create an image recognition system using convolutional neural
networks (CNNs) to classify or identify objects, faces, or patterns in images.
● Recommendation System: Build a recommendation engine that suggests personalized
recommendations to users based on their preferences and behavior, using collaborative
filtering or content-based filtering techniques.
● Clustering Analysis: Implement clustering algorithms such as k-means, hierarchical
clustering, or DBSCAN to group similar data points together and discover hidden patterns
or segments within a dataset.
● Time Series Analysis: Analyze time-dependent data, such as stock prices or weather data,
using techniques like autoregressive integrated moving average (ARIMA), exponential
smoothing, or recurrent neural networks (RNNs).
● Anomaly Detection: Develop an anomaly detection system that can identify unusual or
suspicious patterns in data, which can be useful for fraud detection, network intrusion
detection, or outlier detection.
● Social Media Sentiment Analysis: Use data from social media platforms to analyze public
sentiment towards specific topics, brands, or events using natural language processing
techniques and sentiment analysis algorithms.
● Data Visualization Dashboard: Create an interactive dashboard using libraries like Plotly or
Dash to visualize and explore data, providing users with an intuitive interface to interact
with and gain insights from the data.
Please Note: Each project has to be submitted by a group of 1 or 2 students, and each group will
be assigned only one project.
***********