MCA109 : DATA SCIENCE FOUNDATION
L T P Cr
3 0 2 4.0
Course Objective: To elaborate on the basics of data science and provide a foundation for
understanding the challenges and applications.
Introduction to Python: Basic syntax, variables, Random Numbers, Functions.
Data Structures in Python: List, Tuple, Sets, Dictionary, Operations on Data Structures
(Declarations, Iterations, Adding/deleting element, min/max/sorting, merge, select). Use of
Libraries, File Handling (Read, Write, Merge, etc.).
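
A minimal sketch of the data-structure operations listed above (declaration, iteration, adding/deleting elements, min/max/sorting, merge, select). All variable names and values are hypothetical and chosen only for illustration.

```python
# List: ordered and mutable.
marks = [72, 88, 65]
marks.append(91)                 # add an element
marks.remove(65)                 # delete an element by value
sorted_marks = sorted(marks)     # [72, 88, 91]
lowest, highest = min(marks), max(marks)
selected = [m for m in marks if m > 80]   # select: [88, 91]

# Tuple: ordered and immutable; unpacking reads its elements.
point = (3, 4)
x, y = point

# Set: unordered collection of unique elements; | merges two sets.
merged_set = {1, 2, 3} | {3, 4}  # {1, 2, 3, 4}

# Dictionary: key-value mapping.
ages = {"asha": 21}
ages["ravi"] = 23                # add a key
del ages["asha"]                 # delete a key
merged_dict = {**ages, "meena": 22}       # merge into a new dict

# Iteration over a dictionary's items.
names_in_order = [name for name, _ in sorted(merged_dict.items())]
```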
Advanced Topics in Python: Working with NumPy, Working with SciPy.
Getting Started with Raw Data: Indexing and slicing, Shape manipulation, Empowering
data analysis with pandas: The data structure of pandas, Series, DataFrame, Panel, Inserting
and exporting data: CSV, XLS.
Data cleansing: Checking the missing data, Filling the missing data, String operations,
Merging data, Data operations; Aggregation operations: Joins, inner join, left outer join, full
outer join, groupby function.
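
A minimal sketch of the data cleansing and aggregation topics above, assuming pandas is installed. The two small tables and their column names are hypothetical, invented only to show missing-value checks, filling, the three join types, and `groupby`.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration only.
students = pd.DataFrame({
    "student_id": [1, 2, 3, 4],
    "region": ["North", "South", "North", None],
})
scores = pd.DataFrame({
    "student_id": [1, 2, 3, 5],
    "score": [80.0, None, 70.0, 90.0],
})

# Checking the missing data: count missing values per column.
missing_counts = scores.isna().sum()

# Filling the missing data: replace missing scores with the column mean.
scores_filled = scores.fillna({"score": scores["score"].mean()})

# Joins on the shared key.
inner = students.merge(scores_filled, on="student_id", how="inner")  # keys in both
left = students.merge(scores_filled, on="student_id", how="left")    # all students
outer = students.merge(scores_filled, on="student_id", how="outer")  # all keys

# Aggregation: mean score per region using groupby.
mean_by_region = inner.groupby("region")["score"].mean()
```

Filling with the mean is only one option; the median or a domain-specific default may suit skewed data better.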
Application of Inferential Statistics: Various forms of distribution, A normal distribution,
A normal distribution from a binomial distribution, A Poisson distribution, A Bernoulli
distribution, A z-score, A p-value, Type 1 and Type 2 errors, A confidence interval,
Correlation, Z-test vs T-test, The F distribution, The chi-square distribution, Chi-square for
the goodness of fit, The chi-square test of independence, ANOVA.
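
A minimal sketch of the z-score and p-value topics above, using only the Python standard library. The function name `one_sample_z_test` and the sample values are hypothetical; the test assumes the population standard deviation is known, which is the usual condition for a z-test rather than a t-test.

```python
from math import sqrt
from statistics import NormalDist, mean


def one_sample_z_test(sample, pop_mean, pop_sd):
    """Return (z, two-sided p-value) for a one-sample z-test.

    Assumes the population standard deviation pop_sd is known.
    """
    n = len(sample)
    # z-score of the sample mean under the null hypothesis.
    z = (mean(sample) - pop_mean) / (pop_sd / sqrt(n))
    # Two-sided p-value from the standard normal distribution.
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical sample, invented for illustration only.
sample = [52, 48, 55, 51, 49, 53, 50, 54, 47, 51]
z, p = one_sample_z_test(sample, pop_mean=50, pop_sd=4)

# Reject the null hypothesis at the 5% level only if p < 0.05.
reject_null = p < 0.05
```

Rejecting a true null hypothesis is a Type 1 error; failing to reject a false one is a Type 2 error.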
Plotting and Visualization in Python: Plotting using Matplotlib and Seaborn library
(Histogram, Box Plot, Scatter Plot, Bar Graphs, Line Graph, etc.). Binning Visualization,
organizing data and designing dashboards using Tableau.
Histogram: Display the distribution of school enrolments to highlight gaps in access to
quality education. Box Plot: Compare urban sustainability metrics across cities to identify
disparities; also useful for global comparisons of education outcomes. Scatter Plot. Bar
Graphs: Illustrate regional crime rates to inform peace and justice strategies. Line Graph,
etc. Dashboard for Climate Change.
Data Analysis: Getting to know your data, Data Analysis Pipeline: Data pre-processing
(Attribute values, Attribute transformation, Sampling), Dimensionality Reduction (PCA,
Multidimensional Scaling, Non-linear Methods), Graph-based Semi-Supervised Learning,
Representation Learning, Feature subset selection, Distance and Similarity calculation.
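
A minimal sketch of PCA as a dimensionality-reduction step, assuming NumPy is installed. It computes the principal components from the singular value decomposition of the centered data; the function name `pca` and the synthetic data are hypothetical. In practice, a library routine such as scikit-learn's `PCA` would normally be used instead.

```python
import numpy as np


def pca(X, n_components):
    """Project X (rows = samples) onto its top principal components."""
    X = np.asarray(X, dtype=float)
    X_centered = X - X.mean(axis=0)            # center each attribute
    # Rows of Vt are the principal directions, ordered by singular value.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    projected = X_centered @ components.T      # reduced representation
    explained_variance = S[:n_components] ** 2 / (X.shape[0] - 1)
    return projected, components, explained_variance


# Synthetic data, invented for illustration: the second attribute is
# almost exactly twice the first, and the third is low-variance noise.
rng = np.random.default_rng(0)
t = rng.normal(size=100)
X = np.column_stack([
    t,
    2 * t + 0.01 * rng.normal(size=100),
    rng.normal(scale=0.1, size=100),
])

Z, comps, var = pca(X, n_components=2)
```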
Advances in Data Science: Basics of Correlation, Regression, Working with Pandas,
Working with Scikit-Learn, Feature Engineering. Emphasis will be placed on applying these tools
in domains such as environmental monitoring, smart cities, and sustainable
development.
Application Development: GUI and Database-based applications to support sustainable
and efficient decision-making systems.
Laboratory Work: Implement general problems in Python and develop database and
web-based applications that can contribute to sustainability goals, such as resource
optimization and smart solutions.
Recommended Books:
1. S. Madhavan, Mastering Python for Data Science, Packt Publishing, 2015.
2. Joel Grus, Data Science from Scratch: First Principles with Python, (2nd Ed.),
O’Reilly Media, 2019.
3. John M. Shea, Foundations of Data Science with Python, (1st Ed.), CRC Press, 2021.
4. Martin C. Brown, Python: The Complete Reference, McGraw Hill, 2018.
5. Allen B. Downey, Think Python, O’Reilly, 2016.
6. Jake VanderPlas, Python Data Science Handbook: Essential Tools for Working with
Data, (2nd Ed.), O’Reilly Media, 2022.
Course Learning Outcomes (CLOs): On completion of this course, students will be able to:
CLO1 Manage, manipulate, clean, and analyze different types of data.
CLO2 Develop dashboards for real-time datasets.
CLO3 Visualize datasets using various techniques for better understanding.
CLO4 Understand data correlation, reduction, and summarization, aiding in sustainable
data-driven strategies.
CLO5 Apply inferential statistics to real-time datasets.