
# Syllabus

The MCA109 Data Science Foundation course provides a comprehensive introduction to data science, focusing on Python programming, data structures, and data analysis techniques. Key topics include data cleansing, inferential statistics, visualization with libraries like Matplotlib and Seaborn, and application development for sustainable decision-making. Upon completion, students will be equipped to manage, analyze, and visualize data, as well as develop dashboards and apply statistical methods to real-time datasets.

Uploaded by

psinglamca25

MCA109 : DATA SCIENCE FOUNDATION

L T P Cr
3 0 2 4.0
Course Objective: To elaborate on the basics of data science and provide a foundation for
understanding the challenges and applications.

Introduction to Python: Basic syntax, variables, Random Numbers, Functions.

Data Structures in Python: List, Tuple, Sets, Dictionary, Operations on Data Structures
(Declarations, Iterations, Adding/deleting elements, min/max/sorting, merge, select). Use of
Libraries, File Handling (Read, Write, Merge, etc.).
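
As an illustrative sketch of the operations listed above (declaration, iteration, adding/deleting elements, min/max/sorting, merge, select), the following uses only built-in Python types. All variable names and values are hypothetical and chosen for illustration only.

```python
# Declarations (illustrative data; all names are hypothetical)
marks = [72, 88, 95, 61]            # list: ordered, mutable
point = (3, 4)                      # tuple: ordered, immutable
tags = {"python", "stats"}          # set: unordered, unique items
ages = {"asha": 21, "ravi": 23}     # dictionary: key -> value

# Adding and deleting elements
marks.append(79)                    # list grows at the end
marks.remove(61)                    # removes the first matching value
tags.add("pandas")
ages["meena"] = 22
del ages["ravi"]

# min / max / sorting
lowest, highest = min(marks), max(marks)
ranked = sorted(marks, reverse=True)

# Merge: concatenate lists, union sets, combine dictionaries
all_marks = marks + [55, 90]
all_tags = tags | {"numpy"}
merged_ages = {**ages, "john": 24}

# Select: keep only the elements that satisfy a condition
distinctions = [m for m in all_marks if m >= 85]

# Iteration over a dictionary
for name, age in merged_ages.items():
    print(name, age)
```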

Advanced Topics in Python: Working with NumPy, Working with SciPy.

Getting Started with Raw Data: Indexing and slicing; Shape manipulation; Empowering
data analysis with pandas: the data structures of pandas (Series, DataFrame, Panel); Inserting
and exporting data (CSV, XLS).
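
A minimal sketch of this unit, assuming NumPy and pandas are installed. The data and names are hypothetical, and an in-memory buffer stands in for a CSV file on disk. Panel has been removed from current pandas releases, so the sketch covers only Series and DataFrame.

```python
import io

import numpy as np
import pandas as pd

# Indexing and slicing on a NumPy array
grid = np.arange(12)                # values 0, 1, ..., 11
first_three = grid[:3]              # first three elements
every_other = grid[::2]             # elements at even positions

# Shape manipulation: view the same 12 values as 3 rows x 4 columns
table = grid.reshape(3, 4)
last_column = table[:, -1]          # last element of every row

# pandas Series (1-D, labelled) and DataFrame (2-D, labelled)
rainfall = pd.Series([12.5, 0.0, 7.2], index=["Mon", "Tue", "Wed"])
frame = pd.DataFrame({"city": ["Pune", "Delhi"], "aqi": [95, 180]})

# Export to CSV and read it back
buffer = io.StringIO()
frame.to_csv(buffer, index=False)
buffer.seek(0)
restored = pd.read_csv(buffer)
```

`DataFrame.to_excel` and `pd.read_excel` play the same role for spreadsheet files, but they need an additional Excel engine package to be installed.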

Data cleansing: Checking the missing data, Filling the missing data, String operations,
Merging data, Data operations; Aggregation operations: Joins, inner join, left outer join, full
outer join, groupby function.
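
The cleansing and aggregation steps named above could be sketched with pandas as follows. The two small tables, their column names, and the choice of the column mean as the fill value are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical tables sharing a "roll" key
students = pd.DataFrame({
    "roll": [1, 2, 3, 4],
    "name": [" asha", "Ravi ", "meena", "John"],
    "score": [78.0, np.nan, 91.0, np.nan],
})
hostels = pd.DataFrame({"roll": [2, 3, 5], "hostel": ["A", "B", "A"]})

# Checking the missing data: count of NaN values per column
missing_counts = students.isna().sum()

# Filling the missing data with the column mean (one of several possible strategies)
students["score"] = students["score"].fillna(students["score"].mean())

# String operations: trim whitespace and normalise capitalisation
students["name"] = students["name"].str.strip().str.title()

# Joins on the shared "roll" column
inner = students.merge(hostels, on="roll", how="inner")   # keys present in both tables
left = students.merge(hostels, on="roll", how="left")     # every student row kept
outer = students.merge(hostels, on="roll", how="outer")   # keys from either table

# Aggregation with groupby: mean score per hostel
mean_by_hostel = inner.groupby("hostel")["score"].mean()
```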

Application of Inferential Statistics: Various forms of distribution, A normal distribution,
A normal distribution from a binomial distribution, A Poisson distribution, A Bernoulli
distribution, A z-score, A p-value, Type 1 and Type 2 errors, A confidence interval,
Correlation, Z-test vs T-test, The F distribution, The chi-square distribution, Chi-square for
the goodness of fit, The chi-square test of independence, ANOVA.
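
To make a few of these terms concrete (z-score, p-value, confidence interval, Type 1 error), here is a small sketch using only the Python standard library. The sample values, the null hypothesis mean of 50, and the assumed known population standard deviation of 3 are all hypothetical.

```python
from statistics import NormalDist, mean, stdev

# Hypothetical sample: 10 daily readings of a sensor
sample = [52, 48, 55, 50, 47, 53, 49, 51, 54, 51]
n = len(sample)
sample_mean = mean(sample)
sample_sd = stdev(sample)                 # sample standard deviation (n - 1 divisor)

# z-score of a single observation relative to the sample
z_of_55 = (55 - sample_mean) / sample_sd

# One-sample z-test of H0: population mean = 50
# (assumes the population standard deviation is known to be 3)
sigma = 3.0
z_stat = (sample_mean - 50) / (sigma / n ** 0.5)
p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))   # two-sided p-value

# 95% confidence interval for the mean under the same assumption
z_crit = NormalDist().inv_cdf(0.975)
margin = z_crit * sigma / n ** 0.5
confidence_interval = (sample_mean - margin, sample_mean + margin)

# Decision at the 5% level; rejecting a true H0 would be a Type 1 error,
# failing to reject a false H0 would be a Type 2 error
reject_h0 = p_value < 0.05
```

When the population standard deviation is not known and the sample is small, a t-test would replace the z-test used here.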

Plotting and Visualization in Python: Plotting using Matplotlib and Seaborn library
(Histogram, Box Plot, Scatter Plot, Bar Graphs, Line Graph, etc.). Binning Visualization,
organizing data and designing dashboards using Tableau.
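
A minimal Matplotlib sketch of a histogram with explicit bins alongside a box plot, assuming Matplotlib is installed. The scores are hypothetical, and the figure is rendered off-screen into a memory buffer so the example runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")               # off-screen backend; omit for interactive use
import matplotlib.pyplot as plt

# Hypothetical exam scores
scores = [35, 42, 48, 51, 55, 58, 61, 63, 67, 70, 72, 75, 78, 84, 93]

fig, (ax_hist, ax_box) = plt.subplots(1, 2, figsize=(8, 3))

# Histogram with explicit bin edges (binning the scores into ranges of 10)
bin_edges = list(range(30, 101, 10))            # 30, 40, ..., 100
counts, edges, _ = ax_hist.hist(scores, bins=bin_edges, edgecolor="black")
ax_hist.set_xlabel("Score")
ax_hist.set_ylabel("Number of students")
ax_hist.set_title("Histogram")

# Box plot of the same data: median, quartiles and any outliers
ax_box.boxplot(scores)
ax_box.set_title("Box plot")

# Render to an in-memory PNG; pass a file name instead to write to disk
fig.tight_layout()
png_buffer = io.BytesIO()
fig.savefig(png_buffer, format="png")
plt.close(fig)
```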

Histogram: Display the distribution of school enrolments to highlight gaps in access to
quality education; Box Plot: Compare urban sustainability metrics across cities to identify
disparities (also useful for global comparisons of education outcomes); Scatter Plot; Bar
Graphs: Illustrate regional crime rates to inform peace and justice strategies; Line Graph,
etc.; Dashboard for Climate Change.

Data Analysis: Getting to know your data, Data Analysis Pipeline: Data pre-processing -
Attribute values, Attribute transformation, Sampling, Dimensionality Reduction - PCA,
Multidimensional Scaling, Non-linear Methods, Graph-based Semi-Supervised Learning,
Representation Learning, Feature subset selection, Distance and Similarity calculation.
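
As one way to picture dimensionality reduction with PCA and a distance/similarity calculation, the sketch below uses NumPy's singular value decomposition rather than a dedicated library. The data matrix is hypothetical, and the choice of keeping two components is an assumption for illustration.

```python
import numpy as np

# Hypothetical data: 6 samples, 3 attributes (the third closely tracks the first)
X = np.array([
    [2.5, 2.4, 2.6],
    [0.5, 0.7, 0.4],
    [2.2, 2.9, 2.1],
    [1.9, 2.2, 2.0],
    [3.1, 3.0, 3.2],
    [2.3, 2.7, 2.2],
])

# Step 1: centre each attribute on zero mean
X_centred = X - X.mean(axis=0)

# Step 2: singular value decomposition; rows of Vt are the principal directions
U, S, Vt = np.linalg.svd(X_centred, full_matrices=False)

# Step 3: share of total variance captured by each component
explained_ratio = S ** 2 / np.sum(S ** 2)

# Step 4: project onto the first two components (3 attributes -> 2)
X_reduced = X_centred @ Vt[:2].T

# Distance and similarity between the first two samples
a, b = X[0], X[1]
euclidean = np.linalg.norm(a - b)
cosine_similarity = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
```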

Advances in Data Science: Basics of Correlation, Regression, Working with Pandas,
Working with scikit-learn, Feature Engineering. Emphasis will be placed on applying these
tools in domains such as environmental monitoring, smart cities, and sustainable
development.
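
A short sketch of correlation, simple linear regression, and a basic engineered feature. To stay self-contained it uses NumPy rather than scikit-learn, and the temperature and demand readings are hypothetical.

```python
import numpy as np

# Hypothetical readings: temperature (deg C) vs. electricity demand (MW)
temperature = np.array([20.0, 22.0, 25.0, 27.0, 30.0, 33.0])
demand = np.array([200.0, 215.0, 240.0, 250.0, 275.0, 300.0])

# Pearson correlation coefficient between the two variables
r = np.corrcoef(temperature, demand)[0, 1]

# Simple linear regression by least squares:
# demand is modelled as slope * temperature + intercept
slope, intercept = np.polyfit(temperature, demand, deg=1)

# Predict demand at a new temperature
predicted = slope * 28.0 + intercept

# Feature engineering: add a squared term as an extra column
features = np.column_stack([temperature, temperature ** 2])
```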

Application Development: GUI and Database-based applications to support sustainable
and efficient decision-making systems.

Laboratory Work: To implement general problems in Python; and develop database and
web-based applications that can contribute to sustainability goals, such as resource
optimization and smart solutions.

Recommended Books:

1. Samir Madhavan, Mastering Python for Data Science, Packt Publishing, 2015.
2. Joel Grus, Data Science from Scratch: First Principles with Python, (2nd Ed.),
O’Reilly Media, 2019.
3. John M. Shea, Foundations of Data Science with Python, (1st Ed.), CRC Press, 2021.
4. Martin C. Brown, Python: The Complete Reference, McGraw Hill, 2018.
5. Allen B. Downey, Think Python, O’Reilly, 2016.
6. Jake VanderPlas, Python Data Science Handbook: Essential Tools for Working with
Data, (2nd Ed.), O’Reilly Media, 2022.

Course Learning Outcomes (CLOs): On completion of this course, students will be able to:

CLO1 Manage, manipulate, clean, and analyze different types of data.

CLO2 Develop dashboards for real-time datasets.

CLO3 Visualize datasets using various techniques for better understanding.

CLO4 Understand data correlation, reduction, and summarization, aiding in sustainable
data-driven strategies.

CLO5 Apply inferential statistics to real-time datasets.
