AD1301A INTRODUCTION TO DATA SCIENCE
L T P C
3 0 0 3
COURSE OBJECTIVES
● To gain knowledge of the basic concepts of data analysis
● To acquire skills in data preparation and preprocessing
● To develop the mathematical skills used in statistics
● To learn the tools and packages in Python for data science
● To acquire knowledge in data interpretation and visualization techniques
UNIT I INTRODUCTION 9
Need for data science – benefits and uses – facets of data – data science process – setting
the research goal – retrieving data – cleansing, integrating, and transforming data –
exploratory data analysis – building the models – presenting findings and building applications.
UNIT II DESCRIBING DATA 10
Frequency distributions – outliers – relative frequency distributions – cumulative frequency
distributions – frequency distributions for nominal data – interpreting distributions – graphs –
averages – normal distributions – z scores – normal curve problems – finding proportions –
finding scores – more about z scores – interpretation of r² – multiple regression equations –
regression toward the mean – statistical metrics with Python.
UNIT III INTRODUCTION TO NUMPY 8
Data types in Python – basics of NumPy arrays – computations on NumPy arrays – universal
functions – aggregations: min, max, and everything in between – computation on arrays:
broadcasting – comparisons, masks, and Boolean logic – fancy indexing – sorting values in
NumPy arrays – fast sorting – sorting along rows or columns – partial sorts – k-nearest
neighbors – NumPy’s structured arrays.
UNIT IV DATA MANIPULATION WITH PANDAS 9
Pandas objects – data indexing and selection – operating on data in pandas – handling
missing data – hierarchical indexing – combining datasets: concat and append – combining
datasets: merge and join – aggregation and grouping – pivot tables – vectorized string
operations – working with time series – high-performance pandas: eval() and query().
UNIT V PYTHON FOR DATA VISUALIZATION 9
Visualization with Matplotlib – line plots – scatter plots – visualizing errors – density and
contour plots – histograms, binnings, and density – three-dimensional plotting – geographic
data – data analysis using statsmodels and seaborn – graph plotting using Plotly – interactive
data visualization using Bokeh.
TOTAL: 45 PERIODS
COURSE OUTCOMES
At the end of the course, students will be able to:
● Apply data inspection and cleansing skills
● Determine relationships and dependencies in data using statistics
● Represent useful information using mathematical skills
● Handle data using the primary Python tools for data science
● Describe and visualize data using appropriate tools
TEXT BOOKS
1. Davy Cielen, Arno D. B. Meysman, and Mohamed Ali, “Introducing Data Science”,
Manning Publications, 2016. (First two chapters for Unit I)
2. Robert S. Witte and John S. Witte, “Statistics”, Eleventh Edition, Wiley Publications, 2017.
(Chapters 1–7 for Unit II)
3. Jake VanderPlas, “Python Data Science Handbook”, O’Reilly, 2016. (Parts of Chapters 2–4
for Units III, IV, and V)
REFERENCES
1. Allen B. Downey, “Think Stats: Exploratory Data Analysis in Python”, Green Tea Press,
2014.