CS352 Data Analysis Lab
Course Objectives:
The main objectives of the course are to:
1. Introduce Python libraries used for data manipulation and visualization
2. Create awareness of data cleaning, data wrangling, and related operations on data
3. Impart knowledge on visualizing the data using various plots
Course Outcomes:
On successful completion of the course, students will be able to:
1. Perform operations on data using the basic concepts of NumPy and pandas
2. Perform data cleaning and data wrangling operations
3. Visualize data using Matplotlib
4. Perform aggregation and time series operations on data
Course Content:
UNIT-I
NumPy Basics: Arrays and Vectorized Computation: The NumPy ndarray,
Universal Functions, Array-Oriented Programming with Arrays, File Input and
Output with Arrays, Linear Algebra, Pseudorandom Number Generation, Example:
Random Walks
Pandas Data Structure: Introduction to pandas Data Structure, Essential
Functionality, Summarizing and Computing Descriptive Statistics
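Illustrative sketch for UNIT-I (an assumption-laden example, not taken from the prescribed text): a simple random walk built with vectorized NumPy operations instead of a Python loop. The seed, the number of steps, and the distance threshold of 10 are arbitrary choices made for illustration.

```python
import numpy as np

# Seeded generator so the walk is reproducible (the seed value is arbitrary).
rng = np.random.default_rng(12345)

n_steps = 1000
# Draw 0 or 1 for each step, then map to -1 / +1 without a Python loop.
draws = rng.integers(0, 2, size=n_steps)
steps = np.where(draws == 0, -1, 1)

# The walk is the running total of the steps.
walk = steps.cumsum()
print(walk.min(), walk.max())

# First step at which the walk is at least 10 away from the origin, if any.
far = np.abs(walk) >= 10
first_crossing = int(far.argmax()) if far.any() else None
print(first_crossing)
```

The whole walk is computed with array operations (where, cumsum, abs, argmax), which is the style of array-oriented programming this unit introduces.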
UNIT-II
Data Loading, Storage, and File Formats: Reading and Writing Data in Text
Format, Binary Data Formats, Interacting with Web APIs, Interacting with
Databases.
Data Cleaning and Preparation: Handling Missing Data, Data Transformation,
String Manipulation.
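Illustrative sketch for UNIT-II (the column names and values are made up for illustration): one small pandas DataFrame taken through string manipulation, missing-data handling, and duplicate removal. Filling missing prices with the column mean is only one possible strategy, chosen here as an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with stray whitespace, mixed case, and missing values.
df = pd.DataFrame({
    "city": [" Delhi", "mumbai ", "Delhi", None],
    "price": [10.0, np.nan, 10.0, 7.5],
})

# String manipulation: trim whitespace and normalize capitalization.
df["city"] = df["city"].str.strip().str.title()

# Missing data: drop rows with no city, then fill missing prices
# with the mean of the remaining prices.
cleaned = df.dropna(subset=["city"]).copy()
cleaned["price"] = cleaned["price"].fillna(cleaned["price"].mean())

# Data transformation: remove exact duplicate rows.
cleaned = cleaned.drop_duplicates().reset_index(drop=True)
print(cleaned)
```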
UNIT-III
Data Wrangling: Join, Combine, and Reshape: Hierarchical Indexing,
Combining and Merging Datasets, Reshaping and Pivoting
Plotting and Visualization: A Brief matplotlib API Primer, Plotting with pandas
and seaborn.
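Illustrative sketch for UNIT-III (the tables and column names are hypothetical): merging two datasets, pivoting from long to wide form, and reshaping through a hierarchical index. Plotting is left out so the example runs without a display.

```python
import pandas as pd

# Hypothetical long-format sales data and a lookup table of store regions.
sales = pd.DataFrame({
    "store": ["A", "A", "B", "B"],
    "quarter": ["Q1", "Q2", "Q1", "Q2"],
    "units": [10, 12, 7, 9],
})
stores = pd.DataFrame({"store": ["A", "B"], "region": ["North", "South"]})

# Combining: a left join attaches the region to every sales row.
merged = sales.merge(stores, on="store", how="left")

# Pivoting: one row per store, one column per quarter.
wide = merged.pivot(index="store", columns="quarter", values="units")

# Hierarchical indexing: build a two-level index, then unstack the inner level.
indexed = merged.set_index(["region", "quarter"])["units"]
by_region = indexed.unstack("quarter")

print(wide)
print(by_region)
```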
UNIT-IV
Data Aggregation and Group Operations: Group By Mechanics, Data
Aggregation, Apply: General split-apply-combine, Pivot Tables and Cross-
Tabulation.
Time Series: Date and Time Data Types and Tools, Time Series Basics, Date
Ranges, Frequencies, and Shifting, Time Zone Handling, Periods and Period
Arithmetic, Resampling and Frequency Conversion, Moving Window Functions
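Illustrative sketch for UNIT-IV (the data, group keys, and window sizes are assumptions chosen for illustration): a group-wise aggregation followed by basic time series operations on a daily date range.

```python
import pandas as pd

# Six days of hypothetical values, alternating between two group keys.
dates = pd.date_range("2024-01-01", periods=6, freq="D")
df = pd.DataFrame(
    {"key": ["a", "b", "a", "b", "a", "b"],
     "value": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]},
    index=dates,
)

# Split-apply-combine: mean of "value" within each key.
group_means = df.groupby("key")["value"].mean()

# Resampling: convert daily data to 2-day sums.
two_day = df["value"].resample("2D").sum()

# Moving window: 3-day rolling mean (the first two entries are NaN).
rolling = df["value"].rolling(3).mean()

# Shifting: move values forward by one day, leaving the first entry missing.
shifted = df["value"].shift(1)

print(group_means)
print(two_day)
```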
Learning Resources:
Textbook(s):
1. Wes McKinney, Python for Data Analysis: Data Wrangling with Pandas,
NumPy, and IPython, 2nd Edition. O’Reilly/SPD
References:
1. Jake VanderPlas, Python Data Science Handbook: Essential Tools for
Working with Data. O’Reilly/SPD
2. David Taieb, Data Analysis with Python: A Modern Approach, 1st
Edition. Packt Publishing
List of Experiments:
1. NumPy Array Operations
2. Iris Dataset
3. Pandas Series
4. Pandas DataFrames
5. Canada Pizza Price Prediction
6. Mobile Phone Price Dataset
7. National Universities Rankings
8. Adidas Sales Dataset
9. Movies Dataset
10. Avocado Prices