Lecture 21
Data Analytics and Visualization
Course Code: CS2205
Dr. Rahul Mishra
IIT Patna
True/False, Positive/Negative.
A benefit of our six-image dataset is that we know the actual category (animal or
not animal) of each image. If we put the counts into a table with predicted categories as
columns and actual categories as rows, we get a confusion matrix.
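A minimal sketch of how such a table could be built is shown here. The `actual` and `predicted` labels are hypothetical (the slides do not list the per-image labels), and `confusion_matrix` is an illustrative helper, not a library function.

```python
from collections import Counter

# Hypothetical labels for six images; the real per-image labels are not given in the slides.
actual    = ["animal", "animal", "animal", "not animal", "not animal", "not animal"]
predicted = ["animal", "animal", "not animal", "animal", "not animal", "not animal"]


def confusion_matrix(actual, predicted, labels):
    """Return a nested dict where matrix[actual_label][predicted_label] is a count."""
    counts = Counter(zip(actual, predicted))
    return {a: {p: counts[(a, p)] for p in labels} for a in labels}


labels = ["animal", "not animal"]
matrix = confusion_matrix(actual, predicted, labels)

# Print with actual categories as rows and predicted categories as columns.
print(f"{'actual / predicted':<20}" + "".join(f"{p:>12}" for p in labels))
for a in labels:
    print(f"{a:<20}" + "".join(f"{matrix[a][p]:>12}" for p in labels))
```

With "animal" as the positive class, the top-left cell holds the true positives, the top-right the false negatives, the bottom-left the false positives, and the bottom-right the true negatives.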
Model 4 (over-predicts images as animal)
The last model is similar to model 3, but it correctly classifies all the animals (true positives) and
mistakes the mop for an animal (false positive). Recall is now 100% because the model produces no
false negatives. This model produces the highest F1 score (86%).
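The reported figures can be checked with a short calculation. The counts below are an assumption: three animals among the six images, all found (3 true positives, 0 false negatives), plus the mop as the single false positive. This split is not stated in the slides, but it is consistent with the 86% F1 score. The helper name `precision_recall_f1` is illustrative.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from raw counts, guarding against division by zero."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Assumed counts for model 4: 3 true positives, 1 false positive (the mop), 0 false negatives.
tp, fp, fn = 3, 1, 0
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# prints: precision=0.75 recall=1.00 f1=0.86
```

Under these assumed counts, the single false positive lowers precision to 75% while recall stays at 100%, and the harmonic mean of the two gives the 86% F1 score.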
Exploratory Data Analysis
Agenda
1. What is Exploratory Data Analysis?
2. Why is EDA important?
3. Visualization
- Important charts for visualization.
4. Steps involved in EDA:
- Data Sourcing
- Data Cleaning
- Univariate analysis with visualization
- Bivariate analysis with visualization
- Derived Metrics
5. Use Cases
What is Exploratory Data Analysis?
• Exploratory Data Analysis (EDA) is an approach to analyzing datasets that summarizes their
main characteristics, often with visual methods.
• EDA is a data exploration technique for understanding various aspects of the data.
• The main aim of EDA is to gain enough confidence in the data that we are ready to
apply a machine learning model.
• EDA is the first step in the data analysis process.
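As a rough illustration of summarizing a column's main characteristics, the sketch below computes a few summary statistics and a text histogram using only the Python standard library. The `ages` values are hypothetical sample data. In practice a data-frame library would usually do this work, but the idea is the same.

```python
import statistics
from collections import Counter

# Hypothetical numeric column (e.g. ages of respondents).
ages = [23, 25, 31, 35, 35, 41, 52]

# Numeric summary of the column.
summary = {
    "count": len(ages),
    "mean": statistics.mean(ages),
    "median": statistics.median(ages),
    "stdev": statistics.stdev(ages),  # sample standard deviation
    "min": min(ages),
    "max": max(ages),
}
for name, value in summary.items():
    print(f"{name:>6}: {value:.2f}")

# A very small "visual method": a text histogram with bins of width 10.
bins = Counter((age // 10) * 10 for age in ages)
for start in sorted(bins):
    print(f"{start}-{start + 9}: " + "#" * bins[start])
```

Even this small summary shows where most values sit (the 30-39 bin) and how far the extremes lie from the median.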
• EDA gives a basic understanding of the data, which helps you figure out which questions
to ask and how best to manipulate the dataset to answer them.
• EDA helps us find errors, discover what the data contains, map out its structure, and
spot anomalies.
• EDA is important for business processes because it prepares datasets for the deep,
thorough analysis that detects business problems.
• EDA helps to build a quick-and-dirty baseline model, which serves as a comparison for
the later models you build.
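The error-finding and anomaly-finding role of EDA mentioned above can be sketched as follows. The `rows` data and the `income` field are hypothetical, and the 1.5 × IQR rule used to flag outliers is one common convention chosen for illustration; the slides do not prescribe a specific method.

```python
import statistics

# Hypothetical records; None marks a missing value.
rows = [
    {"id": 1, "income": 42000},
    {"id": 2, "income": 45000},
    {"id": 3, "income": None},
    {"id": 4, "income": 47000},
    {"id": 5, "income": 51000},
    {"id": 6, "income": 400000},
    {"id": 7, "income": 44000},
]

# Step 1: find records with missing values.
missing_ids = [r["id"] for r in rows if r["income"] is None]

# Step 2: flag anomalies among the present values using the 1.5 * IQR rule.
present = [r for r in rows if r["income"] is not None]
values = [r["income"] for r in present]
q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outlier_ids = [r["id"] for r in present if not (low <= r["income"] <= high)]

print("missing:", missing_ids)
print("outliers:", outlier_ids)
```

Flagged records are candidates for inspection, not automatic deletion: an extreme value may be a data-entry error or a genuine observation.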
https://github.com/pik1989/EDA/blob/main/Handling_Missing_Values.ipynb