Unit 5-Data Literacy

The document provides an overview of data literacy, emphasizing the importance of data collection, organization, and analysis in AI. It discusses various types of data, the data collection process, levels of measurement, statistical analysis, and data representation techniques. Additionally, it highlights the significance of data preprocessing and evaluation in modeling, along with multiple-choice questions to assess understanding of the material.

Uploaded by anaghacomputer

Data Literacy – Data Collection to Data Analysis

Unit 5 - XI
About Data
● Data can be defined as facts or instructions about some entity (students, school, sports, business, animals, etc.).
● AI is essentially data-driven, i.e., data is the core input for training and running AI models.
● Data must be collected, organized, and analyzed properly.
● The data you collect affects model performance and decision-making.
AI Data Analysis
● Uses AI techniques and data science to improve the processes of cleaning, inspecting, and modelling.
● Used to extract valuable information for drawing meaningful conclusions and decision-making.
Types of Data
● Structured Data – neatly organized (tables, rows, columns)
● Semi-structured Data – partial organization (JSON, XML)
● Unstructured Data – no defined format (text, audio, video)
Data Literacy
Data literacy means being able to find and use data effectively. This includes skills like collecting data, organizing it, checking its quality, analysing it, understanding the results, and using it ethically.
Data Collection
● It means gathering data from many places.
● This includes websites, devices, surveys, and even offline sources.
● Methods include scraping, capturing, and loading data into systems.
● It’s one of the most time-consuming and challenging parts of any AI project.
The Data Collection Process
1. First, understand the problem you're solving.
2. Decide what data is needed to solve it.
3. Find good sources of that data.
4. Collect and test the data in small steps.
5. Repeat this process as your model improves.
What data is to be collected?
● Diverse data, to avoid bias and inaccuracy in predictions.
Example: A face recognition model should work for different ages, skin tones, and angles.
● Simple AI models (tasks like reading number plates) need less data.
● Complex AI models (tasks like detecting diseases from X-rays) need huge amounts of data.
● The more complex your model, the more data it will need.
How much Data is enough?
● Depends on how many features or variables your model needs.
● More features = more data required.
● Start with small amounts and improve as needed.
● But overall, more data gives better predictions.
Sources of Data Collection
● Primary Data Source
● Secondary Data Source
Primary Data Source
Primary sources are sources which are created to collect the data for analysis.
Primary Data Sources
Secondary Data Source
Secondary data sources are where the data is already stored and ready for use, e.g., data given in books, journals, newspapers, websites, internal transactional databases, etc.
Secondary Data Sources
EXPLORING DATA
Exploring data is about “getting to know” the data.
The main goal of data exploration is to:
● Understand the overall structure and quality of the data.
● Identify any errors, missing values, or inconsistencies.
● Detect outliers or extreme values that may affect results.
● Gain insights that can guide further data analysis or modeling.
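The exploration goals above can be sketched in a few lines of Python. This is only a minimal illustration: the list `heights` is hypothetical, `None` is assumed to mark a missing value, and the 1.5 × IQR rule used to flag outliers is one common rule of thumb rather than something this unit prescribes.

```python
import statistics

# Hypothetical sample: heights with None marking a missing entry
heights = [145, 151, None, 149, 147, 152, 250, 148, None, 150]

# Structure and quality: how many records, and how many are missing?
total = len(heights)
missing = sum(1 for h in heights if h is None)
present = [h for h in heights if h is not None]

# Outliers: flag values lying more than 1.5 * IQR beyond the quartiles
q1, _, q3 = statistics.quantiles(present, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [h for h in present if h < low or h > high]

print("records:", total, "missing:", missing)
print("range of known values:", min(present), "to", max(present))
print("possible outliers:", outliers)
```

Flagged values are candidates to inspect, not values to delete automatically; an extreme value may be a typing error or a genuine observation.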
Levels of Measurement
- NOMINAL

● Nominal scales are used for labeling variables, without any quantitative value.
● “Nominal” scales could simply be called “labels.”
Levels of Measurement
- ORDINAL
● With ordinal scales, the order of the values is what’s important and significant, but the differences between the values are not really known.
● E.g., we can’t say that the difference between “OK” and “Unhappy” is the same as the difference between “Very Happy” and “Happy”.
● Ordinal scales are typically measures of non-numeric concepts like satisfaction, happiness, discomfort, etc.
Levels of Measurement
- ORDINAL
● Race positions: 1st, 2nd, 3rd

● School grades: A, B, C, D

● Customer satisfaction: Happy, Okay, Sad

● Education levels: Primary, Secondary, College

● Movie ratings: ⭐ Poor, ⭐⭐ Fair, ⭐⭐⭐ Good

Levels of Measurement
- INTERVAL
● Interval scale data is similar to ordinal data because it has a definite order.
● The key difference is that differences between values can be measured in interval data.
● No true zero point: zero does not mean “nothing” on an interval scale.

Example: Temperature in °C or °F.
● 40° is 20° more than 20° (the difference makes sense).
● 0° is not the absolute lowest temperature; negative values exist.
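Because the Celsius scale has no true zero, differences are meaningful but ratios are not. The short sketch below illustrates this by converting the same two temperatures to kelvin, a scale that does have a true zero. The helper name `to_kelvin` is chosen only for this illustration.

```python
def to_kelvin(celsius):
    """Convert degrees Celsius to kelvin (0 K is a true zero)."""
    return celsius + 273.15

a, b = 40, 20  # temperatures in °C

# Differences agree on both scales
diff_c = a - b
diff_k = to_kelvin(a) - to_kelvin(b)

# Ratios do not: 40 °C is not "twice as hot" as 20 °C
ratio_c = a / b
ratio_k = to_kelvin(a) / to_kelvin(b)

print("difference:", diff_c, "°C,", round(diff_k, 2), "K")
print("ratio in °C:", ratio_c, "ratio in K:", round(ratio_k, 3))
```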

Levels of Measurement
- INTERVAL
● Temperature: 0°C, 10°C, 20°C

● Calendar years: 2000, 2010, 2020

● Time of day: 1:00, 2:00, 3:00 (on a 12-hour clock)

● IQ scores: 90, 100, 110

● Dates on a calendar: Jan 1, Jan 2, Jan 3

Levels of Measurement
- RATIO
● Similar to interval scale, but has a true zero point.

● Ratios can be calculated.

Example: Exam scores – 80 is four times 20.

● Allows all math operations: add, subtract, multiply, divide.

● Real zero means complete absence (e.g., zero weight = no weight).

Example: Weight, height, age.

Levels of Measurement
- RATIO
● Height: 0 cm, 50 cm, 100 cm

● Weight: 0 kg, 10 kg, 20 kg

● Age: 0 years, 5 years, 10 years

● Money: ₹0, ₹100, ₹200

● Distance: 0 km, 5 km, 10 km

Statistical Analysis of Data
● Statistics is the science of data that uses mathematical techniques to extract meaningful information.
● Statistics involves collecting, organizing, analyzing, interpreting, and presenting data.
● In AI, statistics turns observations into insights that can be understood and shared.
● It often works with large datasets, using measures of central tendency (mean, median, mode) to understand and analyze data.
Statistical Analysis of Data
A measure of central tendency summarizes a dataset in a single value that represents the entire distribution of the data.
Statistical Analysis of Data
Statistical Analysis using Python
What is Mean?
The mean in statistics is calculated by dividing the sum of all
values by the total number of observations in a sample.
What is Mean?
Example -1
The set S = {5,10,15,20,30}
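Applying the definition to the set S, which has n = 5 observations, gives the working below. The symbols x̄ (the mean), n, and x_i are notation assumed here for the illustration.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
        = \frac{5 + 10 + 15 + 20 + 30}{5}
        = \frac{80}{5}
        = 16
```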
What is Mean?
Program-1
There are 25 students in a class. Their heights are given below.
Write a Python Program to find the mean.

heights → 145, 151, 152, 149, 147, 152, 151, 149, 152, 151, 147, 148,
155, 147, 152, 151, 149, 145, 147, 152, 146, 148, 150, 152, 151
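One possible program for this task is sketched below. It applies the definition directly (sum divided by count); the variable names are illustrative.

```python
# Heights of the 25 students, as listed above
heights = [145, 151, 152, 149, 147, 152, 151, 149, 152, 151, 147, 148,
           155, 147, 152, 151, 149, 145, 147, 152, 146, 148, 150, 152, 151]

# Mean = sum of all values / number of observations
mean_height = sum(heights) / len(heights)

print("Number of students:", len(heights))
print("Mean height:", mean_height)
```

The standard library offers the same calculation as `statistics.mean(heights)`.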
What is Median?
The median is the middle value of a dataset when the numbers are arranged in ascending or descending order. If the number of values is even, the median is the average of the two middle values.
Program-2
There are 25 students in a class. Their heights are given below.
Write a Python Program to find the median.

heights → 145, 151, 152, 149, 147, 152, 151, 149, 152, 151, 147, 148,
155, 147, 152, 151, 149, 145, 147, 152, 146, 148, 150, 152, 151
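A possible sketch for this task follows. It sorts the values and picks the middle one by hand, then cross-checks the result against `statistics.median` from the standard library; the variable names are illustrative.

```python
import statistics

# Heights of the 25 students, as listed above
heights = [145, 151, 152, 149, 147, 152, 151, 149, 152, 151, 147, 148,
           155, 147, 152, 151, 149, 145, 147, 152, 146, 148, 150, 152, 151]

# Manual approach: sort, then take the middle value
ordered = sorted(heights)
n = len(ordered)
if n % 2 == 1:
    median_height = ordered[n // 2]          # odd count: single middle value
else:
    median_height = (ordered[n // 2 - 1] + ordered[n // 2]) / 2  # even count

print("Median height:", median_height)
print("Check with statistics.median:", statistics.median(heights))
```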
What is Mode?
The mode is the value that appears most often in a dataset,
representing the highest bar in a bar chart or histogram.
Sorting might make it easier to spot the most frequent value
in small datasets.
Program-3
Write a program to find the mode
heights → 145, 151, 152, 149, 147, 152, 151, 149, 152, 151, 147, 148,
155, 147, 152, 151, 149, 145, 147, 152, 146, 148, 150, 152, 151
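One way to approach this task is to count how often each value occurs and report the value or values with the highest count. The sketch below uses `collections.Counter`; collecting every tied value (rather than just one) is a design choice made here, since a dataset can have more than one mode.

```python
from collections import Counter

# Heights of the 25 students, as listed above
heights = [145, 151, 152, 149, 147, 152, 151, 149, 152, 151, 147, 148,
           155, 147, 152, 151, 149, 145, 147, 152, 146, 148, 150, 152, 151]

counts = Counter(heights)                 # value -> number of occurrences
highest = max(counts.values())            # the largest frequency
modes = sorted(v for v, c in counts.items() if c == highest)

print("Mode(s):", modes)
print("Frequency:", highest)
```

`statistics.mode(heights)` returns a single mode, and `statistics.multimode(heights)` returns all of them.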
Comparison - Mean, Median, Mode
Measures of Dispersion
Variance and Standard Deviation

Measures of central tendency (mean, median, mode) show the central value of a dataset, while measures of dispersion (variance, standard deviation) describe how the data is spread around that center.
Variance and Standard Deviation
Let us understand these two using an example: the heights (at the shoulder) of 5 dogs, measured in millimetres.
Variance and Standard Deviation

Heights: 600 mm, 470 mm, 170 mm, 430 mm, 300 mm

Mean calculation: (600 + 470 + 170 + 430 + 300) / 5 = 1970 / 5 = 394 mm
Variance and Standard Deviation
To find the variance, calculate each height's difference from the mean height, square each difference, and find the average of the squares. This average is the variance.

The standard deviation is the square root of the variance.
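The steps just described can be followed literally for the five dog heights. The sketch below divides by the number of values, matching the "find the average" wording; the variable names are illustrative.

```python
import math

dog_heights = [600, 470, 170, 430, 300]  # in mm

# Step 1: the mean height
mean = sum(dog_heights) / len(dog_heights)

# Step 2: squared differences from the mean
squared_diffs = [(h - mean) ** 2 for h in dog_heights]

# Step 3: the average of the squared differences is the variance
variance = sum(squared_diffs) / len(dog_heights)

# Step 4: the standard deviation is the square root of the variance
std_dev = math.sqrt(variance)

print("Mean:", mean)                                 # 394.0
print("Variance:", variance)                         # 21704.0
print("Standard deviation:", round(std_dev, 2))      # 147.32
```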

Variance and Standard Deviation

FORMULA - VARIANCE
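A formulation consistent with the description above (the average of the squared differences from the mean) is the population form shown below. The symbols N (number of values), x_i (each value), μ (the mean), σ² (variance), and σ (standard deviation) are notation assumed here.

```latex
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i
\qquad
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2
\qquad
\sigma = \sqrt{\sigma^2}
```

When the values are a sample drawn from a larger population, the sum of squares is commonly divided by N − 1 instead of N.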

Important facts about Variance and Standard Deviation
● Small variance → Data points are very close to the mean and to each other.
● High variance → Data points are widely spread from the mean and from one another.
● Low standard deviation → Data points are very close to the mean.
● High standard deviation → Data points are spread out over a large range of values.
Program-4
Write a program to find the variance and standard deviation.
heights → 145, 151, 152, 149, 147, 152, 151, 149, 152, 151, 147, 148,
155, 147, 152, 151, 149, 145, 147, 152, 146, 148, 150, 152, 151
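A possible sketch for this task uses the standard library's `statistics` module. It assumes the 25 heights are treated as the whole population, so it calls `pvariance` and `pstdev` (which divide by N); `variance` and `stdev` would be the sample versions, dividing by N − 1.

```python
import statistics

# Heights of the 25 students, as listed above
heights = [145, 151, 152, 149, 147, 152, 151, 149, 152, 151, 147, 148,
           155, 147, 152, 151, 149, 145, 147, 152, 146, 148, 150, 152, 151]

# Population variance: average squared difference from the mean
variance = statistics.pvariance(heights)

# Population standard deviation: square root of the variance
std_dev = statistics.pstdev(heights)

print("Variance:", round(variance, 4))
print("Standard deviation:", round(std_dev, 4))
```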
Data Representation
Statistics uses data representation techniques to summarize large
datasets into a compact, meaningful form, allowing important
information to be understood quickly with minimum effort.

Data representation techniques are broadly classified in two ways:

● Non-Graphical technique
● Graphical Technique
Non-Graphical Technique
Eg: Tabular form and Case form

Older methods of data representation, not suitable for large datasets.

Non-graphical techniques are less effective when the goal is to make
decisions based on data analysis.
Graphical Technique
Data visualization is the graphical or pictorial representation of
statistical data using points, lines, charts, and other shapes, making
complex or large datasets easier for the human brain to understand,
as it is in visual format.
Data visualization can be done in Python using the Matplotlib library.
pyplot is a submodule of Matplotlib that provides a MATLAB-like interface to the library.
Line Graph
A line graph is a powerful tool used to represent continuous data along a numbered axis.

It allows us to visualize trends and changes in data points over time.

The line can slope upwards, indicating an increase, or downwards, signifying a decrease, reflecting the changes in the data over time.
A line chart is plotted in Python using the function plot().
Activity -3: Construct a simple line graph to represent the rainfall
data of Kerala as shown in the table below:
Bar Graph
A bar chart or bar graph is a graph that presents categorical data with
rectangular bars with heights or lengths proportional to the values
that they represent.

It is a good way to show comparison between different categories.

A bar chart is plotted in Python using the function bar().
Create a bar graph to illustrate the distribution of students from various
schools who attended a seminar on “Deep Learning”. The total number
of students from each school is provided below.
Histogram
Histograms use vertical rectangles to depict the frequencies of different value ranges.

They are drawn on a natural scale, making it easy to interpret the central tendency, such as the mode, of the data.

A histogram can represent only one data distribution per axis.

A histogram is plotted in Python using the function hist().

Example -7
Given a dataset containing the heights of girls in class XII, construct a
histogram to visualize the distribution of heights.
141,145,142,147,144,148,141,142,149,144,143,149,146,141, 147, 142, 143

To draw a histogram from this, we first need to organize the data into intervals.
These intervals are also called logical ranges or bins.
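A minimal sketch of this example with Matplotlib follows. The bin edges (intervals of width 2 from 140 to 150) are an assumption made for illustration, since the intervals to use are not stated here; the output file name is also hypothetical.

```python
import matplotlib.pyplot as plt

# Heights of girls in class XII, as listed above
heights = [141, 145, 142, 147, 144, 148, 141, 142, 149,
           144, 143, 149, 146, 141, 147, 142, 143]

# Assumed bins: 140-142, 142-144, 144-146, 146-148, 148-150
bin_edges = [140, 142, 144, 146, 148, 150]

# hist() counts how many values fall in each bin and draws the rectangles
counts, edges, _ = plt.hist(heights, bins=bin_edges, edgecolor="black")
plt.title("Heights of girls in class XII")
plt.xlabel("Height")
plt.ylabel("Number of students")
plt.savefig("heights_histogram.png")  # plt.show() would open a window instead

print("Students per bin:", [int(c) for c in counts])
```

Each bin includes its left edge and excludes its right edge, except the last bin, which includes both. A different choice of bins would change the shape of the histogram.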
Scatterplot
Scatter plots represent relationships between two variables by
plotting data points along both the x and y axes.

They reveal trends, clusters, and relationships within datasets.

A student had a hypothesis for a science project. He believed that the more the
students studied Math, the better their math scores would be. He took a poll in
which he asked students the average number of hours that they studied per
week during a given semester. He then found out the overall percentage that
they received in their Math classes. His data is shown in the table below:

A scatter plot is plotted in Python using the function scatter().

Pie Chart
A circular graph divided into slices showing proportions or percentages
of a whole.

Best for visualizing small tables (limit to ~7 categories for clarity).

A pie chart is plotted in Python using the function pie().

A school conducted a survey to find out students’ favorite sports. The results are shown below:
Write a Python program to create a pie chart showing the distribution of students’ favorite sports.
MATRICES
● A matrix is a rectangular arrangement of numbers in rows
and columns.
● The numbers are arranged in tabular form as rows and
columns.
● In computer vision (AI), images are represented as
matrices of pixels.
MATRICES
Order of a matrix
● A matrix with m rows and n columns is called a matrix of order m × n, or simply an m × n matrix (read “m by n matrix”).
Operations on Matrices
1. Addition of matrices
2. Difference of matrices
3. Transpose of a matrix
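The three operations listed above can be sketched with plain Python lists, where each inner list is one row. The matrices `A` and `B` and the helper names `add`, `subtract`, and `transpose` are hypothetical, chosen only for this illustration. Addition and difference require both matrices to have the same order.

```python
# Hypothetical 2 x 3 matrices stored as lists of rows
A = [[1, 2, 3],
     [4, 5, 6]]
B = [[6, 5, 4],
     [3, 2, 1]]

def add(X, Y):
    """Element-wise sum of two matrices of the same order."""
    return [[x + y for x, y in zip(row_x, row_y)] for row_x, row_y in zip(X, Y)]

def subtract(X, Y):
    """Element-wise difference X - Y of two matrices of the same order."""
    return [[x - y for x, y in zip(row_x, row_y)] for row_x, row_y in zip(X, Y)]

def transpose(X):
    """Rows become columns: an m x n matrix becomes n x m."""
    return [list(column) for column in zip(*X)]

print(add(A, B))        # [[7, 7, 7], [7, 7, 7]]
print(subtract(A, B))   # [[-5, -3, -1], [1, 3, 5]]
print(transpose(A))     # [[1, 4], [2, 5], [3, 6]]
```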
Applications of matrices in AI

• Image processing
• Recommendation systems use matrices to relate users to the products they purchased or viewed.
• In NLP, vectors (the numerical form of words) depict the distribution of a particular word in a document. Vectors are one-dimensional matrices.
DATA PREPROCESSING

1. Data Cleaning (Missing Data, Outliers, Inconsistent Data, Duplicate Data)
2. Data Transformation
3. Data Reduction
4. Data Integration and Normalization
5. Feature Selection
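Two of the steps listed above, data cleaning and normalization, can be illustrated on a tiny dataset. The sketch below is only one possible approach: the records are hypothetical, missing marks are filled with the mean of the known marks, and min-max scaling is used as the normalization method. Other choices (dropping rows, using the median, z-score scaling) are equally valid.

```python
# Hypothetical records: (name, marks); None marks a missing value
records = [("Asha", 72), ("Ravi", None), ("Meena", 88), ("Asha", 72), ("John", 60)]

# Data cleaning, part 1: remove exact duplicates, keeping the first occurrence
unique = list(dict.fromkeys(records))

# Data cleaning, part 2: replace missing marks with the mean of the known marks
known = [marks for _, marks in unique if marks is not None]
mean_marks = sum(known) / len(known)
cleaned = [(name, marks if marks is not None else mean_marks)
           for name, marks in unique]

# Normalization: rescale marks to the range 0..1 (min-max scaling)
lowest = min(marks for _, marks in cleaned)
highest = max(marks for _, marks in cleaned)
normalized = [(name, (marks - lowest) / (highest - lowest))
              for name, marks in cleaned]

for name, value in normalized:
    print(name, round(value, 3))
```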
DATA IN MODELLING & EVALUATION
Data Split: Training dataset, Testing dataset
Model Selection: Algorithms chosen based on problem type:
classification, regression, clustering.
Techniques for Evaluation:
● Train-Test Split: Train on training set, evaluate on test set.
● Cross-Validation: Ensures consistent performance across
different subsets.
● Error Analysis: Identifies areas for improvement.
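The train-test split and cross-validation ideas above can be sketched without any machine learning library. Everything here is hypothetical: the dataset, the 80/20 split, the fixed seed, and the helper names `train_test_split` and `k_folds` (written for this example, not taken from a library).

```python
import random

# Hypothetical dataset: 10 (feature, label) pairs
data = [(x, x % 2) for x in range(10)]

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows, then hold back the last part as the test set."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    cut = len(rows) - round(len(rows) * test_fraction)
    return rows[:cut], rows[cut:]

def k_folds(rows, k):
    """Yield (train, validation) pairs so each row is validated exactly once."""
    for i in range(k):
        validation = rows[i::k]
        train = [row for j, row in enumerate(rows) if j % k != i]
        yield train, validation

train, test = train_test_split(data)
print(len(train), "training rows,", len(test), "test rows")

for fold_train, fold_validation in k_folds(train, k=4):
    print("fold:", len(fold_train), "train,", len(fold_validation), "validation")
```

The test set is set aside before cross-validation so that it is never used while the model is being tuned.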
DATA IN MODELLING & EVALUATION

Evaluation Metrics:
● Classification: Accuracy, Precision, Recall, F1-score, ROC curve.
● Regression: MSE, RMSE, MAE, R-squared.
Importance of Data:
● Understanding data helps make informed decisions.
● Data literacy is essential for using AI and technology wisely.
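The evaluation metrics named above can be computed by hand from a model's predictions. The sketch below uses small hypothetical result lists (not output from any real model) and the standard definitions of each metric.

```python
import math

# Hypothetical classification results (1 = positive, 0 = negative)
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

pairs = list(zip(actual, predicted))
tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # true positives
tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # true negatives
fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false positives
fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # false negatives

accuracy = (tp + tn) / len(pairs)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
print("accuracy", accuracy, "precision", precision, "recall", recall, "F1", f1)

# Hypothetical regression results
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]

errors = [t - p for t, p in zip(y_true, y_pred)]
mse = sum(e ** 2 for e in errors) / len(errors)    # mean squared error
rmse = math.sqrt(mse)                              # root mean squared error
mae = sum(abs(e) for e in errors) / len(errors)    # mean absolute error

mean_true = sum(y_true) / len(y_true)
ss_total = sum((t - mean_true) ** 2 for t in y_true)
r_squared = 1 - sum(e ** 2 for e in errors) / ss_total
print("MSE", mse, "RMSE", round(rmse, 3), "MAE", mae, "R-squared", round(r_squared, 3))
```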
A. Multiple-choice questions

1. Which of the following best defines data literacy?
A) The ability to read and write data
B) The ability to find and use data effectively
C) The ability to analyse data using AI
D) The ability to collect and store data securely

2. What is the purpose of data preprocessing?
A) To make data more complex
B) To make data less accessible
C) To clean and prepare data for analysis
D) To increase the size of the dataset

3. How can missing data be handled in a dataset?
A) By ignoring it
B) By replacing missing values with estimates
C) By deleting rows or columns with missing values
D) By converting missing values to zero

4. Which of the following statements about the quantity of data needed for machine learning projects is true?
A) More data is always better for good predictions.
B) Small batches of data are sufficient for complex models.
C) Data quantity depends solely on the number of features.
D) Data diversity is not essential for model performance.

5. Which of the following is an example of a primary source of data collection?
A) Web scraping
B) Social media data tracking
C) Surveys
D) Kaggle datasets

6. What method of data collection involves direct communication with individuals or groups to gather information?
A) Observations
B) Experiments
C) Interviews
D) Marketing campaigns

7. Which of the following is an example of ratio scale data?
A) Grading students' exam papers as ‘A’, ‘B’, ‘C’, ‘D’, and ‘F’
B) Measuring the temperature in Celsius
C) Rating a meal at a restaurant as ‘unpalatable’, ‘unappetizing’, ‘just okay’, ‘tasty’ and ‘delicious’
D) Recording the weight of a person in kilograms

8. What is the distinguishing feature of ratio scale data?
A) It involves categories without a specific order
B) It has a zero point and allows for ratios to be calculated
C) It involves categories with a strict order but no measurable differences between categories
D) It has a definite order, but the differences between categories cannot be measured

9. Which statistical measure is most suitable for data sets with evenly spread values and no exceptionally high or low values?
A) Mean
B) Median
C) Mode
D) Variance

10. What is the term used to describe the graphical or pictorial representation of data?
A) Statistical summary
B) Data organization
C) Data visualization
D) Data interpretation
B. Short answer questions:
1. Explain the concept of data literacy and its importance in today's digital age.
2. What is data preprocessing?
3. What is data visualization and why is it important?
4. How does a line graph differ from a bar graph?
5. When would you use a scatter plot?
6. What is data?
7. What do you mean by web scraping?
8. If a matrix has 6 elements, what are the possible orders it can have?
9. Construct a 3 × 2 matrix where each element is given by a_ij = i × j
10. Find the transpose of the matrix B =
C. Long answer questions:
1. Discuss the advantages and limitations of using a pie chart in data visualization. Provide examples to illustrate your points.
2. Explain the terms mean, median and mode.
3. Explain the four levels of measurement.
4. Given the matrices A and B, calculate A + B and B – A.
Python Programs
1. The ages of a group of people in a community are: 25, 28, 30, 35, 40, 45, 50, 55, 60, 65.
Write a program to calculate the mean, median, and mode of the ages.
2. A company recorded the daily temperatures (in degrees Celsius) for five consecutive days: 20°C, 22°C, 25°C, 18°C, and 23°C. Determine the variance and standard deviation of the temperatures.
3. Plot a line chart representing the weekly number of customer
inquiries received by a customer service center:
• Week 1: 150 inquiries
• Week 2: 170 inquiries
• Week 3: 180 inquiries
• Week 4: 200 inquiries
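One possible sketch for this exercise uses `plot()` from Matplotlib's pyplot. The chart title, axis labels, marker style, and output file name are choices made for this illustration.

```python
import matplotlib.pyplot as plt

# Weekly inquiries, as given in the exercise
weeks = ["Week 1", "Week 2", "Week 3", "Week 4"]
inquiries = [150, 170, 180, 200]

# plot() joins the points with a line; marker="o" also shows each data point
lines = plt.plot(weeks, inquiries, marker="o")
plt.title("Weekly customer inquiries")
plt.xlabel("Week")
plt.ylabel("Number of inquiries")
plt.savefig("weekly_inquiries.png")  # plt.show() would open a window instead
```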
4. Plot a bar chart representing the number of books sold by
different genres in a bookstore:
• Fiction: 120 books
• Mystery: 90 books
• Science Fiction: 80 books
• Romance: 110 books
• Biography: 70 books
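A possible sketch for this exercise uses `bar()` from Matplotlib's pyplot. The title, axis labels, and output file name are assumptions made for the example.

```python
import matplotlib.pyplot as plt

# Books sold per genre, as given in the exercise
genres = ["Fiction", "Mystery", "Science Fiction", "Romance", "Biography"]
books_sold = [120, 90, 80, 110, 70]

# bar() draws one rectangle per category, with height equal to its value
bars = plt.bar(genres, books_sold)
plt.title("Books sold by genre")
plt.xlabel("Genre")
plt.ylabel("Number of books sold")
plt.savefig("books_by_genre.png")  # plt.show() would open a window instead
```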
5. Visualize the distribution of different types of transportation
used by commuters in a city using a pie chart:
• Car: 40%
• Public Transit: 30%
• Walking: 20%
• Bicycle: 10%
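A possible sketch for this exercise uses `pie()` from Matplotlib's pyplot. The `autopct` format string (which prints each slice's percentage), the title, and the output file name are choices made for this illustration.

```python
import matplotlib.pyplot as plt

# Share of commuters per mode of transport, as given in the exercise
modes = ["Car", "Public Transit", "Walking", "Bicycle"]
shares = [40, 30, 20, 10]

# pie() sizes each slice in proportion to its value
plt.pie(shares, labels=modes, autopct="%1.0f%%")
plt.title("Transportation used by commuters")
plt.savefig("commuter_transport.png")  # plt.show() would open a window instead
```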
