
Types and Levels of Data Explained

This document outlines key concepts in data analytics, including terminologies, types of data, and levels of measurement. It explains structured, unstructured, and semi-structured data, as well as qualitative and quantitative data, and their respective subgroups. Additionally, it covers statistical parameters such as mean, variance, and standard deviation, emphasizing the importance of normal distribution in statistics.

Uploaded by

ngocchauct2004

Dealing with Different Types of Data
Learning Objectives

By the end of this lesson, you will be able to:

List the terminologies used in data analytics

Describe the types of data

Explain the levels of measurement


Terminologies in Data Analytics

Data Sampling

Observation

Dataset

Prediction
Observation

● An observation is a single row or record of data from the database.

● Any data set can be viewed as a set of observations.
[Figure: a database table whose columns (variables) are Age, Height, Nationality, and Gender, and whose rows are observations.]

An observation is the unit of analysis on which the measurements are taken. It is also known as a case, record, pattern, or row.
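As a minimal sketch of these terms, the table can be modeled in Python as a list of rows. The values below are hypothetical and only illustrate that each row is one observation and each column is one variable.

```python
# Hypothetical rows for a table with the variables Age, Height, Nationality, Gender.
variables = ["Age", "Height", "Nationality", "Gender"]

table = [
    {"Age": 34, "Height": 172.0, "Nationality": "Vietnamese", "Gender": "F"},
    {"Age": 27, "Height": 180.5, "Nationality": "Indian", "Gender": "M"},
    {"Age": 45, "Height": 165.2, "Nationality": "French", "Gender": "F"},
]

# Each row (dictionary) is one observation, also called a case or record.
first_observation = table[0]

# A variable is one column: the same measurement taken on every observation.
ages = [row["Age"] for row in table]

print(len(table), "observations,", len(variables), "variables")
print(ages)
```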
Data Sampling

● Data sampling is a statistical analysis technique used to select, manipulate, and analyze a representative subset of data points.

● Data sampling identifies patterns and trends in the larger data set.
● If a randomly selected sample contains n observations, then n is the sample size.

● The chart shows the sampling process, in which a few people are randomly sampled from a population.

● Data sampling is cost-effective because only the representative sample is surveyed.

● When the sample is representative, it enables data scientists, predictive modelers, and data analysts to produce accurate findings.
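A minimal sketch of simple random sampling, using Python's standard `random` module. The population of 1,000 numbered people, the sample size of 50, and the fixed seed are all assumptions made for illustration.

```python
import random

# Hypothetical population of 1,000 people, identified by number.
population = list(range(1, 1001))

# Fix the seed so the sketch is repeatable; a real survey would not do this.
rng = random.Random(42)

# Simple random sampling without replacement: every person is equally
# likely to be chosen, and nobody is chosen twice.
n = 50
sample = rng.sample(population, n)

print("sample size n =", len(sample))
```

Only the 50 sampled people would need to be surveyed, which is where the cost saving comes from.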
Data Set

● A data set is a collection of data, or the total data captured about a particular use case.

● It can hold information such as medical, insurance, and loan approval records.

● It is not limited to numbers and text and may include collections of images or videos.
The table represents loan data with attributes such as loan ID, borrower’s gender,
education, employment status, credit history, loan amount, and property details.
Prediction

● The goal of prediction is to move from what has happened to the best assessment of what will happen.

● In the graph, a linear prediction technique is used to predict the number of children at different education levels.
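The slides do not say which linear technique the graph uses. As one possible illustration, the sketch below fits an ordinary least-squares line to past observations and uses it to predict an unseen value. The function name `fit_line` and the data points are hypothetical.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical past observations (what has happened).
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

slope, intercept = fit_line(xs, ys)

# Predict the value for an input that has not been observed yet.
prediction = slope * 5 + intercept
print(slope, intercept, prediction)
```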
Types of Data
Structured Data
Data that is processed, stored, and retrieved in a fixed format.
Example: Employee details, job positions, and salaries.

Unstructured Data
Data that lacks any specific form or structure.
Example: Email.

Semi-Structured Data
Data that contains both structured and unstructured elements.
Example: CSV and JSON documents.
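To make the three types concrete, here is a small Python sketch. The employee row, the email text, and the JSON document are invented examples, not data from the slides.

```python
import json

# Structured: every record has the same fixed fields (hypothetical employee row).
employee = ("E001", "Data Analyst", 55000)

# Unstructured: free text with no predefined fields (hypothetical email body).
email_body = "Hi team, the quarterly report is attached. Please review it by Friday."

# Semi-structured: self-describing keys, but fields can vary between records.
document = json.loads('{"id": "E001", "skills": ["SQL", "Python"], "note": "Joined in 2021"}')

print(employee[1])
print(len(email_body.split()), "words")
print(document["skills"])
```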
Analyzing Unstructured Data

● By a commonly cited estimate, about 80% of business data is unstructured.

● Unstructured information is text-heavy and contains data such as dates, numbers, and facts.

● Internally generated information is considered unstructured because the intelligence doesn’t fit neatly into a database.

● Unstructured data is primarily used for BI and analytics but not for transaction processing applications.

Retailers and manufacturers analyze unstructured data to:

● Improve customer relationship management processes

● Enable targeted marketing

● Perform sentiment analysis on product reviews

The line between unstructured and semi-structured data is not clearly defined, because unstructured data usually has some level of structure in it.
Qualitative and Quantitative Data

Qualitative Data
Data in which the classification of objects is based on attributes and properties.
Example: Softness of skin.

Quantitative Data
Data that can be measured and expressed numerically.
Example: Height and shoe size.
Qualitative data:

● Data collection is unstructured.

● It asks why.

● It cannot be computed as it is non-statistical.

● It develops initial understanding and defines the problem.

Quantitative data:

● Data collection is structured.

● It is all about how much or how many.

● It is statistical and is about numbers.

● It recommends the final course of action.
Subgroups of Qualitative Data

Nominal data
Data assigned to named categories that have no order in relation to one another.
Example: Grade classification like pass or fail for a student's test results.

Ordinal data
Ordered data that is assigned to categories in a ranked fashion.
Example: Feedback on a product with a 1–5 ranking.
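The practical difference shows up in which operations make sense. In the sketch below (hypothetical results and ratings), nominal values can only be counted, while ordinal values can also be sorted and ranked.

```python
from collections import Counter

# Nominal: named categories with no order (hypothetical test results).
results = ["pass", "fail", "pass", "pass", "fail"]
counts = Counter(results)
most_common_result = counts.most_common(1)[0][0]

# Ordinal: categories with a meaningful rank (hypothetical 1-5 product ratings).
ratings = [4, 2, 5, 3, 4]
ranked = sorted(ratings)
middle_rating = ranked[len(ranked) // 2]  # median of an odd-length list

print(most_common_result, counts["pass"], counts["fail"])
print(ranked, middle_rating)
```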
Subgroups of Quantitative Data

Discrete data
It can only take certain values.
Example: The number of students in a class.

Continuous data
It can take any value within a specified range.
Example: Share price of a company.
Data Levels of Measurement
It is a classification that describes the nature of information within the values assigned to variables.

Ratio

Interval

Ordinal

Nominal
Nominal

● In the nominal level of measurement, numbers in the variable are used only as labels to classify data.

● At this level, words, letters, and alphanumeric symbols can be used.

● Example: People in the female gender category are classified as F, and those in the male gender category are classified as M.
Ordinal

● The ordinal level of measurement depicts an ordered relationship among the variable’s observations.

● It indicates an order of the measurements.

● Example: A student with a 100% score is assigned the first rank, another student with a 95% score is assigned the second rank, and so on.
Interval

● The interval level of measurement classifies and orders the measurements.

● It also specifies that the distances between each interval on the scale are equivalent.

● Example: Temperature in centigrade, where the distance between 80 degrees and 100 degrees is the same as the distance between 1000 degrees and 1020 degrees: 100°C - 80°C = 1020°C - 1000°C = 20°C.
Ratio

● In the ratio level of measurement, the scale has a true zero, so an observation with a value of zero means that none of the measured quantity is present.

● Although the properties of ratio measurement are similar to those of the interval level of measurement, the true zero in the scale makes it different from the other levels of measurement.

Note: The nominal level classifies data, while the ordinal level indicates an order of measurements. The interval level and the ratio level both provide equal distances between values; the ratio level adds a true zero.
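A short sketch of why the zero point matters, using temperature as the example. Celsius is an interval scale and Kelvin is a ratio scale; the two temperatures and the helper name `celsius_to_kelvin` are chosen here for illustration.

```python
def celsius_to_kelvin(celsius):
    """Convert Celsius (interval scale) to Kelvin (ratio scale with a true zero)."""
    return celsius + 273.15


cool_c, warm_c = 10.0, 20.0

# Differences are meaningful on both scales and agree with each other.
diff_c = warm_c - cool_c
diff_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cool_c)

# Ratios are only meaningful on the ratio scale: 20 °C is not "twice as hot"
# as 10 °C, because 0 °C does not mean "no temperature".
ratio_c = warm_c / cool_c
ratio_k = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

print(diff_c, round(diff_k, 2))
print(ratio_c, round(ratio_k, 3))
```

The difference is the same on both scales, but the ratio computed in Celsius (2.0) does not match the physically meaningful ratio computed in Kelvin, which is only slightly above 1.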
Normal Distribution of Data
● Normal distribution is also known as the Gaussian distribution or bell curve.

● It is the most important probability distribution in statistics.

● It is a perfectly symmetric bell-shaped distribution curve with only one peak.

● Many natural phenomena and occurrences approximately follow a bell curve.

● It is denser at the center and has equal mean, median, and mode values.

● It is continuous and has tails that are asymptotic.
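These properties can be checked numerically with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The choice of a standard normal distribution (mean 0, standard deviation 1) and the evaluation points are assumptions made for this sketch.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1 (chosen for illustration).
dist = NormalDist(mu=0.0, sigma=1.0)

# Symmetry: the density is the same at equal distances on either side of the mean.
left, right = dist.pdf(-1.5), dist.pdf(1.5)

# Single peak at the center: the density is highest at the mean.
peak = dist.pdf(dist.mean)

# Mean, median, and mode coincide.
center = (dist.mean, dist.median, dist.mode)

# Asymptotic tails: the density gets close to zero far from the mean
# but never reaches it.
far_tail = dist.pdf(6.0)

print(left, right, peak, center, far_tail)
```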
Statistical Parameters
Basic Statistical Parameters
Mean

● Mean is the average of all data points for a given set of data.

● It is used to derive the central tendency of the data.

● It is measured by adding all data points and dividing the sum by the number of data points.

Variance

● Variance is the sum of the squared differences between each data point and the mean, divided by the number of data points.

● It gives a measure of how the data distributes itself about the mean.

● It looks at all the data points and then determines their distribution.

Standard Deviation

● Standard deviation is the square root of variance and shows the extent to which data varies from the mean.

● It shows how tightly data points are clustered around the mean.

● It is more concrete than variance because it is expressed in the same units as the data.
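Written as formulas, the three definitions look as follows. The symbols are notation introduced here, not taken from the slides: the data points are x_1, ..., x_N, the mean is \mu, the variance is \sigma^2, and the standard deviation is \sigma. The variance divides by N, which treats the data as a whole population.

```latex
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2, \qquad
\sigma = \sqrt{\sigma^2}
```

When the data is a sample drawn from a larger population, the variance is commonly computed with N - 1 in the denominator instead of N.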
Basic Statistical Parameters: Example
Dataset x = {1, 2, 3, 4, 5, 6}

Mean = (1+2+3+4+5+6)/6 = 3.5

Variance = [(1-3.5)² + (2-3.5)² + (3-3.5)² + (4-3.5)² + (5-3.5)² + (6-3.5)²]/6 = 17.5/6 ≈ 2.917

Standard deviation = √2.917 ≈ 1.708
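The same example can be reproduced with Python's standard `statistics` module. This is a sketch that assumes the population formulas (division by the number of data points), which is what the example uses.

```python
import statistics

x = [1, 2, 3, 4, 5, 6]

mean = statistics.mean(x)
# pvariance and pstdev divide by the number of data points (population formulas),
# matching the definition used in the worked example.
variance = statistics.pvariance(x)
std_dev = statistics.pstdev(x)

print(mean, round(variance, 3), round(std_dev, 3))
```

`statistics.variance` and `statistics.stdev` divide by one less than the number of data points, so they would give larger values for the same data.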


Key Takeaways
• Structured data, unstructured data, and semi-structured data are the three types of data.

• Nominal, ordinal, interval, and ratio are four data levels of measurement.

• Normal distribution of data is the most important probability distribution in statistics.

• Mean, variance and standard deviation are the basic statistical parameters.
