Classification and Organization of Data

This document discusses classification and organization of data. It defines data classification as organizing data elements according to predefined criteria to make data easier to locate and retrieve, which is important for risk management, security, and compliance. It also discusses organizing data into categories for easy retrieval, storage, and future use. Finally, it describes the differences between categorical and numerical data, including their definitions, examples, types, characteristics, uses, and compatibility with analysis methods.

Uploaded by

Kleinia Uyson

What is Classification and Organization of Data?

 Data classification is the practice of organizing and categorizing data elements according to pre-
defined criteria. Classification makes data easier to locate and retrieve. Classifying data is
instrumental in promoting risk management, security, and regulatory compliance.
 Data classification is the process of organizing data into categories that make it easy to retrieve,
sort and store for future use.
 A well-planned data classification system makes essential data easy to find and retrieve. This can
be of particular importance for risk management, legal discovery and regulatory compliance.
  Classification brings order to raw data: a large body of data can be grouped according to the
need or purpose it serves.
  Data organization is the arrangement of raw data in an understandable order. Organizing data
includes classification, frequency distribution tables, pictorial representation, graphical
representation, etc.
  Organized data is easier to read and work with. Raw data is difficult to analyze directly, so
it must first be organized and represented in a suitable form.

Importance of organizing data:

  Decreases the time spent searching for data.
  Reduces data loss and errors.
  Helps you understand why the data was collected and what its proper use is.
  Supports the validity of the work undertaken.

Categorical and Numerical Data:

  Categorical data is data that can be stored and identified by the names or labels given to it.
Items are grouped through a process called matching, which draws out the similarities or
relations between them.
  Data collected in categorical form is also known as qualitative data. Each item is grouped and
labelled according to its matching qualities and falls under only one category, which makes the
categories mutually exclusive.

Two subtypes of categorical data:

Nominal data – this is also called naming data. This is a type that names or labels the data and
its characteristics are similar to a noun. Example: person’s name, gender, school name.

Questions to gather nominal data look like:

What is your name?

What is your pet’s name?

What is your gender?

Ordinal data – this includes data or elements of data that are ranked, ordered or placed on a
rating scale. You can count and order ordinal data, but the distance between two ranks cannot be
measured.
Example: seminar attendees are asked to rate their seminar experience on a scale of 1-5.
Each number corresponds to a satisfaction level such as “very good, good, average, bad, and
very bad”.
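
The difference between nominal and ordinal data shows up in which summaries make sense. The sketch below is only an illustration: the pet names and ratings are hypothetical, and it assumes that 5 means "very good" on the seminar scale, which the text above does not specify.

```python
from collections import Counter
from statistics import median_low

# Hypothetical nominal responses: labels with no inherent order.
pets = ["dog", "cat", "dog", "bird", "dog", "cat"]
pet_counts = Counter(pets)
# For nominal data the mode (most frequent label) is a sensible summary.
most_common_pet = pet_counts.most_common(1)[0][0]

# Hypothetical ordinal responses on the 1-5 seminar scale.
# Assumption: 1 = "very bad" ... 5 = "very good".
SCALE = {1: "very bad", 2: "bad", 3: "average", 4: "good", 5: "very good"}
ratings = [4, 5, 3, 4, 2, 5, 4]
# Ordinal data can be ranked, so a median is meaningful; median_low
# always returns a value that actually appears on the scale.
median_rating = median_low(ratings)

print("Most common pet:", most_common_pet)
print("Median rating:", median_rating, "=", SCALE[median_rating])
```

A mean is deliberately not computed for the ratings, because the gaps between ranks are not guaranteed to be equal.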

  Numerical data is data expressed as numbers rather than in words or descriptive form. Often
referred to as quantitative data, it is collected in number form and differs from other data
that merely happens to be written with digits because it can be meaningfully used in arithmetic
and statistical calculations.

Two subtypes of numerical data:

Discrete data – Discrete data is used to represent countable items. Its values can be listed one
by one, and the list can be finite or infinite.

Discrete data takes countable values such as 1, 2, 3, 4, 5, and so on. In the infinite case,
these numbers keep going without end.

Example: counting the sugar cubes in a jar is finitely countable. Counting the sugar cubes all
over the world is treated here as infinitely countable, although strictly speaking it is a very
large finite count that cannot be completed in practice.

Continuous data – this form of data represents measurements rather than counts. A continuous
value can fall anywhere within an interval (range) on the number line, so it is obtained by
measuring, not by counting items.

Example: in a school exam, scores are percentages that can be grouped into ranges: students who
score 80% to 100% earn a distinction, those who score 60% to below 80% earn first class, and
those below 60% earn second class.

Two Categories of Continuous Data:

Interval data – interval data is measured along a scale whose points are at equal distances
from each other, but which has no true zero. The numerical values in this data type can only
meaningfully be added and subtracted.

Example: body temperature can be measured in degrees Celsius or degrees Fahrenheit, and on
neither scale does 0 mean the absence of temperature.

Ratio data – unlike interval data, ratio data has a true zero point. Apart from that zero point,
it is similar to interval data.

Example: temperature measured in Kelvin has a true zero point (absolute zero).
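
The practical consequence of a true zero can be sketched with the temperature example. The two readings below are hypothetical; the only fact assumed beyond the text is the standard conversion K = °C + 273.15.

```python
def celsius_to_kelvin(celsius: float) -> float:
    """Convert an interval-scale reading (Celsius) to a ratio-scale one (Kelvin)."""
    return celsius + 273.15

# Two hypothetical temperature readings in degrees Celsius.
warm_c, cool_c = 40.0, 20.0

# Differences are meaningful on an interval scale.
difference_c = warm_c - cool_c

# Ratios are not: 40 C is not "twice as hot" as 20 C,
# because 0 C is not the absence of temperature.
naive_ratio = warm_c / cool_c

# On the Kelvin scale, which has a true zero, the ratio is meaningful.
kelvin_ratio = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

print(f"Difference: {difference_c} degrees")
print(f"Naive Celsius ratio: {naive_ratio}")
print(f"Kelvin ratio: {kelvin_ratio:.3f}")
```

The Celsius ratio suggests a doubling, while the Kelvin ratio shows an increase of only about 7%, which is why multiplication and division are reserved for ratio data.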
DIFFERENCES BETWEEN CATEGORICAL AND NUMERICAL DATA

| Features | Categorical data | Numerical data |
| --- | --- | --- |
| Definition | Categorical data refers to a data type that can be stored and identified based on the names or labels given to them. | Numerical data refers to the data that is in the form of numbers, and not in any language or descriptive form. |
| Alias | Also known as qualitative data as it qualifies data before classifying it. | Also known as quantitative data as it represents quantitative values to perform arithmetic operations on them. |
| Examples | What is your gender? Male; Female; Other | What is your test score out of 20? Below 5; 5-10; 10-15; 15-20; 20 |
| Types | Nominal data and Ordinal data. | Discrete data and Continuous data. |
| Characteristics | No order scales; natural language description; can take numerical values but with qualitative properties; can be visualized using bar charts and pie charts | Has an ordered scale; does not use natural language description; takes numeric values with numeric qualities; can be visualized using bar charts and pie charts |
| User-friendly design | Can include long surveys and has a chance of pushing respondents away. | Survey interaction is easy and short, hence fewer survey abandonment issues. |
| Data collection method | Nominal data: open-ended questions. Ordinal data: multiple-choice questions. | Mostly collected through multiple-choice questions and sometimes through open-ended questions. |
| Data collection tools | Questionnaires, surveys, and interviews | Questionnaires, surveys, interviews, focus groups and observations |
| Analysis and interpretation | Median and mode. Ex: univariate statistics, bivariate statistics, regression analysis | Descriptive and inferential statistics. Ex: measures of central tendency, turf analysis, text analysis, conjoint analysis, trend analysis |
| Uses | Used when a study requires respondents’ personal information, opinions and experiences. Commonly used in business research. | Used for statistical calculations, because arithmetic operations can be performed on it. |
| Compatibility | It is not compatible with most statistical analysis methods; hence researchers avoid using it most of the time. | It is compatible with most statistical calculation methods. |
| Visualization | Can be visualized using only bar graphs and pie charts. | Can be visualized using bar graphs, pie charts as well as scatter plots. |
| Structure | Known as unstructured or semi-structured data. It can be structured using indexing methods, as search engines such as Google and Bing do. | It is structured data and can be quickly organized and made sense of. |


Frequency Distributions

 The frequency of a value is the number of times it occurs in a dataset. A frequency distribution
is the pattern of frequencies of a variable. It’s the number of times each possible value of a
variable occurs in a dataset.
  A frequency distribution is a representation, in graphical or tabular format, that displays
the number of observations within a given interval. The frequency is how often a value occurs in
an interval, while the distribution is the pattern of frequencies of the variable.
  Presenting a frequency distribution as a graph or a table makes it easier to understand.
  Frequency distributions are particularly useful for checking whether data follow a normal
distribution, in which the observations are spread symmetrically around the mean in proportions
set by the standard deviation.

Constructing a Frequency Table

A frequency table is an effective way to summarize or organize a dataset. It’s usually composed of two
columns:

 The values or class intervals


 Their frequencies

How to make a frequency table:

1. Create a table with two columns and as many rows as there are values of the variable. Label the
first column using the variable name and label the second column “Frequency.” Enter the values
in the first column.
For ordinal variables, the values should be ordered from smallest to largest in the table rows.

For nominal variables, the values can be in any order in the table. You may wish to order them
alphabetically or in some other logical order.

2. Count the frequencies. The frequencies are the number of times each value occurs. Enter the
frequencies in the second column of the table beside their corresponding values.

Especially if your dataset is large, it may help to count the frequencies by tallying. Add a third
column called “Tally.” As you read the observations, make a tick mark in the appropriate row
of the tally column for each observation. Count the tally marks to determine the frequency.
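
The two steps above can be sketched in a few lines of Python. The observations are hypothetical placeholders for a nominal variable, and `frequency_table` is an illustrative helper, not part of any library.

```python
from collections import Counter

# Hypothetical observations of a nominal variable (not data from the source).
observations = [
    "sparrow", "finch", "sparrow", "chickadee",
    "finch", "sparrow", "cardinal", "sparrow",
]

# Step 2: count how many times each value occurs.
frequencies = Counter(observations)

def frequency_table(counts, variable_name):
    """Render a plain-text table with value, tally and frequency columns."""
    # Step 1: one row per value; nominal values are ordered alphabetically.
    rows = sorted(counts.items())
    width = max([len(variable_name)] + [len(value) for value, _ in rows])
    lines = [f"{variable_name:<{width}}  Tally     Frequency"]
    for value, freq in rows:
        tally = "|" * freq  # one tick mark per observation
        lines.append(f"{value:<{width}}  {tally:<8}  {freq}")
    return "\n".join(lines)

table = frequency_table(frequencies, "Bird species")
print(table)
```

For an ordinal variable, the alphabetical `sorted` call would be replaced by an ordering that follows the scale from smallest to largest.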

Example:

A gardener set up a bird feeder in their backyard. To help them decide how much and what type of
birdseed to buy, they decide to record the bird species that visit their feeder. Over the course of one
morning, the following birds visit their feeder:

Histograms

 A frequency histogram is a graphical version of a frequency distribution where the width and
position of rectangles are used to indicate the various classes, with the heights of those
rectangles indicating the frequency with which data fell into the associated class.
 A histogram is a bar graph which shows frequency distribution.

Use a histogram when:

  The data are numerical.
  You want to see the shape of the data’s distribution, especially when determining whether the
output of a process is distributed approximately normally.
  You want to see whether a process change has occurred from one time period to another.
  You want to determine whether the outputs of two or more processes are different.
  You wish to communicate the distribution of data quickly and easily to others.

Making a Histogram Using a Frequency Distribution Table

To make a histogram, follow these steps:

1. On the vertical axis, place frequencies. Label this axis "Frequency".


2. On the horizontal axis, place the lower value of each interval. Label this axis with the type of
data shown (price of birthday cards, etc.)
3. Draw a bar extending from the lower value of each interval to the lower value of the next
interval. The height of each bar should be equal to the frequency of its corresponding interval.

Example: Make a histogram showing the frequency distribution of the price of birthday cards.

The previous example shows that more birthday cards cost between $1.00 and $1.49 than any other
price, because the bar which corresponds to those values is highest. We can also see that twice as many
cards cost between $3.00 - $3.49 as cost between $3.50 - $3.99, because the bar which corresponds to
$3.00 - $3.49 is twice as high as the bar which corresponds to $3.50 - $3.99.
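
As a rough sketch of how the intervals behind such a histogram can be counted, the code below groups prices into $0.50 classes and prints a text histogram. The prices are hypothetical values chosen only to mirror the relationships described above; they are not the data behind the original figure. Prices are stored in cents (an implementation choice) to avoid floating-point rounding at interval boundaries.

```python
# Hypothetical birthday card prices in cents (not the original data set).
prices_cents = [105, 120, 149, 100, 130, 175, 210, 260,
                300, 320, 349, 310, 350, 399]
BIN_WIDTH = 50  # each interval spans 50 cents, e.g. $1.00 - $1.49

def bin_counts(values, width):
    """Map the lower value of each interval to its frequency."""
    counts = {}
    for value in values:
        lower = (value // width) * width  # lower value of the interval
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))

histogram = bin_counts(prices_cents, BIN_WIDTH)

# One bar per interval; the bar length equals the frequency.
for lower, freq in histogram.items():
    upper = lower + BIN_WIDTH - 1
    print(f"${lower / 100:.2f} - ${upper / 100:.2f} | {'#' * freq} ({freq})")
```

Intervals with no observations are simply absent from the dictionary in this sketch; a plotted histogram would show them as gaps.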

Data Visualization

 Data visualization is the representation of data through use of common graphics, such as charts,
plots, infographics, and even animations. These visual displays of information communicate
complex data relationships and data-driven insights in a way that is easy to understand.
 Data visualization tools provide an accessible way to see and understand trends, outliers, and
patterns in data.

Advantages of data visualization include:


  Sharing information easily.
  Exploring opportunities interactively.
  Visualizing patterns and relationships.

Bar Charts and Pie Charts

  Bar charts and pie charts are used extensively to present statistical data. Bar charts
represent information using a sequence of bars, while pie charts represent information in
circular form.
  Bar charts represent information using a sequence of bars spanning two axes. The x-axis (the
horizontal) divides the data into groups, with one bar representing each group. The y-axis shows
the numerical value of each group.
  A pie chart shows data as a circle divided into slices, with each slice representing a portion
of the data. It displays a numerical variable broken down by the categories of a categorical
variable.
  A pie chart shows how some total amount is divided among distinct categories as a circle (the
namesake pie) divided into radial slices.
  Bar charts usually represent categorical data and consist of two axes. One axis carries the
bars representing the different categories, while the other axis represents discrete values.
  The two most common types of bar graphs are vertical bar graphs and horizontal bar graphs. In
a vertical bar graph the bars rise from the x-axis, whereas in a horizontal bar graph the bars
extend from the y-axis.

When to use Bar Charts?

 The important point to note about bar charts is their bar length or height—the greater their
length or height, the greater their value.
 Bar charts should be used when you are showing segments of information.
 Bar charts are useful to compare different categorical or discrete variables, such as age groups,
classes, schools, etc., as long as there are not too many categories to compare. They are also
very useful for time series data.

When to use Pie Charts?


  The individual parts add up to a meaningful whole; a pie chart is built for visualizing how
each part contributes to that whole.
  A visual representation of “percent of…” or “part of…” is needed for a discussion.
  You want to convey that one segment of the total is relatively small or large.
  You have a total that can be split up into 2-5 categories.
  One category outweighs the others by a significant margin.

Figure: Monthly Budget Breakdown (pie chart). Rent 15%, Groceries 25%, Transportation 5%,
Personal Expenses 15%, Savings 26%, Others 14%.
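
The monthly budget figures can be used to check the "meaningful whole" condition and to see how each share becomes a slice. This is a minimal sketch: the percentages are the ones shown in the chart, while the variable names are illustrative.

```python
# Percentages from the monthly budget breakdown chart.
budget_percent = {
    "Rent": 15,
    "Groceries": 25,
    "Transportation": 5,
    "Personal Expenses": 15,
    "Savings": 26,
    "Others": 14,
}

# A pie chart only makes sense if the parts add up to the whole.
total = sum(budget_percent.values())

# Each slice's angle is its share of the full 360-degree circle.
slice_degrees = {
    category: percent * 360 / total
    for category, percent in budget_percent.items()
}

largest = max(budget_percent, key=budget_percent.get)

for category, degrees in slice_degrees.items():
    print(f"{category:<18} {budget_percent[category]:>3}%  {degrees:6.1f} degrees")
print("Total:", total, "% | largest slice:", largest)
```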

Line Charts and Scatter Plots

  A line graph is a visualization that displays changes over a specified time. The chart has two
axes: a horizontal x-axis and a vertical y-axis. The x-axis mainly depicts a dimensional
attribute, such as time.
  A line chart is best suited to displaying patterns and trends in your data. In other words,
you can use it to show whether a particular metric is trending up or down.
  A scatter plot shows the relationship between two quantitative variables measured for the same
individuals or items. Each point represents one observation, positioned by its value on each
variable.
  A scatter plot is also commonly known as an x-y graph.

Box Plots and Heat Maps

 A box plot (aka box and whisker plot) uses boxes and lines to depict the distributions of one or
more groups of numeric data.
 Box limits indicate the range of the central 50% of the data, with a central line marking the
median value. Lines extend from each box to capture the range of the remaining data, with dots
placed past the line edges to indicate outliers.
 A heatmap (aka heat map) depicts values for a main variable of interest across two axis variables
as a grid of colored squares.
 The axis variables are divided into ranges like a bar chart or histogram, and each cell’s color
indicates the value of the main variable in the corresponding cell range.
 Heatmaps are used to show relationships between two variables, one plotted on each axis. By
observing how cell colors change across each axis, you can observe if there are any patterns in
value for one or both variables.
 The variables plotted on each axis can be of any type, whether they take on categorical labels or
numeric values.
  A box plot or boxplot (also known as a box and whisker plot) is a type of chart often used in
exploratory data analysis.
  Box plots visually show the distribution of numerical data and its skewness by displaying the
data quartiles (or percentiles) and the median.
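
The numbers behind a box plot can be sketched with the standard library. The scores are hypothetical, and the 1.5 × IQR rule used to flag outliers is a common convention assumed here, not something the text above prescribes.

```python
from statistics import quantiles

def box_plot_summary(values):
    """Compute the quantities a box plot draws, under the 1.5 * IQR convention."""
    # Quartiles: the box spans q1 to q3, with a line at the median.
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1  # range of the central 50% of the data
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers extend to the most extreme values that are not outliers.
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        # Points beyond the fences are drawn as individual dots.
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }

# Hypothetical numeric data with one unusually large value.
scores = [2, 4, 4, 5, 6, 7, 8, 9, 30]
summary = box_plot_summary(scores)
print(summary)
```

Different tools compute quartiles in slightly different ways, so box limits can differ a little between programs for the same data.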

Choosing the Right Visualization:

Every visual representation has its own appropriate use. Each data set needs a suitable visual
representation so that the audience can understand it effectively and with little explanation.

Use a frequency table to count frequencies (how often something occurs), and a histogram to
display them when the data are numerical.

Bar charts are best when the data represent segments of information and you want to show growth
or comparison.

Pie charts, on the other hand, represent data that are parts of a whole, making it easier to see
how much of the whole each category occupies as a percentage.

Line graphs are best suited to displaying patterns and trends. They are commonly used in the
business sector to show upward and downward trends or growth in sales over a certain time period.

Use scatter plots when you need to display the relationship between two quantitative variables.

Box plots are efficient when you need to show the distribution and skewness of numerical data.

Use heat maps to show relationships between two variables, one plotted on each axis, by
observing how color changes across each axis.
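
The guidance above can be condensed into a small lookup, shown here as an informal sketch. The goal phrases and the `suggest_chart` helper are hypothetical names invented for illustration; real choices usually also depend on the audience and the amount of data.

```python
# Hypothetical mapping from an analysis goal to the chart suggested above.
CHART_FOR_GOAL = {
    "count frequencies of numerical data": "histogram",
    "compare categories": "bar chart",
    "show parts of a whole": "pie chart",
    "show a trend over time": "line graph",
    "relate two quantitative variables": "scatter plot",
    "show distribution and skewness": "box plot",
    "relate two variables with color": "heat map",
}

def suggest_chart(goal: str) -> str:
    """Return the chart type this section recommends for a given goal."""
    try:
        return CHART_FOR_GOAL[goal]
    except KeyError:
        known = ", ".join(sorted(CHART_FOR_GOAL))
        raise ValueError(f"Unknown goal {goal!r}; expected one of: {known}") from None

print(suggest_chart("show parts of a whole"))
print(suggest_chart("show a trend over time"))
```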

Real World Applications of Data Classification and Visualization

1. Business Sector – The business sector is one of the main users of data classification and
visualization. Graphs and charts are used to show upward and downward trends in sales and
to anticipate growth from previous and current data. Classifying and organizing data helps
businesses predict possibilities and prepare for what could happen in the future; for
example, the trajectory of sales can be projected from current data.
2. Public Health – Classified and organized data is as important in the medical field as it is
in the business sector. A main example is the pandemic, when graphs and charts were used to
predict possible outcomes, such as a spike in the number of cases at a certain time.
Visualizations were also used to convey information to the public, such as the mortality
rate, month-to-month comparisons, the rise and fall in the number of cases, and the
effectiveness of certain measures.
3. Scientific Research – The main goal of visualizing gathered data is to simplify analysis
and reduce the time it takes. Analyzing data in order to reach a conclusion is part of the
scientific method.
4. Others – theses, dissertations, research, monthly/quarterly/annual expenditures, the
education sector, the economic sector.

Ethical considerations involved in data collection in survey research.

  Transparency: Survey researchers should be transparent about how they collect and use data.
This means they should provide clear and concise information about the survey, how the data
will be used, and who will have access to it.
  Consent: Individuals should have the right to consent to data collection. This means that they
should be allowed to opt in or out of the survey, and they should be told clearly what data will
be collected and how it will be used.
 Security: Survey researchers should take steps to protect the security of the data they collect.
This includes using strong encryption and other security measures to prevent unauthorized
access to data.
 Accountability: Survey researchers should be accountable for their data collection practices. This
means that they should have clear policies and procedures in place for handling data, and they
should be able to demonstrate that they are complying with these policies and procedures.
 User rights: Individuals should have certain rights over their data, such as the right to access
their data, the right to correct their data, and the right to delete their data. Survey researchers
should respect these rights.
  Privacy laws: Many countries and jurisdictions now have privacy laws that researchers should
carefully consider and incorporate in their data collection projects. Doing so reduces the risk
of legal violations and respects the interests of the survey subjects.
