Data Distribution
DATA ORGANIZATION.
Data collection is one of the most important parts of carrying out
a research study. The data obtained through this first process
are called unprocessed or raw data. Raw data are long
lists of numbers that are of little use and do not provide the researcher with the
required information unless they are processed beforehand.
Raw data should be synthesized or summarized so that it is possible to
interpret them, understand them, and use them. The way to organize the data is
through frequency distribution tables.
FREQUENCY DISTRIBUTION TABLES.
In statistics, there are studies related to the prices of products in the
daily diet, the height and weight of a group of individuals, the
salaries of employees, the ambient temperature, student grades, etc.
Any such characteristic that can take different values, measured in
an appropriate unit, is called a variable. The
numerical representation of variables is called statistical data.
Frequency distribution tables are a tabular arrangement of statistical
data, ordered ascending or descending, with the frequency (fi)
of each data point. Frequency distributions can be for ungrouped
data and for grouped data or class interval data.
FREQUENCY DISTRIBUTION TABLE FOR UNGROUPED DATA.
It is the distribution that indicates the frequencies with which the
statistical data appear, from the smallest to the largest of the set, without
any modification to the size of the original units. In
these distributions the values of each variable have only been
reorganized, following a logical order with their respective frequencies.
The ungrouped frequency distribution, or table with ungrouped data, is
used if the variable takes a small number of values or the variable is
discrete (it is always associated with integer values).
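As an illustration, an ungrouped frequency table can be sketched in a few
lines of Python. The function name and the sample data (number of siblings
reported by 12 students) are hypothetical and are not taken from the text.

```python
from collections import Counter

def ungrouped_frequency_table(data):
    """Return (value, fi) pairs ordered from the smallest to the largest value."""
    counts = Counter(data)          # counts how many times each value appears
    return sorted(counts.items())   # ascending order of the values

# Hypothetical sample: number of siblings reported by 12 students.
siblings = [2, 0, 1, 3, 1, 2, 1, 0, 2, 1, 4, 1]

print("value  fi")
for value, fi in ungrouped_frequency_table(siblings):
    print(f"{value:>5}  {fi:>2}")
```

The original values are not modified; they are only reordered and counted,
which is what distinguishes this table from a grouped one.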
FREQUENCY DISTRIBUTION TABLE OF CLASS OR GROUPED DATA.
It is the distribution in which the statistical data are arranged in
classes, together with the frequency of each class; that is,
several adjacent values of the original data set are combined
to form a class interval.
There are no established rules to determine when it is appropriate to use
grouped data or ungrouped data; however, it is suggested that when
the total number of data (N) is equal to or greater than 50 and the range
of the data series is also greater than 20, the frequency distribution
for grouped data be used. This type of distribution is also used
when graphs such as the histogram, the frequency polygon,
or the ogive are required. The grouped frequency distribution, or table with
grouped data, is used if the variable takes a large number of
values or the variable is continuous (it can be associated with rational
and irrational numbers).
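The suggestion above (N of at least 50 and a range greater than 20) can be
written as a small check. This is only a sketch of that rule of thumb; the
function name and its parameters are assumptions made for the example, and
the thresholds are the ones quoted in the text, not fixed rules.

```python
def suggest_grouped(data, min_n=50, min_range=20):
    """Return True when the rule of thumb points to a grouped table."""
    n = len(data)
    data_range = max(data) - min(data)
    # Both conditions from the text must hold at the same time.
    return n >= min_n and data_range > min_range

print(suggest_grouped(list(range(60))))   # 60 values, range 59
print(suggest_grouped([3, 1, 2, 2, 1]))   # 5 values, range 2
```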
Organizing data generally involves arranging the
observations in classes. An arrangement of the data that expresses the frequency of
occurrence of the observations in each of these classes is known as a
frequency distribution. Constructing a frequency distribution table
requires, first of all, selecting the class intervals.
Although the selection of class intervals is something of an art and depends on the
data involved, the following steps are useful:
Step 1. Sort the data from lowest to highest for classification.
Step 2. Calculate the range (R) of the data, that is, the length of the interval (I)
that contains them:
R = Xmax - Xmin
where Xmax and Xmin are the largest and smallest values in the data.
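Steps 1 and 2 can be sketched as follows: the range is the largest value
minus the smallest one. The function name and the sample weights are
hypothetical.

```python
def sort_and_range(data):
    """Step 1: sort ascending. Step 2: range = largest value - smallest value."""
    ordered = sorted(data)
    return ordered, ordered[-1] - ordered[0]

# Hypothetical weights in kilograms.
weights = [68, 52, 75, 60, 49, 81, 57]
ordered, r = sort_and_range(weights)
print(ordered)
print("R =", r)
```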
Step 3. Define the
number of classes (Nc), which should be neither so small (less than 6) nor so
large (more than 20) that the true nature of the distribution is
impossible to visualize.
A specific number of classes or categories into which the data will be
classified must therefore be chosen. The choice of the number of classes or
categories is arbitrary; however, enough classes must be chosen
so that the data do not pile up, but
not so many that the frequency distribution tables
become difficult to manage. Two methods are most commonly used to
determine the number of classes:
1st Method.
Root method, which consists of taking the square root of the sample
size and rounding the result up to the nearest integer.
2nd Method. Sturges' Method. This method can give a reasonable
approximation for the number of classes; it is obtained with the
following mathematical model:
Nc = 1 + 3.322 log10(N)
where N is the total number of data.
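Both methods can be compared with a short sketch. The function names are
hypothetical. Sturges' rule is written here in its base-2 form,
Nc = 1 + log2(N), which is equivalent to 1 + 3.322 log10(N); rounding its
result to the nearest integer is an assumption of this example, since the
text only states the rounding for the root method.

```python
import math

def classes_root(n):
    """Root method: square root of the sample size, rounded up."""
    return math.ceil(math.sqrt(n))

def classes_sturges(n):
    """Sturges' rule: Nc = 1 + log2(n), about 1 + 3.322 * log10(n).

    Rounding to the nearest integer is an assumption of this sketch.
    """
    return round(1 + math.log2(n))

for n in (50, 100, 200):
    print(n, classes_root(n), classes_sturges(n))
```

The two methods can disagree, especially for large samples; either result is
a starting point that should still respect the 6 to 20 guideline of Step 3.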
Step 4. The next step is to determine the class width (Ac), obtained by
dividing the range by the number of classes. It is convenient for the
class width to be a whole number, so the quotient is rounded to the
nearest whole number.
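A minimal sketch of Step 4, dividing the range by the number of classes and
rounding to the nearest whole number. The function name is hypothetical, and
rounding halves upward is an assumption (Python's built-in round() would
round halves to the nearest even number instead).

```python
def class_width(data_range, n_classes):
    """Class width: range / number of classes, rounded to the nearest integer."""
    quotient = data_range / n_classes
    # Round half up; valid because the quotient is positive.
    return int(quotient + 0.5)

print(class_width(32, 7))   # 32 / 7 is about 4.57
print(class_width(30, 8))   # 30 / 8 = 3.75
```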
Step 5. Next, classify the data into each of the classes,
defining the lower limit (Li) and the upper limit (Ls) of each class. If
the classes extend beyond or fall short of the highest value in the
data (Xmax), distribute the excess or deficit
as equitably as possible between the two ends of the interval.
Step 6. Define the real class limits as follows:
Lower real limit = Li - 0.5
Upper real limit = Ls + 0.5
Step 7. Define the frequency of each class (fi), which is the number of
data that are included in each class interval.
Step 8. Define the relative frequencies of each class (fri);
these are the percentages of the data that are in each class interval with
respect to the sample size. To obtain them, the frequency of each class
interval is divided by the sample size (fri = fi / N, multiplied by 100
to express it as a percentage).
In Step 4, the class width is Ac = R / Nc, where Ac = class width
(amplitude), R = range, and Nc = number of classes.
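Steps 5 to 8 can be put together in one sketch. It assumes integer data and
classes that each hold Ac consecutive integers (Li to Ls inclusive). The
function name, the dictionary keys, and the sample data are hypothetical.
Adding classes when the chosen ones fall short of Xmax is an assumption of
this example rather than a rule from the text.

```python
def grouped_frequency_table(data, nc, ac):
    """Build a grouped frequency table for integer data (Steps 5 to 8)."""
    x_min, x_max = min(data), max(data)
    span = x_max - x_min + 1          # count of integer values to cover
    while nc * ac < span:             # assumption: add classes if they fall short
        nc += 1
    excess = nc * ac - span
    start = x_min - excess // 2       # Step 5: split the excess between both ends
    n = len(data)
    rows = []
    for k in range(nc):
        li = start + k * ac           # lower limit of class k
        ls = li + ac - 1              # upper limit of class k
        fi = sum(1 for x in data if li <= x <= ls)   # Step 7: class frequency
        rows.append({
            "Li": li,
            "Ls": ls,
            "real_lower": li - 0.5,   # Step 6: real class limits
            "real_upper": ls + 0.5,
            "fi": fi,
            "fri": fi / n,            # Step 8: relative frequency
        })
    return rows

# Hypothetical weights in kilograms (N = 20).
weights = [49, 52, 57, 60, 68, 75, 81, 55, 63, 66,
           70, 58, 61, 73, 64, 59, 62, 67, 71, 65]

for row in grouped_frequency_table(weights, nc=5, ac=7):
    print(f'{row["Li"]}-{row["Ls"]}  fi={row["fi"]}  fri={row["fri"]:.0%}')
```

The frequencies add up to N and the relative frequencies add up to 1 (100%),
which is a quick way to check that no data point was left outside the classes.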