Chapter 4. Data Management: Lessons 1 and 2

This document provides information about data management and statistical concepts. It discusses how mathematics can be used as a tool to solve practical problems. It also explains how data management involves collecting, organizing, and maintaining data in a structured way. Statistics plays an important role in effective data management. The document then describes different lessons that will be covered, including gathering and organizing data, measures of central tendency and dispersion, probabilities and normal distributions, and linear regression and correlation. It provides details on textual, tabular, and graphical presentation of data.

Section 2.

Mathematics as a Tool (Part 1)
Mathematics as a tool refers to the use of mathematical concepts, techniques, and principles as instruments for solving practical problems, making decisions, and gaining a deeper understanding of the world.
Chapter 4
Data Management
Data management refers
to the practice of collecting,
storing, organizing, and
maintaining data in a structured
and systematic way.

Statistics plays a crucial role in data management. Statistics is essential for the effective management of data in various aspects, from data collection to data analysis, visualization, and decision-making. It ensures that data is accurate, secure, and valuable for organizations and researchers.
Lesson 1: Gathering and Organizing Data,
Representing Data using Graphs and Charts
and Interpreting Organized Data
Lesson 2: Measures of Central tendency,
Dispersion, and Relative Position
Lesson 3: Probabilities and Normal
Distributions
Lesson 4: Linear Regression and Correlation
Lesson 1
Gathering and
Organizing Data,
Representing Data
using Graphs and
Charts and Interpreting
Organized Data
Gathering and Organizing Data
To describe situations, draw conclusions, or make inferences about events, the researcher must organize the data in some meaningful way. The data gathered should be presented, analyzed, and interpreted in a form that the reader can easily understand. Data may be presented in textual, tabular, or graphical form, or in a combination of these.
Textual presentation uses statements with numerals to describe the data in expository form. Its purpose is to discuss the data and the information and interpretation they carry. For example, the math test scores of 15 students on a 50-item test are 47, 48, 49, 42, 36, 38, 40, 35, 50, 26, 25, 31, 34, 19, and 41.
Tabular presentation uses a statistical table to directly display the quantities or values collected as data. It is a systematic arrangement of information into columns and rows. Examples of tabular presentation are the simple frequency distribution (or just frequency distribution), cumulative frequency distribution, grouped frequency distribution, and cumulative grouped frequency distribution.
Graphical presentation illustrates data in the form of graphs, helping readers understand the text easily. It is often the most attractive, effective, and convincing way of describing the data. Common types of graphs include the bar graph, circle graph (pie chart), line graph, pictograph, histogram, frequency polygon, and scatter diagram.
Tabular Presentation of Data
Frequency distribution
The frequency distribution table (FDT) is a statistical table
showing the frequency or number of observations contained in each of
the defined classes or categories.
Parts of a Statistical Table
a. The table heading includes the table number and the title of the table. The table number (e.g., Table 1) appears above the table title and body in bold font. The table title appears one double-spaced line below the table number. Give each table a brief but descriptive title, and write the title in italic title case (capitalize the first letter of each major word).
b. The body is the main part of the table that contains the information or figures. It includes all the rows and columns of a table (including the headings row). A cell is the point of intersection between a row and a column.
c. Stub heading is the heading that describes the leftmost column.

d. Stub column (or stub) is usually the leftmost column of the table. It lists the major independent or predictor variables.

e. Column heading is the heading that identifies the entries in just one column in the table body.

f. Table notes give explanations that supplement or clarify information in the table body.
Types of FDT
a) Qualitative or Categorical FDT. It is a frequency
distribution table where the data are grouped according to
some qualitative characteristics; data are grouped into
non-numerical categories. Categorical frequency
distribution is used for data that can be placed in specific
categories, such as nominal- or ordinal-level data. For
example, data such as political affiliation, religious
affiliation, or major field of study would use categorical
frequency distributions.
b) Quantitative FDT. It is a frequency distribution table
where the data are grouped according to some numerical
or quantitative characteristics.
Example 1. Twenty-five army inductees were given a
blood test to determine their blood type. The data set is

Construct a frequency distribution for the data.


Since the data are categorical, discrete classes can be used. There
are four blood types: A, B, O, and AB. These types will be used as
the classes for the distribution. The procedure for constructing a
frequency distribution for categorical data is given next.
Step 1. Make a table as shown.
Step 2. Count each blood type and place the results in the frequency column.
Step 3. To find the percentage of values in each class, use the formula $\% = \frac{f}{n} \times 100$, where $f$ is the frequency of the class and $n$ is the total frequency. For example, in the class of type A blood, the percentage is the type A frequency divided by $n = 25$, times 100.
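The three steps above can be sketched in a few lines of Python. This is only an illustration: the blood types in `sample` are hypothetical (they are not the data set of Example 1), and the function name `categorical_fdt` is an assumption made for this sketch.

```python
from collections import Counter

def categorical_fdt(values):
    """Return (category, frequency, percent) rows for categorical data."""
    counts = Counter(values)          # Step 2: tally each category
    n = len(values)                   # total frequency
    # Step 3: percent = f / n * 100 for each class
    return [(cat, f, 100 * f / n) for cat, f in sorted(counts.items())]

# Hypothetical blood types, NOT the data from Example 1.
sample = ["A", "B", "O", "O", "AB", "B", "O", "A", "O", "B"]

for cat, f, pct in categorical_fdt(sample):
    print(f"{cat:>2}  f = {f}  {pct:.0f}%")
```

The percentages always add up to 100 because every observation is counted in exactly one class.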
When the range of the data is large, the data must
be grouped into classes that are more than one unit in
width, in what is called a grouped or quantitative
frequency distribution. It is a frequency distribution
table where the data are grouped according to some
numerical or quantitative characteristics.
Example 2. Suppose a researcher wished to do a
study on the ages of the 40 patients confined at a certain
hospital. The researcher first would have to get the data
on the ages of the participants. When the data are in
original form, they are called raw data and are listed next.
Construct the FDT of the given data set.
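Step 1. Determine the range ($R$): $R = \text{highest value} - \text{lowest value}$.
Step 2. Determine the number of classes ($k$). A rough rule is to use between 5 and 20 classes; alternatively, $k$ can be computed from $n$, the total number of observations in the data set, and rounded to a whole number of classes.
Step 3. Determine the class size ($c$): $c = R/k$, rounded to a convenient value.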
Step 4. Enumerate the classes or categories. The classes must be mutually
exclusive. Mutually exclusive classes have nonoverlapping class limits so that
data cannot be placed into two classes. The classes must be continuous. Even if
there are no values in a class, the class must be included in the frequency
distribution. There should be no gaps in a frequency distribution. The only
exception occurs when the class with a zero frequency is the first or last class. A
class with a zero frequency at either end can be omitted without affecting the
distribution. The classes must be exhaustive. There should be enough classes to
accommodate all the data.

Step 5. Determine the frequency of each class interval.


Step 6. Compute for the values in the other columns of the FDT as deemed
necessary.

Other Columns in the FDT

1. True Class Boundaries (TCB)

True Lower Class Boundaries (TLCB) = Lower Limit – 0.5 unit of a measure

True Upper Class Boundaries (TUCB) = Upper Limit + 0.5 unit of a measure

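2. Class Mark (CM). It is the midpoint of the class interval.

$CM = \frac{\text{Lower Limit} + \text{Upper Limit}}{2}$ or $CM = \frac{TLCB + TUCB}{2}$
3. Relative Frequency (RF). It is the proportion of observations falling in a class, expressed as a percentage: $RF = \frac{f}{n} \times 100\%$.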
4. Cumulative Frequency (CF). It is the accumulated frequency of the classes. A
cumulative frequency distribution is a distribution that shows the number of data
values less than or equal to a specific value (usually an upper boundary). The
values are found by adding the frequencies of the classes less than or equal to
the upper class boundary of a specific class. This gives an ascending cumulative
frequency. Cumulative frequencies are used to show how many data values are
accumulated up to and including a specific class.

a. Less than CF (<CF). It is the accumulated frequency from the lowest class interval.

b. Greater than CF (>CF). It is the accumulated frequency from the highest class interval.

5. Relative Cumulative Frequency (RCF)

a. Less than RCF (<RCF)

b. Greater than RCF (>RCF)
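As a rough sketch of how the extra FDT columns fit together, the Python function below builds class limits, true class boundaries, class marks, frequencies, relative frequencies, and both cumulative frequencies. It assumes whole-number data (so the unit of measure is 1) and classes that cover every observation. The `ages` list and the function name `grouped_fdt` are hypothetical and are not taken from Example 2.

```python
def grouped_fdt(data, lowest_limit, class_size, num_classes):
    """Build FDT rows for whole-number data (unit of measure = 1)."""
    n = len(data)
    rows = []
    less_cf = 0
    for i in range(num_classes):
        lower = lowest_limit + i * class_size
        upper = lower + class_size - 1
        f = sum(1 for x in data if lower <= x <= upper)
        less_cf += f                                   # accumulate from the lowest class
        rows.append({
            "limits": (lower, upper),
            "boundaries": (lower - 0.5, upper + 0.5),  # TLCB and TUCB
            "class_mark": (lower + upper) / 2,
            "f": f,
            "rf": 100 * f / n,                         # relative frequency in percent
            "less_cf": less_cf,
            "greater_cf": n - less_cf + f,             # accumulate from the highest class
        })
    return rows

# Hypothetical ages, NOT the raw data from Example 2.
ages = [12, 15, 21, 23, 23, 30, 34, 35, 41, 48]

for row in grouped_fdt(ages, lowest_limit=10, class_size=10, num_classes=4):
    print(row)
```

The less than CF of the last class and the greater than CF of the first class both equal the total number of observations, which is a quick check that no value was left out.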


Graphical Presentation of Data
A graph or a chart is a device for showing
numerical values or relationships in pictorial form. The
following are the advantages of graphical presentation of
data.
The main features and implications of a body of data
can be seen at once.
It can attract attention and hold the reader’s interest.
It simplifies concepts that would otherwise have been
expressed in so many words.
It can readily clarify data, frequently bring out hidden
facts and relationships.
1. Pie Chart (Circle graph). A pie chart is a circular graph that shows how a total quantity is distributed among a group of categories. The circle is divided into sections or wedges according to the percentage of frequencies in each category of the distribution, so the “pieces of the pie” represent the proportion of the total that falls into each category. It is useful for data sorted into categories for a specific period, where the variable is nominal or categorical. Its purpose is to show the relationship of the parts to the whole by visually comparing the sizes of the sections; percentages or proportions can be used. Use a pie chart when there are fewer than 8 categories in the data set.
Guidelines on Pie Chart
plot the biggest slice at 12 o’clock
arrange components of the pie chart according to magnitude
if there is an “Others” category, put it in the last section
use different colors, shadings, or patterns to distinguish each section of the pie from the others
2. Bar chart (Column graph). Like pie charts, column graphs or bar charts are applicable only to grouped data. They should be used for discrete, grouped data on a nominal or ordinal scale. A column chart is appropriate for comparing the magnitudes of a variable (on the y-axis) across the different categories of another variable (on the x-axis). When the data are qualitative or categorical, bar graphs can be used to represent the data.
3. Scatter plot (Scatter graph). A scatter plot is a graph used to represent measurements or values that are thought to be related. It is used to examine possible relationships between two numerical variables, which are plotted on the x-axis and the y-axis.
4. Time series graph. A time series graph represents data that occur over a specific period of time under observation. It shows trends and patterns, supports forecasts, and can display one or more time series for comparison purposes.
5. Pictograph (Pictogram) immediately suggests the nature
of the data being shown. It gives an approximation only of the
actual figures and compares the different categories. The
symbols selected should be self-explanatory and easy to
understand. Each symbol represents a number.
Graphical Presentation using Statistical data

1. Histogram is a bar chart that displays the classes on the horizontal axis and the frequencies of the classes on the vertical axis. It uses contiguous vertical bars (unless the frequency of a class is 0) of various heights to represent the frequencies: the vertical lines of the bars are erected at the class boundaries, and the height of each bar corresponds to the class frequency. Histograms are applicable only to quantitative data. A histogram does not show data over time; it shows all the data at one point in time.
2. Frequency polygon is a graph constructed by plotting the frequencies at the class marks (the midpoints of the classes) and connecting the plotted points by straight lines, so the frequencies are represented by the heights of the points. The polygon is closed by considering an additional class at each end; the ends of the lines are brought down to the horizontal axis at the midpoints of the additional classes.
3. Cumulative frequency polygon (Ogive) is a graph that displays the cumulative frequencies for the classes in a frequency distribution. The vertical axis represents the cumulative frequency of the distribution, while the horizontal axis represents the upper class boundaries of the frequency distribution.
The less than cumulative frequency polygon (less than
ogive) is plotted against upper class boundaries while the
greater than cumulative frequency polygon (greater than
ogive) is plotted against lower class boundaries.
Lesson 2
Measures of Central
tendency, Dispersion,
and Relative Position
Any given data in statistics are useless
if we don’t interpret them. The most
appropriate measures found to be useful in
describing a distribution of observations are
the measures of central tendency, measures
of dispersion, measures of relative position,
z-scores, box and whisker plot, probability
and normal curve, linear regression and
correlation.
Measures of Central Tendency
A measure of central tendency is any single value that is used to
identify the “center” of the data or the typical value. It is called measure
of central tendency because when the data points are arranged according
to magnitude, it tends to lie centrally within the set. It is the
representative value of the data set. It is the value around which most of
the data points are found.

Mean
The mean represents the center of the data. It is the most
important measure if the distribution is symmetric and the most stable
measure of location. It is used when the data is at least interval. When n
is small, the mean is very sensitive to extreme values.
It is computed by summing all the observations in the sample and
dividing the sum by the number of observations.
Properties of Mean
a) A set of data has only one mean.
b) Mean can be applied for interval and ratio data.
c) All values in the data set are included in computing the mean.
d) The mean is very useful in comparing two or more data sets.
e) Mean is affected by the extreme small or large values on a data set.
f) Mean is most appropriate in symmetrical data.
For the ungrouped data, the following are the formulas of the mean.

• Population Mean ($\mu$): $\mu = \frac{\sum x}{N}$, where $x$ is the score or observation, and $N$ is the number of observations in the population.

• Sample Mean ($\bar{x}$): $\bar{x} = \frac{\sum x}{n}$, where $x$ is the score or observation, and $n$ is the number of observations in the sample.
Example 3: During a particular summer month, the eight hospitals
in a particular province reported the following number of
admissions in their respective ICUs: 8, 11, 5, 14, 8, 11, 16, and 11.

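Solution: Considering this month as the statistical population of interest, the mean number of ICU admissions is

$\mu = \frac{8 + 11 + 5 + 14 + 8 + 11 + 16 + 11}{8} = \frac{84}{8} = 10.5$ admissions

Example 4. Determine the mean age (in years) of a sample group of children whose ages are 9, 11, 7, 10, 9, 8, 8, 7, 12, 7, and 13.

Solution: $\bar{x} = \frac{9 + 11 + 7 + 10 + 9 + 8 + 8 + 7 + 12 + 7 + 13}{11} = \frac{101}{11} \approx 9.18$ years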
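The two computations in Examples 3 and 4 can be checked with a short Python sketch. The function name `mean` is chosen here for illustration; the data are the values given in the two examples.

```python
def mean(values):
    """Sum all observations and divide by the number of observations."""
    return sum(values) / len(values)

icu_admissions = [8, 11, 5, 14, 8, 11, 16, 11]        # Example 3 (population)
ages = [9, 11, 7, 10, 9, 8, 8, 7, 12, 7, 13]          # Example 4 (sample)

print(mean(icu_admissions))      # 10.5
print(round(mean(ages), 2))      # 9.18
```

The arithmetic is the same for a population and a sample; only the symbol ($\mu$ or $\bar{x}$) and the interpretation differ.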
Weighted mean ($\bar{x}_w$ or $\mu_w$) is the sum of each value (or group mean) multiplied by its respective weight, divided by the sum of the weights. (For the ordinary mean, all the weights are equal.) An example is the weighted average grade of a student in a semester, used to determine whether he or she belongs to the dean’s list: each grade has a corresponding number of units (for example, GECMAT is 3 units, a major subject is 4 or 5 units, and so on).
The formula of the weighted mean is $\bar{x}_w = \frac{\sum w x}{\sum w}$, where $x$ is a value and $w$ is its weight.
Example 5. Francis answered 20 calculus problems. He spent
1 hours for the first 6 problems; 45 minutes for the next 3;
and 3 hours for the last 11 problems. What was the average
time (in minutes) he spent for the 20 problems?

Solution
This problem requires the weighted average time because
each set of problems has a weight (which is time).
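A minimal Python sketch of the weighted mean follows, using the dean's-list situation mentioned earlier rather than Example 5. The grades and unit loads below are hypothetical values invented for illustration, and the function name `weighted_mean` is an assumption of this sketch.

```python
def weighted_mean(values, weights):
    """Sum of (weight * value) divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Hypothetical semester: three subjects with their grades and units.
grades = [1.5, 2.0, 1.25]
units = [3, 4, 5]

print(weighted_mean(grades, units))          # 1.5625

# With equal weights the weighted mean reduces to the ordinary mean.
print(weighted_mean([1, 2, 3], [1, 1, 1]))   # 2.0
```

A subject with more units pulls the average toward its grade, which is why the 5-unit grade of 1.25 brings the result below the simple average of the three grades.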
Median (Population median: $\tilde{\mu}$, Sample median: $\tilde{x}$)

Median is the positional middle of the data array. In the data array,
one-half of the values precede the median and one-half follow it.
When the data set is ordered, whether ascending or descending, it is
called a data array. Median is an appropriate measure of central
tendency for data that are ordinal or above, but is more valuable in an
ordinal type of data.

Properties of Median
• The median is unique, there is only one median for a set of data.
• The median is found by arranging the set of data from lowest to
highest (or highest to lowest) and getting the value of the middle
observation.
• Median is not affected by the extreme small or large values.
• Median can be applied for ordinal, interval and ratio data.
• Median is most appropriate in a skewed data.
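For ungrouped data, the first step in calculating the median, denoted by $\tilde{x}$, is to arrange the data in an array. Let $x_{(i)}$ be the $i$th observation in the array, $i = 1, 2, \ldots, n$.
If $n$ is odd, the median position equals $\frac{n+1}{2}$, and the value of the $\left(\frac{n+1}{2}\right)$th observation in the array is taken as the median, i.e. $\tilde{x} = x_{\left(\frac{n+1}{2}\right)}$.
If $n$ is even, the mean of the two middle values in the array is the median, i.e. $\tilde{x} = \frac{x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}}{2}$.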
Example 6. Find the median of the given data set: 75, 67, 71, 75, and 72.

Solution

First, arrange the data set in ascending order: 67, 71, 72, 75, 75

Since $n = 5$ (odd), we will use $\tilde{x} = x_{\left(\frac{n+1}{2}\right)}$; hence, $\tilde{x} = x_{(3)}$.

Therefore, $\tilde{x} = 72$.

Example 7. The reaction times for a random sample of 9 subjects to a stimulant were recorded as 2.5, 3.6, 3.1, 4.3, 2.9, 2.3, 2.6, 4.1, and 3.4 seconds. Calculate the median.

Solution

Array: 2.3, 2.5, 2.6, 2.9, 3.1, 3.4, 3.6, 4.1, 4.3

$\tilde{x} = x_{(5)} = 3.1$ seconds
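The odd and even cases of the ungrouped median can be sketched in Python as below. The function name `median` is chosen for this illustration; the first two data sets come from Examples 6 and 7, and the third is an even-sized set added to exercise the second case.

```python
def median(values):
    """Positional middle of the ordered data array."""
    array = sorted(values)                 # arrange the data in an array
    n = len(array)
    mid = n // 2
    if n % 2 == 1:
        return array[mid]                  # the ((n + 1) / 2)th observation
    return (array[mid - 1] + array[mid]) / 2   # mean of the two middle values

print(median([75, 67, 71, 75, 72]))                           # 72
print(median([2.5, 3.6, 3.1, 4.3, 2.9, 2.3, 2.6, 4.1, 3.4]))  # 3.1
print(median([7, 8, 9, 9, 10, 10, 11, 12]))                   # 9.5
```

Python lists are indexed from 0, so the $k$th observation of the array is `array[k - 1]`.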
Mode (Population mode: $\hat{\mu}$, Sample mode: $\hat{x}$)
Mode is the observed value that occurs most frequently. It locates the point where the observation values occur with the greatest density. It does not always exist, and if it does, it may not be unique. A data set is said to be unimodal if there is only one mode, bimodal if there are two modes, and multimodal if there are three or more. In some cases all the values in a data set have the same frequency. When this occurs, the data set is said to have no mode.
Properties of Mode
• The mode is found by locating the most
frequently occurring value.
• The mode is the easiest average to compute.
• There can be more than one mode or even no
mode in any given data set.
• Mode is not affected by the extreme small or
large values.
• Mode can be applied for nominal, ordinal,
interval, and ratio data.
Example 8. The eight hospitals described in Example 3 had the following number of ICU admissions: 8, 11, 5, 14, 8, 11, 16, and 11. Find the mode.
Solution
Mode = 11 admissions (11 occurs three times, more often than any other value)
Example 9. The reaction times for a random sample of 9 subjects described in Example 7 were recorded as 2.5, 3.6, 3.1, 4.3, 2.9, 2.3, 2.6, 4.1, and 3.4 seconds. Calculate the mode.
Solution
The mode does not exist since all values have the same frequency.
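A small Python sketch of the mode, following the convention stated above that a data set has no mode when every value occurs equally often. The function name `modes` is an assumption of this sketch; it returns a list because the mode may not be unique.

```python
from collections import Counter

def modes(values):
    """Return all modes in ascending order, or [] when there is no mode."""
    counts = Counter(values)
    highest = max(counts.values())
    # All distinct values share the same frequency: no mode by convention.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([8, 11, 5, 14, 8, 11, 16, 11]))                  # [11]
print(modes([2.5, 3.6, 3.1, 4.3, 2.9, 2.3, 2.6, 4.1, 3.4]))  # []
print(modes([7, 8, 9, 9, 10, 10, 11, 12]))                   # [9, 10]
```

One, two, or three and more values in the returned list correspond to unimodal, bimodal, and multimodal data.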
For grouped data, the mean is $\mu = \frac{\sum f x}{N}$ or $\bar{x} = \frac{\sum f x}{n}$, where $f$ is the frequency of the class interval and $x$ is the midpoint of the class interval.
Example 10. Calculate the mean grade of 50 students in statistics
below and give its description or interpretation.
Solution.
First, determine the midpoint ($x$) of each interval and the total frequency ($n$) or $\sum f$.
Second, add a column for $fx$, which is the product of the frequency ($f$) and the midpoint ($x$) of the class interval, and find the sum of the $fx$ column, or $\sum fx$.

Using the formula $\bar{x} = \frac{\sum fx}{n}$, solve for the mean.

Hence, the mean grade of 50 students in statistics is Satisfactory.
For grouped data, the formula for the median is

$\tilde{x} = LB + c\left(\frac{\frac{n}{2} - cf}{f}\right)$

where
$LB$ = lower boundary of the class containing the median
$n$ = sample size
$cf$ = cumulative frequency of the classes preceding the class containing the median
$f$ = number of observations in the class containing the median
$c$ = width of the interval containing the median
Example 11. Calculate the median grade of 50 students in statistics
below and give its description or interpretation.
Solution
First, we add two columns for the class boundaries and the less than cumulative frequency (<CF).

Then, determine the median class using the $\frac{n}{2}$th item in the distribution. Hence, $\frac{n}{2} = \frac{50}{2} = 25$.
If the scores are arranged in an ordered array, the 25th score of the distribution falls in the class interval 79.5 – 84.5. Hence, 79.5 – 84.5 is the median class.

By substitution, the median is obtained from the formula with lower boundary 79.5 and class width 5.

Hence, the median grade of 50 students in statistics is Satisfactory.


For grouped data, the formula of the mode is

$\hat{x} = LB + c\left(\frac{d_1}{d_1 + d_2}\right)$

where
$LB$ = lower class boundary of the modal class
$d_1$ = difference between the frequency of the modal class and that of the immediately preceding (lower) class
$d_2$ = difference between the frequency of the modal class and that of the immediately following (higher) class
$c$ = class width or size
Example 12. Calculate the modal grade of 50 students in statistics
below and give its description or interpretation.
Solution

First, determine the modal class of the distribution. The modal class of
the distribution has the highest frequency. Hence, 80-84 is the modal
class.

By substitution,

Hence, the modal grade of 50 students in statistics is Satisfactory.
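The grouped-data procedures for the mean, median, and mode can be sketched together in Python. The frequency table below is hypothetical: the table for Examples 10 to 12 is not available here, so the frequencies were invented, chosen only so that there are 50 students and so that 80–84 is both the median class and the modal class, as the text states. The function names are assumptions of this sketch, and whole-number class limits are assumed (boundaries are the limits extended by 0.5).

```python
# Hypothetical FDT: (lower limit, upper limit, frequency). NOT the original table.
classes = [(70, 74, 5), (75, 79, 10), (80, 84, 20), (85, 89, 10), (90, 94, 5)]

def grouped_mean(classes):
    """Sum of f times the class midpoint, divided by n."""
    n = sum(f for _, _, f in classes)
    return sum(f * (lo + hi) / 2 for lo, hi, f in classes) / n

def grouped_median(classes):
    """LB + c * ((n/2 - cf) / f) for the class containing the (n/2)th item."""
    n = sum(f for _, _, f in classes)
    cf = 0                                   # cumulative frequency before the class
    for lo, hi, f in classes:
        if cf + f >= n / 2:
            return (lo - 0.5) + (hi - lo + 1) * (n / 2 - cf) / f
        cf += f

def grouped_mode(classes):
    """LB + c * (d1 / (d1 + d2)) for the class with the highest frequency."""
    freqs = [f for _, _, f in classes]
    k = freqs.index(max(freqs))
    lo, hi, f = classes[k]
    d1 = f - (freqs[k - 1] if k > 0 else 0)
    d2 = f - (freqs[k + 1] if k < len(freqs) - 1 else 0)
    return (lo - 0.5) + (hi - lo + 1) * d1 / (d1 + d2)

print(grouped_mean(classes), grouped_median(classes), grouped_mode(classes))
# 82.0 82.0 82.0
```

All three measures coincide here only because the invented table is symmetric; with the actual grades they would generally differ.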


Example 13: Find the mean, median, and mode of the following ages in
years below.

a) 3, 4, 5, 5, 6, 7, 9, 10, 14
b) 7, 8, 9, 9, 10, 10, 11, 12
Solution

a) 3, 4, 5, 5, 6, 7, 9, 10, 14
Mean: $\bar{x} = \frac{63}{9} = 7$ years
Median: Since $n$ is 9 (which is odd), use the formula $\tilde{x} = x_{\left(\frac{n+1}{2}\right)}$.

$\tilde{x} = x_{(5)}$. Hence, $\tilde{x} = 6$ years.

Mode: The mode is 5 years since it has the highest frequency (it appears twice in the distribution).
b) 7, 8, 9, 9, 10, 10, 11, 12
Mean: $\bar{x} = \frac{76}{8} = 9.5$ years
Median: Since $n$ is 8 (which is even), use the formula $\tilde{x} = \frac{x_{(4)} + x_{(5)}}{2} = \frac{9 + 10}{2}$;

therefore, the median is 9.5 years.

Mode: The modes are 9 years and 10 years since they have the highest frequency (each appears twice). The data set is bimodal.
Measures of Dispersion

Computing a measure of variability is important because without it a measure of central tendency provides an incomplete description of a distribution. The mean, for example, only indicates the central score and where the most frequent scores are. Thus, to completely describe a set of data, we need to know not only the central tendency but also how much the individual scores differ from each other and from the center. We obtain this information by calculating statistics called measures of variability.

Measures of variability (dispersion) indicate the extent to which individual items in a series are scattered about an average. They are used to determine the extent of the scatter so that steps may be taken to control the existing variation. They are also used as a measure of the reliability of the average value.
Measures of variability communicate three related aspects of the data.
First, the opposite of variability is consistency. Small variability indicates
few and/or small differences among the scores, so the scores must be
consistently close to each other (and reflect that similar behaviors are
occurring). Conversely, larger variability indicates that scores (and behaviors)
were inconsistent. Second, recall that a score indicates a location on a variable
and that the difference between two scores is the distance that separates them.
From this perspective, by measuring differences, measures of variability
indicate how spread out the scores and the distribution are. Third, a
measure of variability tells us how accurately the measure of central
tendency describes the distribution. Our focus will be on the mean, so the
greater the variability, the more the scores are spread out, and the less
accurately they are summarized by the one, mean score. Conversely, the
smaller the variability, the closer the scores are to each other and to the mean.

One way to describe variability is to determine how far the lowest score is from the highest score. The descriptive statistic that indicates the distance between the two most extreme scores in a distribution is called the range.
Range
The simplest measure of dispersion is the range. The range of a set of measurements is the difference between the largest value and the smallest value.
Range = Maximum value − Minimum value
Example 14. The IQ scores of 5 members of the CHMSC men’s basketball varsity team are 108, 112, 127, 116, and 113. Find the range.
Solution: Range = 127 − 108 = 19
Variance and Standard Deviation

The variance and standard deviation are two measures of variability that indicate how much the scores are spread out around the mean.

The definitional formula for the sample variance is

$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$

where $s^2$ is the sample variance, $s$ is the sample standard deviation, $x$ is the value of any particular observation or measurement, $\bar{x}$ is the sample mean, and $n$ is the sample size.

The definitional formula for the sample standard deviation is

$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$.
Example 15: A sample of 5 households showed the following
number of household members: 3, 8, 5, 4, and 4. Find the variance
and standard deviation.
Solution
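First, solve for the sample mean ($\bar{x}$) and add the columns for ($x - \bar{x}$) and $(x - \bar{x})^2$. Here $\bar{x} = \frac{24}{5} = 4.8$ and $\sum (x - \bar{x})^2 = 14.8$.
Second, solve for the sample variance and sample standard deviation by substitution: $s^2 = \frac{14.8}{4} = 3.7$ and $s = \sqrt{3.7} \approx 1.92$.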
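As a check on the dispersion measures, the Python sketch below computes the range for the IQ scores of Example 14 and the sample variance and standard deviation for the household data of Example 15, using the definitional formula with $n - 1$ in the denominator. The function names are chosen for this illustration.

```python
import math

def data_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

iq_scores = [108, 112, 127, 116, 113]     # Example 14
members = [3, 8, 5, 4, 4]                 # Example 15

print(data_range(iq_scores))              # 19
s2 = sample_variance(members)
s = math.sqrt(s2)
print(round(s2, 2), round(s, 2))          # 3.7 1.92
```

The standard deviation is reported more often than the variance because it is in the same unit as the data (here, household members).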
Measures of Relative Position
When presenting or analyzing a data set, it is sometimes
helpful to group subjects into several equal groups. For
example, to create four equal groups we need the values that
split the data such that 25% of the observations are in each
group. The cut off points are called quartiles, and there are
three (3) of them (the middle one also being called the
median). The general term for such cut off points is quantiles;
other values likely to be encountered are deciles, which split
data into 10 parts, and percentiles, which split the data into
100 parts (also called centiles). Values such as quartiles can
also be expressed as percentiles; for example, the lowest
quartile is also the 25th percentile and the median is the 50th
percentile or the 5th decile.
1. Percentiles
Percentiles are values that divide a set of observations in an array into 100 equal parts. Thus, $P_1$, read as first percentile, is the value below which 1% of the values fall; $P_2$, read as second percentile, is the value below which 2% of the values fall; …; $P_{99}$, read as ninety-ninth percentile, is the value below which 99% of the values fall.
Example. The 80th percentile of a distribution is a value such that at least 80 percent of the ordered observations are less than its value and at least 20 percent of the ordered observations are larger than its value. If $P_{80} = 75$: at least 80% of the ordered observations are less than 75 and at least 20% of the ordered observations are larger than 75. So any observation that is smaller than this value belongs in the lower 80% of the distribution, while any observation greater than this value belongs in the upper 20% of the distribution.
To compute the percentile, we have
$P_i$ = the value of the $\left[\frac{i(n+1)}{100}\right]$th observation in the array

Note:
• If $\frac{i(n+1)}{100}$ is a whole number, the percentile is that observation.
• If $\frac{i(n+1)}{100}$ has a fractional value (decimal value), round it up to the next higher integer to get the position of the percentile.
Example 16. The following were the scores of 10 students in a short quiz. Find the 64th percentile.
2 8 6 9 7 5 8 10 10 1
Solution: First arrange the data from lowest to highest.
1 2 5 6 7 8 8 9 10 10
Then, using $P_i$ = the value of the $\left[\frac{i(n+1)}{100}\right]$th observation, we have

$P_{64}$ = the value of the $\left[\frac{64(10+1)}{100}\right]$th $= 7.04$th, or 8th, observation (always round up to the nearest whole number).
Since the 8th observation in the ordered array is 9, the 64th percentile of the distribution is 9, which is interpreted as at least 64% of the scores are below 9.
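The rounding rule used in Example 16 can be sketched in Python. The sketch assumes the position formula $i(n+1)/100$ rounded up, which is the rule consistent with the 8th observation obtained in the example; the function name `percentile` is chosen for this illustration, and positions beyond the last observation are not handled.

```python
def percentile(values, i):
    """Value at position ceil(i * (n + 1) / 100) in the ordered array."""
    array = sorted(values)
    n = len(array)
    # Integer ceiling division avoids floating-point rounding surprises.
    position = -(-i * (n + 1) // 100)
    return array[position - 1]            # positions count from 1

scores = [2, 8, 6, 9, 7, 5, 8, 10, 10, 1]   # Example 16
print(percentile(scores, 64))               # 9
```

Other textbooks and software use different percentile conventions (for example, interpolating between neighboring observations), so results from a statistics package may differ slightly from this rule.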
Approximating the Percentile from a Frequency distribution
To solve for the percentile in grouped data, we have

$P_i = LB + c\left(\frac{\frac{in}{100} - cf}{f}\right)$

where
The $P_i$th class is the class where the $\frac{in}{100}$th item falls.

$LB$ = the lower class boundary of the $P_i$th class

$c$ = class size of the $P_i$th class

$cf$ = less than cumulative frequency of the class preceding the $P_i$th class

$f$ = frequency of the $P_i$th class


Example 17. Find the 35th percentile of the given
frequency distribution of 110 scores in achievement test
below.
Solution

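First, add one column in the FDT for the less than cumulative frequency (<CF) and determine the $P_{35}$th class using $\frac{in}{100}$.

Using $\frac{in}{100}$, we have $\frac{35(110)}{100} = 38.5$. Since 38.5 falls in the class interval 70 – 74, the $P_{35}$th class is 70 – 74. Therefore, the lower class boundary is 69.5 and the class size is 5, with the cumulative frequency and class frequency taken from the frequency distribution.

By substitution, we have

$P_{35} = 70.82$. Hence, at least thirty-five percent of the scores in the achievement test are below 70.82.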
2. Deciles
Deciles are values that divide the array into 10 equal parts. Thus, $D_1$, read as first decile, is the value below which 10% of the values fall; $D_2$, read as second decile, is the value below which 20% of the values fall; …; $D_9$, read as ninth decile, is the value below which 90% of the values fall.
To compute the decile, we have
$D_i$ = the value of the $\left[\frac{i(n+1)}{10}\right]$th observation in the array
Example 18. From the given set of scores in a quiz, find the 4th decile or $D_4$.

3 8 9 11 12 18 19
Solution
Since the data are already arranged from lowest to highest, we may proceed to find the 4th decile.
3 8 9 11 12 18 19
Using $D_i$ = the value of the $\left[\frac{i(n+1)}{10}\right]$th observation, we have
$D_4$ = the value of the $\left[\frac{4(7+1)}{10}\right]$th $= 3.2$th, or 4th, observation (always round up to the nearest whole number).
Since the 4th observation in the ordered array of the given distribution is 11, the 4th decile of the distribution is 11, which is interpreted as at least 40% of the scores are below 11.
Approximating the Decile from a Frequency distribution
To solve for the decile in grouped data, we have

$D_i = LB + c\left(\frac{\frac{in}{10} - cf}{f}\right)$

where
The $D_i$th class is the class where the $\frac{in}{10}$th item falls.

$LB$ = the lower class boundary of the $D_i$th class

$c$ = class size of the $D_i$th class

$cf$ = less than cumulative frequency of the class preceding the $D_i$th class

$f$ = frequency of the $D_i$th class


Example 19. Find the 6th decile of the given frequency
distribution of 110 scores in achievement test below.
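Using $\frac{in}{10}$, we have $\frac{6(110)}{10} = 66$. Since 66 falls in the class interval 75 – 79, the $D_6$th class is 75 – 79. Therefore, the lower class boundary is 74.5 and the class size is 5, with the cumulative frequency and class frequency taken from the frequency distribution.

By substitution, we have $D_6 = 78.45$.

Hence, at least sixty percent of the scores in the achievement test are below 78.45.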
3. Quartiles

Quartiles are values that divide the array into 4 equal parts. Thus, $Q_1$, read as first quartile, is the value below which 25% of the values fall; $Q_2$, read as second quartile, is the value below which 50% of the values fall; $Q_3$, read as third quartile, is the value below which 75% of the values fall.

Example 20. From the given set of scores in a quiz, find the 3rd quartile or $Q_3$.

3 8 9 11 12 18 19

Solution

Since the data are already arranged from lowest to highest, we may proceed to find the 3rd quartile.

Using $Q_i$ = the value of the $\left[\frac{i(n+1)}{4}\right]$th observation, we have
$Q_3$ = the value of the $\left[\frac{3(7+1)}{4}\right]$th $= 6$th observation.

Since the 6th observation in the ordered array of the given distribution is 18, the 3rd quartile of the distribution is 18, which is interpreted as at least 75% of the scores are below 18.
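Approximating the Quartile from a Frequency Distribution
To solve for the quartile in grouped data, we have

$Q_i = LB + \left(\frac{\frac{iN}{4} - cf}{f}\right) c$

where
The $Q_i$th class is the class where the $\left(\frac{iN}{4}\right)$th observation falls.

$LB$ = the lower class boundary of the $Q_i$th class

$c$ = class size of the $Q_i$th class

$cf$ = less than cumulative frequency of the class preceding the $Q_i$th class

$f$ = frequency of the $Q_i$th class

$N$ = total number of observations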

Example 21. Find the 1st quartile of the given frequency
distribution of 110 scores in an achievement test below.
Using $\frac{iN}{4}$, we have $\frac{1(110)}{4} = 27.5$th. Since the 27.5th score falls in the class interval 65 – 69, the $Q_1$
class is 65 – 69. Therefore, we have

By substitution, we have

Hence, at least 25% of the scores in the achievement test are below 67.
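The grouped-data approximation for deciles and quartiles follows one pattern: locate the class containing the $\frac{iN}{k}$th observation ($k = 10$ for deciles, $k = 4$ for quartiles), then interpolate from its lower class boundary. The sketch below assumes integer class limits, so that each boundary lies 0.5 below the lower limit. The function name `grouped_fractile` and the frequency table are hypothetical and are not the achievement-test table of Examples 19 and 21.

```python
def grouped_fractile(classes, i, parts):
    """Approximate the i-th fractile of grouped data.

    classes: list of (lower_limit, upper_limit, frequency), lowest class first.
    parts:   10 for deciles, 4 for quartiles.
    """
    total = sum(freq for _, _, freq in classes)   # N
    target = i * total / parts                    # iN/10 or iN/4
    cumulative = 0                                # cf of the preceding classes
    for lower, upper, freq in classes:
        if freq > 0 and cumulative + freq >= target:
            lower_boundary = lower - 0.5          # assumes integer class limits
            class_size = upper - lower + 1
            return lower_boundary + (target - cumulative) / freq * class_size
        cumulative += freq
    raise ValueError("i must satisfy 0 < i <= parts")

# Hypothetical frequency table of 40 scores (illustration only)
table = [(50, 54, 4), (55, 59, 6), (60, 64, 10), (65, 69, 12), (70, 74, 8)]

print(grouped_fractile(table, 1, 4))             # first quartile: 59.5
print(round(grouped_fractile(table, 6, 10), 2))  # sixth decile: 66.17
```

In the hypothetical table, the 10th score falls in the class 55 – 59, so $Q_1 = 54.5 + \frac{10 - 4}{6}(5) = 59.5$.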
4. z-Score
A z-score is used to know the position of one
observation relative to others in a set of data. Let us say we
want to know how a student's score of 42 compares to the
scores of the other students in the class on a quiz
with a total of 50 points. The mean and the standard
deviation of the scores can be used to compute a z-score,
which measures the relative standing of a
measurement in a data set.
A z-score measures the distance between an
observation and the mean, measured in units of standard
deviation. The following formulas show how to compute
the z-score for a data value $x$ in a population and in a
sample.
For population: $z = \frac{x - \mu}{\sigma}$     For sample: $z = \frac{x - \bar{x}}{s}$
Example 22: The monthly expenditures of a large group of
households have a mean of ₱48,700 and a standard deviation of
₱10,400. What are the z-scores of monthly expenditures of ₱59,100
and ₱38,300?
Solution
Let the mean be ₱48,700 and the standard deviation be ₱10,400.
Using the z-score formula, the z-scores for the two values
(₱59,100 and ₱38,300) are computed as follows:
For ₱59,100: $z = \frac{59,100 - 48,700}{10,400} = 1.00$
For ₱38,300: $z = \frac{38,300 - 48,700}{10,400} = -1.00$
The z-score of 1.00 indicates that a monthly expenditure of ₱59,100 for
households is one standard deviation above the mean, and a z-score of
-1.00 shows that a ₱38,300 monthly expenditure is one standard
deviation below the mean. Note that both household monthly
expenditures (₱59,100 and ₱38,300) are the same distance
(₱10,400) from the mean.
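The computation in Example 22 can be checked with a short Python sketch. The function name `z_score` is chosen here for illustration; the same arithmetic applies whether the mean and standard deviation describe a population or a sample.

```python
def z_score(x, mean, std_dev):
    """Distance of x from the mean, in units of standard deviation."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / std_dev

mean_expense = 48_700   # pesos
sd_expense = 10_400     # pesos

for amount in (59_100, 38_300):
    print(amount, z_score(amount, mean_expense, sd_expense))
# 59100 1.0
# 38300 -1.0
```

A positive result places the value above the mean and a negative result places it below the mean.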
Example 23: Raul has taken two tests in his mathematics
class. He scored 72 on the first test, for which the mean
of all scores was 65 and the standard deviation was 8. He
received a 60 on a second test, for which the mean of all
scores was 45 and the standard deviation was 12. In
comparison to the other students, did Raul do better on
the first test or the second test?

Solution: Find the z-score for each test.

$z_{72} = \frac{72 - 65}{8} = 0.875$ and $z_{60} = \frac{60 - 45}{12} = 1.25$

Raul scored 0.875 standard deviations above the mean on
the first test and 1.25 standard deviations above the mean
on the second test. These z-scores indicate that, in
comparison to his classmates, Raul scored better on the
second test than he did on the first test.
Example 24: A consumer group tested a sample of 100 light bulbs.
It found that the mean life expectancy of the bulbs was 842 hours,
with a standard deviation of 90 hours. One particular bulb from the
DuraBright Company had a z-score of 1.2. What was the life span of
this light bulb?
Solution: Substitute the given values into the z-score equation and
solve for $x$.
$1.2 = \frac{x - 842}{90}$
$108 = x - 842$ or $x = 950$ hours
The light bulb had a life span of 950 hours.
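Solving the z-score equation for the data value, as in Example 24, gives $x = \bar{x} + z \cdot s$. A minimal Python sketch of that rearrangement follows; the function name `value_from_z` is an assumption made for illustration.

```python
def value_from_z(z, mean, std_dev):
    """Recover the data value x from its z-score: x = mean + z * std_dev."""
    return mean + z * std_dev

# Light bulb sample from Example 24: mean 842 hours, standard deviation 90 hours
life_span = value_from_z(1.2, 842, 90)
print(round(life_span, 1))  # about 950 hours
```

The result is rounded before printing because 1.2 is not stored exactly in floating point.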
5. Box-and-Whisker Plot
A box-and-whisker plot (sometimes called a
boxplot) is often used to provide a visual
summary of a set of data. It is a graph of a data set
obtained by drawing a horizontal line from the
minimum data value to the first quartile ($Q_1$), drawing a
horizontal line from the third quartile ($Q_3$) to the maximum
data value, and drawing a box whose vertical sides
pass through $Q_1$ and $Q_3$, with a vertical line inside the
box passing through the median or second quartile
($Q_2$).
The boxplot will give the following information:
a) If the median is near the center of the box, the distribution is
approximately symmetric.
b) If the median falls to the right of the center of the box, the distribution is
negatively skewed.
c) If the median falls to the left of the center of the box, the distribution is
positively skewed.
d) If the lines are about the same length, the distribution is approximately
symmetric.
e) If the left line is longer than the right line, the distribution is negatively
skewed.
f) If the right line is longer than the left line, the distribution is positively
skewed.
Example 25: Construct a boxplot for the data set of the ages of 11 middle-management employees of
a certain company. The ages are 45, 48, 49, 49, 51, 51, 53, 55, 55, 58, and 59. What can you say
about the distribution of the data set?

Step 1: Determine the $Q_1$, Median, and $Q_3$ of the given data set. Recall that $Q_1 = 49$, Median = 51, and $Q_3 = 55$.

Step 2: Locate the lowest value, $Q_1$, the median, $Q_3$, and the highest value on the scale.

Step 3: Draw a box around $Q_1$ and $Q_3$, draw a vertical line through the median, and connect the upper and
lower values, as shown in the figure below.

The distribution of the ages of the 11 middle-management employees of a certain company is positively
skewed since the median falls to the left of the center of the box.
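The values needed for the boxplot in Example 25 can be computed with a short Python sketch. It assumes the quartile position rule $i(n+1)/4$ rounded up, as used for ungrouped data earlier in the lesson; the function names `quartile` and `skew_from_median` are hypothetical, and only the median-position rule from the list above is checked.

```python
import math

def quartile(ordered, i):
    """i-th quartile of sorted data using position i(n + 1)/4, rounded up."""
    n = len(ordered)
    position = math.ceil(i * (n + 1) / 4)
    return ordered[min(max(position, 1), n) - 1]

def skew_from_median(q1, median, q3):
    """Classify skewness from where the median sits inside the box."""
    center = (q1 + q3) / 2
    if median < center:
        return "positively skewed"
    if median > center:
        return "negatively skewed"
    return "approximately symmetric"

ages = sorted([45, 48, 49, 49, 51, 51, 53, 55, 55, 58, 59])
q1, median, q3 = (quartile(ages, i) for i in (1, 2, 3))

print(ages[0], q1, median, q3, ages[-1])   # 45 49 51 55 59
print(skew_from_median(q1, median, q3))    # positively skewed
```

The center of the box is $(49 + 55)/2 = 52$, and the median of 51 lies to its left.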

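[Figure: boxplot of the ages, marked with the lowest score ($LS$), $Q_1$, the median ($\tilde{X}$), $Q_3$, and the highest score ($HS$)]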
