Course: STAT6171 – Basic Statistics
ORGANIZING AND VISUALIZING VARIABLES
Session 2
Learning Objectives
1. Describe the basic concepts of descriptive and
inferential statistics.
2. Use Microsoft Excel to do data analysis.
Bina Nusantara University 2
ORGANIZING CATEGORICAL VARIABLES
A summary table tallies the set of individual values as
frequencies or percentages for each category. A summary table
helps you see the differences among the categories by
displaying the frequency, amount, or percentage of items in a
set of categories in a separate column.
Table 2.1: How People Pay for Purchases and Other Transactions
The sample of 407 retirement funds for The Choice Is Yours
scenario (see page 56) includes the variable risk that has the
defined categories Low, Average, and High. Construct a
summary table of the retirement funds, categorized by risk.
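A summary table of this kind can be sketched in a few lines of Python. This is a minimal illustration only: the function name summary_table and the ten risk values are hypothetical and are not the 407-fund sample.

```python
from collections import Counter

def summary_table(values):
    """Tally categorical values into (category, frequency, percentage) rows."""
    counts = Counter(values)            # frequency of each category
    total = sum(counts.values())
    return [(cat, n, 100 * n / total) for cat, n in counts.items()]

# Hypothetical risk levels for ten funds (NOT the 407-fund sample).
risk = ["Low", "Average", "High", "Average", "Low",
        "Average", "High", "Average", "Low", "Average"]

for category, freq, pct in summary_table(risk):
    print(f"{category:<8} {freq:>3} {pct:6.1f}%")
```

Each row holds one category with its frequency and percentage, which is the layout a summary table uses.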
• A contingency table cross-tabulates, or tallies jointly, the
data of two or more categorical variables, allowing you to
study patterns that may exist between the variables.
• Tallies can be shown as a frequency, a percentage of the
overall total, a percentage of the row total, or a percentage
of the column total, depending on the type of contingency
table you use.
• Each tally appears in its own cell, and there is a cell for each
joint response, a unique combination of values for the
variables being tallied.
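The tallying described above can be sketched as follows. The function names and the eight (fund type, risk) pairs are assumptions made for illustration; they are not the data behind the textbook tables.

```python
from collections import Counter

def contingency_table(pairs):
    """Count each joint response, i.e. each (row value, column value) pair."""
    return Counter(pairs)

def as_percentages(cells, basis="overall"):
    """Express cell counts as a percentage of the overall, row, or column total."""
    overall = sum(cells.values())
    row_totals, col_totals = Counter(), Counter()
    for (row, col), n in cells.items():
        row_totals[row] += n
        col_totals[col] += n
    result = {}
    for (row, col), n in cells.items():
        if basis == "row":
            denominator = row_totals[row]
        elif basis == "column":
            denominator = col_totals[col]
        else:
            denominator = overall
        result[(row, col)] = 100 * n / denominator
    return result

# Hypothetical (fund type, risk) pairs for eight funds.
funds = [("Growth", "Low"), ("Growth", "Average"), ("Growth", "Average"),
         ("Growth", "High"), ("Value", "Low"), ("Value", "Low"),
         ("Value", "Average"), ("Value", "High")]

cells = contingency_table(funds)
row_pct = as_percentages(cells, basis="row")
print(cells[("Growth", "Average")], row_pct[("Growth", "Average")])
```

The same cell counts feed all three percentage views; only the denominator changes.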
Table 2.3 presents the completed contingency table after all
407 funds have been tallied. This table shows that there are
199 retirement funds that have the fund type Growth and risk
level Average. In summarizing all six joint responses, the table
reveals that Growth and Average is the most frequent joint
response in the sample of 407 retirement funds.
Learning the Basics
ORGANIZING NUMERICAL VARIABLES
A frequency distribution tallies the values of a numerical
variable into a set of numerically ordered classes. Each class
groups a mutually exclusive range of values, called a class
interval. Each value can be assigned to only one class, and
every value must be contained in one of the class intervals.
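A minimal sketch of this tallying, assuming equal-width classes of the form "lower but less than upper". The function name, its parameters, and the meal costs are hypothetical.

```python
def frequency_distribution(values, start, width, num_classes):
    """Count values in the classes [start + i*width, start + (i+1)*width)."""
    counts = [0] * num_classes
    for v in values:
        index = int((v - start) // width)   # each value maps to exactly one class
        if not 0 <= index < num_classes:
            # every value must be contained in one of the class intervals
            raise ValueError(f"{v} falls outside every class interval")
        counts[index] += 1
    return [(start + i * width, start + (i + 1) * width, counts[i])
            for i in range(num_classes)]

# Hypothetical meal costs in dollars.
costs = [23, 35, 38, 41, 44, 47, 52, 58, 61, 79]
for lower, upper, freq in frequency_distribution(costs, 20, 10, 6):
    print(f"{lower} but less than {upper}: {freq}")
```

Using half-open intervals keeps the classes mutually exclusive, and the range check enforces that no value is left out.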
• Relative frequency and percentage distributions present tallies in
ways other than as frequencies.
• A relative frequency distribution presents the relative
frequency, or proportion, of the total for each group that each
class represents.
• A percentage distribution presents the percentage of the total
for each group that each class represents. When you compare
two or more groups, knowing the proportion (or percentage) of
the total for each group better facilitates comparisons than a
table of frequencies for each group would.
If there are 80 values and the frequency in a certain class is 20, the
proportion of values in that class is
20 / 80 = 0.25
and the percentage is
0.25 * 100% = 25%
You construct a relative frequency distribution by first determining the
relative frequency in each class. For example, in Table 2.8 on page
63, there are 50 center city restaurants, and the cost per meal at 9 of
these restaurants is between $50 and $60. Therefore, as shown in
Table 2.10, the proportion (or relative frequency) of meals that cost
between $50 and $60 at center city restaurants is
9 / 50 = 0.18
You construct a percentage distribution by multiplying each proportion
(or relative frequency) by 100%. Thus, the proportion of meals at
center city restaurants that cost between $50 and $60 is 9 divided by
50, or 0.18, and the percentage is 18%.
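The two calculations can be checked with a short sketch. The function names are hypothetical; the inputs are the figures quoted in the text (20 of 80 values, and 9 of 50 restaurants).

```python
def relative_frequency(class_frequency, total):
    """Proportion of all values that fall in one class."""
    return class_frequency / total

def percentage(class_frequency, total):
    """Relative frequency expressed as a percentage."""
    return 100 * class_frequency / total

# Figures quoted in the text.
print(relative_frequency(20, 80), percentage(20, 80))
print(relative_frequency(9, 50), percentage(9, 50))
```

Applying the same two functions to every class in a frequency distribution yields the full relative frequency and percentage distributions.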
As a member of the company task force in The Choice Is Yours
scenario (see page 56), you want to compare the one-year return
percentages for the growth and value retirement funds. You
construct relative frequency distributions and percentage
distributions for these funds.
The cumulative percentage distribution provides a way of presenting
information about the percentage of values that are less than a specific
amount. You use a percentage distribution as the basis to construct a
cumulative percentage distribution.
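As a sketch, the cumulative percentages are running totals of the class percentages, reported against the class boundaries. The function name and the sample boundaries and percentages below are assumptions for illustration.

```python
from itertools import accumulate

def cumulative_percentages(boundaries, percentages):
    """Pair each class boundary with the percentage of values below it.

    `boundaries` lists the lower boundary of every class plus the upper
    boundary of the last class, so it has one more entry than `percentages`.
    """
    running = [0.0] + list(accumulate(percentages))
    return list(zip(boundaries, running))

# Hypothetical percentage distribution with four classes.
boundaries = [20, 30, 40, 50, 60]
percentages = [10.0, 40.0, 30.0, 20.0]
for boundary, cumulative in cumulative_percentages(boundaries, percentages):
    print(f"less than {boundary}: {cumulative:.1f}%")
```

The first entry is always 0% (no value lies below the first lower boundary) and the last is 100%.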
As a member of the company task force in The Choice Is Yours scenario
(see page 56), you want to continue comparing the one-year return
percentages for the growth and value retirement funds. You construct
cumulative percentage distributions for the growth and value funds.
APPLYING THE CONCEPTS
VISUALIZING CATEGORICAL VARIABLES
A bar chart visualizes a categorical variable as a series of bars,
with each bar representing the tallies for a single category. In a
bar chart, the length of each bar represents either the frequency
or percentage of values for a category and each bar is separated
by space, called a gap.
As a member of the company task force in The Choice Is Yours
scenario (see page 56), you want to examine how the risk
categories in Table 2.2 on page 58 directly compare to each other.
A pie chart and a doughnut chart both use parts of a circle to
represent the tallies of each category of a categorical variable. The
size of each part, or slice, varies according to the percentage in each
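category. To represent a category as a slice, you multiply the
proportion of the whole that the category represents (its percentage
divided by 100) by 360 (the number of degrees in a circle) to get the
size of the slice in degrees. For example, a category that represents
25% of the whole gets a slice of 0.25 × 360 = 90 degrees.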
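The slice calculation can be sketched as below. The function name and the category counts are hypothetical and do not reproduce Table 2.2.

```python
def slice_angles(frequencies):
    """Convert category frequencies to pie-slice sizes in degrees."""
    total = sum(frequencies.values())
    # proportion of the whole (n / total) times 360 degrees
    return {category: 360 * n / total for category, n in frequencies.items()}

# Hypothetical counts per risk category.
angles = slice_angles({"Low": 30, "Average": 50, "High": 20})
for category, degrees in angles.items():
    print(f"{category}: {degrees:.0f} degrees")
```

The angles always sum to 360, which is a quick check that every category has been included.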
As a member of the company task force in The Choice Is Yours scenario
(see page 56), you want to examine how the risk categories in Table 2.2
on page 58 form parts of a whole.
• In a Pareto chart, the tallies for each category are plotted as
vertical bars in descending order, according to their
frequencies, and are combined with a cumulative percentage
line on the same chart.
• Pareto charts get their name from the Pareto principle, the
observation that in many data sets, a few categories of a
categorical variable represent the majority of the data, while
many other categories represent a relatively small, or trivial,
amount of the data.
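The numbers behind a Pareto chart can be prepared as in the sketch below: sort the categories by descending frequency, then accumulate the percentages for the line. The function name and the counts are hypothetical and are not the values of Table 2.1.

```python
def pareto_table(frequencies):
    """Return (category, frequency, cumulative percentage) in descending order."""
    ordered = sorted(frequencies.items(), key=lambda item: item[1], reverse=True)
    total = sum(frequencies.values())
    rows, running = [], 0
    for category, n in ordered:
        running += n                      # running total for the cumulative line
        rows.append((category, n, 100 * running / total))
    return rows

# Hypothetical payment-method counts.
payments = {"Cash": 40, "Check": 10, "Debit": 30, "Credit": 20}
for category, freq, cumulative in pareto_table(payments):
    print(f"{category:<7} {freq:>3} {cumulative:6.1f}%")
```

The frequencies drive the bar heights and the cumulative percentages drive the line plotted on the same chart.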
Construct a Pareto chart from Table 2.1 (see page 58), which
summarizes how people pay for purchases and other transactions.
Visualizing Two Categorical Variables
The side-by-side chart
A side-by-side chart visualizes two categorical variables by
showing the bars for the categories of one variable in sets
grouped by the categories of the second variable.
The doughnut chart
Doughnut charts can visualize two categorical variables as well as a
single one. When visualizing two variables, the doughnut chart
appears as two concentric rings, one inside the other, each ring
containing the categories of one variable.
APPLYING THE CONCEPTS
2.24 An online survey of CFA Institute members was conducted to gather feedback on
market sentiment, performance, and market integrity issues in October 2014. Members
were asked to indicate the most needed action to improve investor trust and market
integrity. The survey results were as follows:
APPLYING THE CONCEPTS
a. Construct a bar chart, a pie or doughnut chart, and a
Pareto chart.
b. Which graphical method do you think is best for
portraying these data?
c. What conclusions can you reach concerning the most
needed action to improve investor trust and market integrity?
APPLYING THE CONCEPTS
Visualizing Numerical Variables
A stem-and-leaf display visualizes data by presenting the data as
one or more row-wise stems that represent a range of values. In
turn, each stem has one or more leaves that branch out to the right
of their stem and represent the values found in that stem. For stems
with more than one leaf, the leaves are arranged in ascending
order.
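A stem-and-leaf display can be sketched for two-digit values by taking the tens digit as the stem and the units digit as the leaf. This is a simplifying assumption (non-negative integers only); the function name and data are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group non-negative integers by tens (stem) with units as leaves."""
    stems = defaultdict(list)
    for v in sorted(values):              # sorting keeps leaves in ascending order
        stems[v // 10].append(v % 10)
    return dict(stems)

# Hypothetical meal costs in dollars, deliberately unsorted.
data = [41, 23, 58, 35, 47, 61, 38, 79, 44, 52]
for stem, leaves in stem_and_leaf(data).items():
    print(f"{stem} | {''.join(str(leaf) for leaf in leaves)}")
```

Note that this sketch omits stems that have no leaves; a printed display would usually show them as empty rows.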
A histogram visualizes data as a vertical bar chart in which
each bar represents a class interval from a frequency or
percentage distribution. In a histogram, you display the
numerical variable along the horizontal (X) axis and use the
vertical (Y) axis to represent either the frequency or the
percentage of values per class interval. There are never any
gaps between adjacent bars in a histogram.
When using a categorical variable to divide the data of a numerical
variable into two or more groups, you visualize data by constructing
a percentage polygon. This chart uses the midpoints of each class
interval to represent the data of each class and then plots the
midpoints, at their respective class percentages, as points on a line
along the X axis.
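The points of a percentage polygon can be computed as in this sketch: each class is represented by its midpoint, paired with the class percentage. The function name and the sample distribution are assumptions for illustration.

```python
def polygon_points(boundaries, percentages):
    """Pair each class midpoint (X) with its class percentage (Y).

    `boundaries` lists the lower boundary of every class plus the upper
    boundary of the last class.
    """
    midpoints = [(low + high) / 2 for low, high in zip(boundaries, boundaries[1:])]
    return list(zip(midpoints, percentages))

# Hypothetical percentage distribution with four classes.
points = polygon_points([20, 30, 40, 50, 60], [10.0, 40.0, 30.0, 20.0])
print(points)
```

Computing one list of points per group (for example, one per fund type) and plotting them on the same axes gives the comparison the polygon is meant for.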
As a member of the company task force in The Choice Is Yours
scenario (see page 56), you seek to compare the past
performance of the growth funds and the value funds using the
one-year return percentage variable. Using the data from the
sample of 407 funds, you construct percentage polygons for the
growth and value funds to create a visual comparison.
The cumulative percentage polygon, or ogive, uses the
cumulative percentage distribution discussed in Section 2.2
to plot the cumulative percentages along the Y axis. Unlike
the percentage polygon, the lower boundaries of the class
intervals for the numerical variable are plotted, at their
respective cumulative percentages, as points on a line along
the X axis.
As a member of the company task force in The Choice Is
Yours scenario (see page 56), you seek to compare the past
performance of the growth funds and the value funds using the
one-year return percentage variable. Using the data from the
sample of 407 funds, you construct cumulative percentage
polygons for the growth and the value funds.
APPLYING THE CONCEPTS
Visualizing Two Numerical Variables
A scatter plot explores the possible relationship between two
numerical variables by plotting the values of one numerical variable
on the horizontal, or X, axis and the values of a second numerical
variable on the vertical, or Y, axis.
For example, a marketing analyst could study
the effectiveness of advertising by comparing
advertising expenses and sales revenues of 50
stores by using the X axis to represent
advertising expenses and the Y axis to
represent sales revenues.
Suppose that you are an investment analyst who has been asked to review the
valuations of the 30 NBA professional basketball teams. You seek to know if the
value of a team reflects its revenues. You collect revenue and valuation data (both in
$millions) for all 30 NBA teams, organize the data as Table 2.17, and store the data
in NBAValues.
A time-series plot plots the values of a numerical variable on the Y
axis and plots the time period associated with each numerical value on
the X axis. A time-series plot can help you visualize trends in data that
occur over time.
APPLYING THE CONCEPTS
Excel Guide
Histogram
1. Enter the meal cost data into Excel.
2. On the Data tab, choose Data Analysis, then Histogram.
3. Enter the required information in the dialog box to
create the histogram.
Data Analysis - Histogram
Pie Chart
1. Enter your data into Excel.
2. Choose Insert, then select the pie chart icon.
Bar Chart
1. Enter your data into Excel.
2. Choose Insert, then select the bar chart icon.
[Figure: bar chart of Number of Funds (vertical axis from 0 to 300) for the risk levels Low, Average, and High]
David M. Levine, David F. Stephan, Kathryn A. Szabat. Statistics for
Managers Using Microsoft Excel, 8th Edition. Pearson.
Thank You