Introduction to
Statistics
Sayeeda Jahan
Spring 2021
Chapter 1
Data and Statistics
Applications in Business and Economics
Data
Data Sources
Descriptive Statistics
Statistical Inference
What is Statistics
• The term statistics can refer to numerical facts such as
averages, medians, percentages, and maximums that help
us understand a variety of business and economic
situations.
• Statistics can also refer to the art and science of collecting,
analyzing, presenting, and interpreting data.
Applications in
Business and Economics
Accounting
Public accounting firms use statistical sampling procedures
when conducting audits for their clients.
Finance
Financial analysts use a variety of statistical information,
including price-earnings ratios and dividend yields, to guide
their investment recommendations.
Marketing
Electronic point-of-sale scanners at retail checkout counters are
being used to collect data for a variety of marketing research
applications.
Production
A variety of statistical quality control charts are used to
monitor the output of a production process.
Economics
Economists use statistical information in making forecasts
about the future of the economy or some aspect of it.
Information Systems
A variety of statistical information helps administrators
assess the performance of computer networks.
Data
Elements, Variables, and Observations
Scales of Measurement
Qualitative and Quantitative Data
Cross-Sectional and Time Series Data
Data and Data Sets
Data are the facts and figures collected, analyzed, and
summarized for presentation and interpretation.
All the data collected in a particular study are referred
to as the data set for the study.
Elements, Variables, and Observations
The elements are the entities on which data are collected.
A variable is a characteristic of interest for the elements.
The set of measurements collected for a particular element
is called an observation.
A data set with n elements contains n observations.
The total number of data values in a data set is the number
of elements multiplied by the number of variables.
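The counting rules above can be checked on a toy data set. The sketch below is only an illustration: the three student records, their values, and the variable names are hypothetical and are not taken from the course data.

```python
# Hypothetical data set: each key is an element (a student) and each
# inner dictionary is that element's observation (one value per variable).
data_set = {
    "Student A": {"school": "Business", "class_standing": "Junior", "credit_hours": 72},
    "Student B": {"school": "Humanities", "class_standing": "Sophomore", "credit_hours": 36},
    "Student C": {"school": "Education", "class_standing": "Senior", "credit_hours": 90},
}

n_elements = len(data_set)                               # entities on which data are collected
n_observations = len(data_set.values())                  # one observation per element
n_variables = len(next(iter(data_set.values())))         # characteristics recorded per element
n_values = n_elements * n_variables                      # total data values in the data set

# Cross-check the product rule by counting every stored value directly.
counted_values = sum(len(observation) for observation in data_set.values())

print(n_elements, "elements,", n_variables, "variables,", n_values, "data values")
```

With 3 elements and 3 variables, the data set holds 3 observations and 9 data values.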
Scales of Measurement
Scales of measurement include:
• Nominal
• Ordinal
• Interval
• Ratio
The scale determines the amount of information contained
in the data.
The scale indicates the data summarization and statistical
analyses that are most appropriate.
Nominal
• Data are labels or names used to identify an
attribute of the element.
• A nonnumeric label or numeric code may be used.
• Example:
Students of a university are classified by the
school in which they are enrolled using a
nonnumeric label such as Business,
Humanities, Education, and so on.
Alternatively, a numeric code could be used for
the school variable (e.g. 1 denotes Business, 2
denotes Humanities, 3 denotes Education, and
so on).
Ordinal
• The data have the properties of nominal data and
the order or rank of the data is meaningful.
• A nonnumeric label or a numeric code may be
used.
Example:
Students of a university are classified by their
class standing using a nonnumeric label such as
Freshman, Sophomore, Junior, or Senior.
Alternatively, a numeric code could be used for
the class standing variable (e.g. 1 denotes
Freshman, 2 denotes Sophomore, and so on).
Interval
• The data have the properties of ordinal data and
the interval between observations is expressed in
terms of a fixed unit of measure.
• Interval data are always numeric.
Example:
Melissa has an SAT score of 1205, while Kevin
has an SAT score of 1090. Melissa scored 115
points more than Kevin.
Ratio
• The data have all the properties of interval data
and the ratio of two values is meaningful.
• Variables such as distance, height, weight, and
time use the ratio scale.
• This scale must contain a zero value that indicates
that nothing exists for the variable at the zero
point.
• Ratio data are always numerical.
Example:
• Melissa’s college record shows 36 credit hours earned,
while Kevin’s record shows 72 credit hours earned.
Kevin has twice as many credit hours earned as
Melissa.
• The price of a book at a retail store is $200, while the price
of the same book sold online is $100. The ratio property
shows that the retail store charges twice the online price.
Qualitative and Quantitative Data
Data can be further classified as being qualitative or
quantitative.
The statistical analysis that is appropriate depends on
whether the data for the variable are qualitative or
quantitative.
In general, there are more alternatives for statistical
analysis when the data are quantitative.
Qualitative Data
Qualitative data (also known as Categorical data) are
labels or names used to identify an attribute of each
element.
Qualitative data use either the nominal or ordinal
scale of measurement.
Qualitative data can be either numeric or
nonnumeric.
The statistical analyses appropriate for qualitative data
are rather limited.
Quantitative Data
Quantitative data indicate either how many or how
much.
• Quantitative data that measure how many are
discrete.
• Quantitative data that measure how much are
continuous because there is no separation between
the possible values for the data.
Quantitative data are always numeric.
Ordinary arithmetic operations are meaningful only
with quantitative data.
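The distinction can be seen in a few lines of Python. This is a minimal sketch under stated assumptions: the five student records are hypothetical, and the numeric school codes follow the coding example given earlier (1 denotes Business, 2 denotes Humanities, 3 denotes Education).

```python
from collections import Counter
from statistics import mean

# Hypothetical records for five students (illustrative values only).
school_labels = {1: "Business", 2: "Humanities", 3: "Education"}
school_code = [1, 3, 1, 2, 1]          # qualitative, nominal scale, numeric code
credit_hours = [36, 72, 48, 60, 24]    # quantitative, ratio scale

# Qualitative data: summarize by counting how often each category occurs.
school_counts = Counter(school_labels[code] for code in school_code)

# Quantitative data: ordinary arithmetic, such as the mean, is meaningful.
mean_credit_hours = mean(credit_hours)

# mean(school_code) would also run, because the codes are numbers,
# but the result would not describe any school.
print(dict(school_counts), mean_credit_hours)
```

The codes are numeric, yet only counts make sense for them. The credit hours support both counting and arithmetic.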
Scales of Measurement
Data
• Categorical
– Numeric: Nominal, Ordinal
– Nonnumeric: Nominal, Ordinal
• Quantitative
– Numeric: Interval, Ratio
Cross-Sectional and Time Series Data
Cross-sectional data are collected at the same or
approximately the same point in time.
• Example: data detailing the number of building
permits issued in June 2020 in each of the districts of
Dhaka.
Time series data are collected over several time
periods.
• Example: data detailing the number of building
permits issued in Dhaka City in the last 36 months.
Time Series Data
Graph of Time Series Data
Data Sources
Existing Sources
• Data needed for a particular application might
already exist within a firm. Detailed information
is often kept on customers, suppliers, and
employees, for example.
– Internal company records, Business database services, etc.
• Substantial amounts of business and economic
data are available from organizations that
specialize in collecting and maintaining data.
– Government agencies, industry associations, etc.
• Government agencies are another important
source of data.
• Data are also available from a variety of industry
associations and special-interest organizations.
Internet
• The Internet has become an important source of
data.
• Most government agencies, like the Bureau of the
Census (www.census.gov), make their data
available through a web site.
• More and more companies are creating web sites
and providing public access to them.
• A number of companies now specialize in making
information available over the Internet.
Statistical Studies
• Statistical studies can be classified as either
experimental or observational.
• In experimental studies the variables of interest are first
identified. Then one or more factors are controlled so
that data can be obtained about how the factors
influence the variables.
• In observational (nonexperimental) studies no attempt
is made to control or influence the variables of interest;
an example is a survey.
Data Available From Internal Company Records
Record               Some of the Data Available
Employee records     Name, address, social security number
Production records   Part number, quantity produced, direct labor cost, material cost
Inventory records    Part number, quantity in stock, reorder level, economic order quantity
Sales records        Product number, sales volume, sales volume by region
Credit records       Customer name, credit limit, accounts receivable balance
Customer profile     Age, gender, income, household size
Data Available From Selected Government Agencies
Government Agency            Web address              Some of the Data Available
Census Bureau                www.census.gov           Population data, number of households, household income
Federal Reserve Board        www.federalreserve.gov   Data on money supply, exchange rates, discount rates
Office of Mgmt. & Budget     www.whitehouse.gov/omb   Data on revenue, expenditures, debt of federal government
Department of Commerce       www.doc.gov              Data on business activity, value of shipments, profit by industry
Bureau of Labor Statistics   www.bls.gov              Consumer spending, unemployment rate, hourly earnings, safety record
Data Acquisition Considerations
Time Requirement
• Searching for information can be time consuming.
• Information might no longer be useful by the time
it is available.
Cost of Acquisition
• Organizations often charge for information even
when it is not their primary business activity.
Data Errors
• Using any data that happen to be available or
that were acquired with little care can lead to poor
and misleading information.
Descriptive Statistics
Descriptive statistics are the tabular, graphical, and numerical methods
used to summarize data.
Most of the statistical information in newspapers, magazines, company
reports, and other publications consists of data that are summarized
and presented in a form that is easy to understand.
Such summaries of data, which may be tabular, graphical, or
numerical, are referred to as descriptive statistics.
Example: Hudson Auto Repair
The manager of Hudson Auto would like to have
a better understanding of the cost of parts used in the
engine tune-ups performed in the shop. She examines
50 customer invoices for tune-ups. The costs of parts,
rounded to the nearest dollar, are listed below.
91 78 93 57 75 52 99 80 97 62
71 69 72 89 66 75 79 75 72 76
104 74 62 68 97 105 77 65 80 109
85 97 88 68 83 68 71 69 67 74
62 82 98 101 79 105 79 69 62 73
Tabular Summary (Frequencies and Percent Frequencies)
Parts Cost ($)   Frequency   Percent Frequency
50-59                 2              4
60-69                13             26
70-79                16             32
80-89                 7             14
90-99                 7             14
100-109               5             10
Total                50            100
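The tabular summary can be reproduced from the 50 invoice values listed earlier. The sketch below is one possible way to do it, not necessarily how the table was originally prepared; the variable names are illustrative, and the class width of $10 is read off the table.

```python
from collections import Counter

# Parts costs (in dollars) from the 50 Hudson Auto tune-up invoices.
parts_cost = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]

# Map each cost to the lower limit of its class: 57 -> 50, 104 -> 100.
class_counts = Counter((cost // 10) * 10 for cost in parts_cost)

n = len(parts_cost)
frequency_table = []
for lower in sorted(class_counts):
    frequency = class_counts[lower]
    percent = 100 * frequency / n        # percent frequency for the class
    frequency_table.append((f"{lower}-{lower + 9}", frequency, percent))

for label, frequency, percent in frequency_table:
    print(f"{label:>8} {frequency:>5} {percent:>5.0f}")
```

The frequencies sum to 50 and the percent frequencies sum to 100, matching the Total row.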
Graphical Summary (Histogram)
[Histogram. Vertical axis: Frequency (ticks from 2 to 18). Horizontal axis: Parts Cost ($), from 50 to 110.]
Numerical Descriptive Statistics
• The most common numerical descriptive statistic
is the average (or mean).
• The mean provides a measure of the central
tendency, or central location, of the data for a
variable.
• Hudson’s mean cost of parts, based on the 50
tune-ups studied, is $79 (found by summing the
50 cost values and then dividing by 50).
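The calculation can be checked directly against the 50 invoice values listed earlier. This is a minimal sketch with illustrative variable names.

```python
# Parts costs (in dollars) from the 50 Hudson Auto tune-up invoices.
parts_cost = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]

total_cost = sum(parts_cost)               # sum of the 50 cost values
mean_cost = total_cost / len(parts_cost)   # divide by the number of values

print(f"total = {total_cost}, mean = {mean_cost:.2f}")
```

The 50 values sum to 3,949, so the mean is 78.98, which rounds to the $79 reported above.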
Statistical Inference
Population: The set of all elements of interest in a
particular study.
Sample: A subset of the population.
Statistical inference: The process of using data obtained
from a sample to make estimates and test hypotheses about
the characteristics of a population.
Census: Collecting data for the entire population.
Sample survey: Collecting data for a sample.
Example: Hudson Auto Repair
1. Population consists of all tune-ups. Average cost of parts is unknown.
2. A sample of 50 engine tune-ups is examined.
3. The sample data provide a sample average cost of $79 per tune-up.
4. The value of the sample average is used to make an estimate of the population average.
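The four steps can be imitated with a simulation. The sketch below rests on loud assumptions: the population is generated artificially (10,000 costs drawn from a normal distribution with mean 79 and standard deviation 14), so it is not Hudson Auto data, and in a real study the population mean would not be available for comparison.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the sketch gives the same result on every run

# Step 1: a hypothetical population of tune-up parts costs (simulated values).
population = [random.gauss(79, 14) for _ in range(10_000)]

# Step 2: examine a sample of 50 tune-ups drawn at random from the population.
sample = random.sample(population, 50)

# Step 3: the sample data provide a sample average.
sample_mean = mean(sample)

# Step 4: the sample average serves as the estimate of the population average.
population_mean = mean(population)  # known here only because the population is simulated

print(f"sample mean = {sample_mean:.2f}, population mean = {population_mean:.2f}")
```

The two means will generally differ somewhat; that difference is the sampling error that statistical inference has to account for.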
Analytics
Analytics is the scientific process of transforming data
into insight for making better decisions.
Techniques:
Descriptive analytics: Describes what has
happened in the past.
Predictive analytics: Uses models constructed from
past data to predict the future or to assess the impact
of one variable on another.
Prescriptive analytics: The set of analytical
techniques that yield a best course of action.
Big data and Data Mining:
Big data: Large and complex data set.
Three V’s of Big data:
Volume : Amount of available data
Velocity: Speed at which data is collected and
processed
Variety: Different data types
Data warehousing
Data warehousing is the process of capturing, storing,
and maintaining the data.
Organizations obtain large amounts of data on a
daily basis by means of magnetic card readers, bar
code scanners, point of sale terminals, and touch
screen monitors.
Wal-Mart captures data on 20-30 million transactions
per day.
Visa processes 6,800 payment transactions per
second.
Data Mining
Methods for developing useful decision-making
information from large databases.
Using a combination of procedures from statistics,
mathematics, and computer science, analysts “mine
the data” to convert it into useful information.
The most effective data mining systems use
automated procedures to discover relationships in
the data and predict future outcomes prompted by
general and even vague queries by the user.
Data Mining Applications
The major applications of data mining have been
made by companies with a strong consumer focus
such as retail, financial, and communication firms.
Data mining is used to identify related products that
customers who have already purchased a specific
product are also likely to purchase (and then pop-ups
are used to draw attention to those related products).
Data mining is also used to identify customers who
should receive special discount offers based on their
past purchasing volumes.
Data Mining Requirements
Statistical methods such as multiple regression,
logistic regression, and correlation are heavily used.
Also needed are computer science technologies
involving artificial intelligence and machine learning.
A significant investment in time and money is
required as well.
Data Mining Model Reliability
Finding a statistical model that works well for a
particular sample of data does not necessarily mean
that it can be reliably applied to other data.
With the enormous amount of data available, the
data set can be partitioned into a training set (for
model development) and a test set (for validating the
model).
There is, however, a danger of overfitting the model
to the point that misleading associations and
conclusions appear to exist.
Careful interpretation of results and extensive testing
are important.
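The partitioning step described above can be sketched in plain Python. The function name `train_test_split`, the 25% test fraction, and the 100 placeholder records are assumptions made for illustration; the slides do not prescribe a particular split or tool.

```python
import random


def train_test_split(records, test_fraction=0.25, seed=0):
    """Randomly partition records into (training_set, test_set)."""
    shuffled = list(records)                 # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)    # seeded shuffle for a reproducible split
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder records standing in for rows of a large database.
records = list(range(100))
training_set, test_set = train_test_split(records)

print(len(training_set), "training records,", len(test_set), "test records")
```

The model is developed on `training_set` only. Keeping `test_set` untouched until validation is what makes it a useful check against overfitting.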
Ethical Guidelines for Statistical Practice
In a statistical study, unethical behavior can take a
variety of forms including:
• Improper sampling
• Inappropriate analysis of the data
• Development of misleading graphs
• Use of inappropriate summary statistics
• Biased interpretation of the statistical results
One should strive to be fair, thorough, objective, and
neutral when collecting, analyzing, and presenting data.
As a consumer of statistics, one should also be aware
of the possibility of unethical behavior by others.
The American Statistical Association developed the report
“Ethical Guidelines for Statistical Practice”.
It contains 67 guidelines organized into 8 topic areas:
• Professionalism
• Responsibilities to Funders, Clients, Employers
• Responsibilities in Publications and Testimony
• Responsibilities to Research Subjects
• Responsibilities to Research Team Colleagues
• Responsibilities to Other Statisticians/Practitioners
• Responsibilities Regarding Allegations of Misconduct
• Responsibilities of Employers Including
Organizations, Individuals, Attorneys, or Other
Clients
End of Chapter 1