Introduction to Statistics

Sayeeda Jahan

Spring 2021
Chapter 1
Data and Statistics
 Applications in Business and Economics
 Data
 Data Sources
 Descriptive Statistics
 Statistical Inference
What is Statistics

• The term statistics can refer to numerical facts such as
averages, medians, percentages, and maximums that help
us understand a variety of business and economic
situations.

• Statistics can also refer to the art and science of collecting,
analyzing, presenting, and interpreting data.
Applications in
Business and Economics
 Accounting
Public accounting firms use statistical sampling procedures
when conducting audits for their clients.

 Finance
Financial analysts use a variety of statistical information,
including price-earnings ratios and dividend yields, to guide
their investment recommendations.

 Marketing
Electronic point-of-sale scanners at retail checkout counters are
being used to collect data for a variety of marketing research
applications.
Applications in
Business and Economics
 Production
A variety of statistical quality control charts are used to
monitor the output of a production process.

 Economics
Economists use statistical information in making forecasts
about the future of the economy or some aspect of it.

 Information Systems
A variety of statistical information helps administrators
assess the performance of computer networks.
Data

 Elements, Variables, and Observations


 Scales of Measurement
 Qualitative and Quantitative Data
 Cross-Sectional and Time Series Data
Data and Data Sets

 Data are the facts and figures collected, analyzed, and
summarized for presentation and interpretation.

 All the data collected in a particular study are referred
to as the data set for the study.
Elements, Variables, and Observations

 The elements are the entities on which data are collected.

 A variable is a characteristic of interest for the elements.

 The set of measurements collected for a particular element
is called an observation.

 A data set with n elements contains n observations.

 The total number of data values in a data set is the number
of elements multiplied by the number of variables.
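
As a minimal sketch of these definitions, the Python snippet below builds a tiny data set and counts its elements, variables, observations, and data values. The student names and values are invented for illustration and do not come from the lecture.

```python
# Hypothetical data set: each key is an element (a student) and
# each inner dict is the observation collected for that element.
data_set = {
    "Student A": {"school": "Business", "class_standing": "Junior", "credit_hours": 72},
    "Student B": {"school": "Education", "class_standing": "Freshman", "credit_hours": 15},
    "Student C": {"school": "Humanities", "class_standing": "Senior", "credit_hours": 110},
}

elements = list(data_set)                        # entities on which data are collected
variables = list(next(iter(data_set.values())))  # characteristics of interest
n_observations = len(data_set)                   # one observation per element
n_data_values = sum(len(obs) for obs in data_set.values())

print(len(elements), len(variables), n_observations, n_data_values)
```

With 3 elements and 3 variables, the data set holds 3 observations and 3 x 3 = 9 data values.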
Data, Data Sets,
Elements, Variables, and Observations
Scales of Measurement

 Scales of measurement include:


• Nominal
• Ordinal
• Interval
• Ratio
 The scale determines the amount of information contained
in the data.

 The scale indicates the data summarization and statistical
analyses that are most appropriate.
Scales of Measurement

 Nominal

• Data are labels or names used to identify an
attribute of the element.

• A nonnumeric label or numeric code may be used.


Scales of Measurement

 Nominal
• Example:
Students of a university are classified by the
school in which they are enrolled using a
nonnumeric label such as Business,
Humanities, Education, and so on.
Alternatively, a numeric code could be used for
the school variable (e.g. 1 denotes Business, 2
denotes Humanities, 3 denotes Education, and
so on).
Scales of Measurement

 Ordinal

• The data have the properties of nominal data and
the order or rank of the data is meaningful.

• A nonnumeric label or a numeric code may be
used.
Scales of Measurement

 Ordinal
Example:
Students of a university are classified by their
class standing using a nonnumeric label such as
Freshman, Sophomore, Junior, or Senior.
Alternatively, a numeric code could be used for
the class standing variable (e.g. 1 denotes
Freshman, 2 denotes Sophomore, and so on).
Scales of Measurement

 Interval
• The data have the properties of ordinal data and
the interval between observations is expressed in
terms of a fixed unit of measure.

• Interval data are always numeric.


Scales of Measurement

 Interval
Example:
Melissa has an SAT score of 1205, while Kevin
has an SAT score of 1090. Melissa scored 115
points more than Kevin.
Scales of Measurement

 Ratio
• The data have all the properties of interval data
and the ratio of two values is meaningful.
• Variables such as distance, height, weight, and
time use the ratio scale.
• This scale must contain a zero value that indicates
that nothing exists for the variable at the zero
point.
• Ratio data are always numerical.
Scales of Measurement

 Ratio
Example:

• Melissa’s college record shows 36 credit hours earned,
while Kevin’s record shows 72 credit hours earned.
Kevin has twice as many credit hours earned as
Melissa.

• The price of a book at a retail store is $200, while the
price of the same book sold online is $100. The ratio
property shows that the retail store charges twice the
online price.
Qualitative and Quantitative Data

 Data can be further classified as being qualitative or
quantitative.
 The statistical analysis that is appropriate depends on
whether the data for the variable are qualitative or
quantitative.
 In general, there are more alternatives for statistical
analysis when the data are quantitative.
Qualitative Data

 Qualitative data (also known as categorical data) are
labels or names used to identify an attribute of each
element.
 Qualitative data use either the nominal or ordinal
scale of measurement.
 Qualitative data can be either numeric or
nonnumeric.
 The statistical analyses appropriate for qualitative data
are rather limited.
Quantitative Data

 Quantitative data indicate either how many or how
much.
• Quantitative data that measure how many are
discrete.
• Quantitative data that measure how much are
continuous because there is no separation between
the possible values for the data.
 Quantitative data are always numeric.
 Ordinary arithmetic operations are meaningful only
with quantitative data.
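
The point about arithmetic can be sketched in Python. The sample of school codes below is invented for illustration; the code-to-school mapping and the two credit-hour values follow the nominal and ratio examples in these slides.

```python
from collections import Counter
from statistics import mean

# Hypothetical sample: school of enrollment stored as numeric codes
# (1 = Business, 2 = Humanities, 3 = Education).
school_codes = [1, 1, 2, 3, 3, 3]
labels = {1: "Business", 2: "Humanities", 3: "Education"}

# Counting is a valid summary for nominal data.
counts = Counter(labels[code] for code in school_codes)

# Python will average the codes without complaint, but the result has
# no meaning: the codes are labels, not amounts.
meaningless_average = mean(school_codes)

# Credit hours are quantitative (ratio scale), so their mean is meaningful.
credit_hours = [36, 72]  # Melissa and Kevin
average_credit_hours = mean(credit_hours)

print(counts, meaningless_average, average_credit_hours)
```

The software cannot tell a numeric code from a true quantity, so choosing an appropriate summary is the analyst's responsibility.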
Scales of Measurement

Data
  Categorical
    Numeric: Nominal, Ordinal
    Non-numeric: Nominal, Ordinal
  Quantitative
    Numeric: Interval, Ratio

Cross-Sectional and Time Series Data

 Cross-sectional data are collected at the same or
approximately the same point in time.
• Example: data detailing the number of building
permits issued in June 2020 in each of the districts of
Dhaka.

 Time series data are collected over several time
periods.
• Example: data detailing the number of building
permits issued in Dhaka City in the last 36 months.
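
The difference in shape between the two kinds of data can be sketched in Python. All names and counts below are invented for illustration; they are not real building-permit figures.

```python
# Cross-sectional: many units (hypothetical districts), one time period.
cross_sectional = {"District A": 120, "District B": 85, "District C": 240}

# Time series: one unit, many time periods (only 3 of 36 months shown).
time_series = [("2020-04", 310), ("2020-05", 295), ("2020-06", 445)]

# A cross-section is indexed by unit, so comparisons run across units.
largest_unit = max(cross_sectional, key=cross_sectional.get)

# A time series is ordered by time, so comparisons run between periods.
changes = [later[1] - earlier[1]
           for earlier, later in zip(time_series, time_series[1:])]

print(largest_unit, changes)
```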
Time Series Data

Graph of Time Series Data


Data Sources

 Existing Sources
• Data needed for a particular application might
already exist within a firm. Detailed information
is often kept on customers, suppliers, and
employees for example.
– Internal company records, Business database services, etc.

• Substantial amounts of business and economic
data are available from organizations that
specialize in collecting and maintaining data.
– Government agencies, industry associations, etc.
Data Sources

 Existing Sources
• Government agencies are another important
source of data.
• Data are also available from a variety of industry
associations and special-interest organizations.
Data Sources

 Internet
• The Internet has become an important source of
data.
• Most government agencies, like the Bureau of the
Census (www.census.gov), make their data
available through a web site.
• More and more companies are creating web sites
and providing public access to them.
• A number of companies now specialize in making
information available over the Internet.
Data Sources

 Statistical Studies

• Statistical studies can be classified as either
experimental or observational.

• In experimental studies the variables of interest are first
identified. Then one or more factors are controlled so
that data can be obtained about how the factors
influence the variables.

• In observational (nonexperimental) studies no attempt
is made to control or influence the variables of interest;
an example is a survey.
Data Sources

Data Available From Internal Company Records

Record               Some of the Data Available
Employee records     Name, address, social security number
Production records   Part number, quantity produced, direct labor cost, material cost
Inventory records    Part number, quantity in stock, reorder level, economic order quantity
Sales records        Product number, sales volume, sales volume by region
Credit records       Customer name, credit limit, accounts receivable balance
Customer profile     Age, gender, income, household size
Data Sources

Data Available From Selected Government Agencies

Government Agency            Web address              Some of the Data Available
Census Bureau                www.census.gov           Population data, number of households, household income
Federal Reserve Board        www.federalreserve.gov   Data on money supply, exchange rates, discount rates
Office of Mgmt. & Budget     www.whitehouse.gov/omb   Data on revenue, expenditures, debt of federal government
Department of Commerce       www.doc.gov              Data on business activity, value of shipments, profit by industry
Bureau of Labor Statistics   www.bls.gov              Consumer spending, unemployment rate, hourly earnings, safety record
Data Acquisition Considerations

 Time Requirement
• Searching for information can be time consuming.
• Information might no longer be useful by the time
it is available.
 Cost of Acquisition
• Organizations often charge for information even
when it is not their primary business activity.
 Data Errors
• Using any data that happen to be available or
that were acquired with little care can lead to poor
and misleading information.
Descriptive Statistics

 Descriptive statistics are the tabular, graphical, and numerical methods
used to summarize data.
 Most of the statistical information in newspapers, magazines, company
reports, and other publications consists of data that are summarized
and presented in a form that is easy to understand.
 Such summaries of data, which may be tabular, graphical, or
numerical, are referred to as descriptive statistics.
Example: Hudson Auto Repair

The manager of Hudson Auto would like to have
a better understanding of the cost of parts used in the
engine tune-ups performed in the shop. She examines
50 customer invoices for tune-ups. The costs of parts,
rounded to the nearest dollar, are listed below.

91 78 93 57 75 52 99 80 97 62
71 69 72 89 66 75 79 75 72 76
104 74 62 68 97 105 77 65 80 109
85 97 88 68 83 68 71 69 67 74
62 82 98 101 79 105 79 69 62 73
Example: Hudson Auto Repair

 Tabular Summary (Frequencies and Percent Frequencies)

Parts Cost ($)   Frequency   Percent Frequency
50-59                 2               4
60-69                13              26
70-79                16              32
80-89                 7              14
90-99                 7              14
100-109               5              10
Total                50             100
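
The tabular summary can be reproduced from the 50 parts costs listed in the Hudson Auto Repair example. The Python sketch below is one way to do it, assuming $10-wide classes that start at multiples of 10; the variable names are illustrative.

```python
from collections import Counter

# Parts costs (in dollars) from the 50 tune-up invoices.
costs = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]

# Map each cost to the lower limit of its class (57 -> 50, 104 -> 100).
class_counts = Counter((cost // 10) * 10 for cost in costs)

for lower in sorted(class_counts):
    frequency = class_counts[lower]
    percent = 100 * frequency / len(costs)
    print(f"{lower}-{lower + 9}: frequency {frequency}, percent {percent:.0f}")
```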
Example: Hudson Auto Repair

 Graphical Summary (Histogram)

[Histogram of the parts costs. Horizontal axis: Parts Cost ($), from 50 to 110 in steps of 10. Vertical axis: Frequency, from 2 to 18 in steps of 2.]
Numerical Descriptive Statistics

• The most common numerical descriptive statistic
is the average (or mean).
• The mean is a measure of the central tendency,
or central location, of the data for a variable.
• Hudson’s mean cost of parts, based on the 50
tune-ups studied, is $79 (found by summing the
50 cost values and then dividing by 50).
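
The calculation can be checked directly in Python using the 50 parts costs from the Hudson Auto Repair example; the variable names are illustrative.

```python
# Parts costs (in dollars) from the 50 tune-up invoices.
costs = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]

total = sum(costs)                # sum of the 50 cost values
mean_cost = total / len(costs)    # divide by the number of values

print(total, mean_cost, round(mean_cost))
```

The sum is 3949, so the mean is 3949 / 50 = 78.98, which rounds to the $79 reported.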
Statistical Inference

 Population: The set of all elements of interest in a
particular study.

 Sample: A subset of the population.

 Statistical inference: The process of using data obtained
from a sample to make estimates and test hypotheses about
the characteristics of a population.

 Census: Collecting data for the entire population.

 Sample survey: Collecting data for a sample.

Example: Hudson Auto Repair

1. Population consists of all tune-ups. Average cost of
parts is unknown.
2. A sample of 50 engine tune-ups is examined.
3. The sample data provide a sample average cost of
$79 per tune-up.
4. The value of the sample average is used to make an
estimate of the population average.
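
The four steps can be sketched as a small simulation. The population below is simulated purely as an assumption (5,000 tune-ups with costs drawn uniformly between $50 and $109); in a real study the population average is unknown, which is why a sample is used.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the sketch is repeatable

# Step 1: a hypothetical population of tune-up parts costs.
population = [random.randint(50, 109) for _ in range(5000)]

# Step 2: examine a simple random sample of 50 tune-ups.
sample = random.sample(population, 50)

# Step 3: compute the sample average.
sample_mean = mean(sample)

# Step 4: use it as the estimate of the population average.
population_mean = mean(population)  # known here only because we simulated it
estimation_error = sample_mean - population_mean

print(sample_mean, population_mean, estimation_error)
```

The estimate will generally not equal the population average exactly; quantifying that error is the subject of statistical inference.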
Analytics

Analytics is the scientific process of transforming data
into insight for making better decisions.
Techniques:
 Descriptive analytics: Techniques that describe what
has happened in the past.

 Predictive analytics: Techniques that use models
constructed from past data to predict the future or to
assess the impact of one variable on another.

 Prescriptive analytics: Techniques that yield a best
course of action.
Big data and Data Mining:

Big data: A large and complex data set.

Three V’s of Big data:
 Volume: Amount of available data
 Velocity: Speed at which data are collected and
processed
 Variety: Different data types
Data warehousing

Data warehousing is the process of capturing, storing,
and maintaining the data.
 Organizations obtain large amounts of data on a
daily basis by means of magnetic card readers, bar
code scanners, point of sale terminals, and touch
screen monitors.
 Wal-Mart captures data on 20-30 million transactions
per day.
 Visa processes 6,800 payment transactions per
second.
Data Mining

 Methods for developing useful decision-making
information from large databases.
 Using a combination of procedures from statistics,
mathematics, and computer science, analysts “mine
the data” to convert it into useful information.
 The most effective data mining systems use
automated procedures to discover relationships in
the data and predict future outcomes prompted by
general and even vague queries by the user.
Data Mining Applications

 The major applications of data mining have been
made by companies with a strong consumer focus
such as retail, financial, and communication firms.
 Data mining is used to identify related products that
customers who have already purchased a specific
product are also likely to purchase (and then pop-ups
are used to draw attention to those related products).
 Data mining is also used to identify customers who
should receive special discount offers based on their
past purchasing volumes.
Data Mining Requirements

 Statistical methods such as multiple regression,
logistic regression, and correlation are heavily used.
 Also needed are computer science technologies
involving artificial intelligence and machine learning.
 A significant investment in time and money is
required as well.
Data Mining Model Reliability

 Finding a statistical model that works well for a
particular sample of data does not necessarily mean
that it can be reliably applied to other data.
 With the enormous amount of data available, the
data set can be partitioned into a training set (for
model development) and a test set (for validating the
model).
 There is, however, a danger of overfitting the model
to the point that misleading associations and
conclusions appear to exist.
 Careful interpretation of results and extensive testing
are important.
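
Partitioning a data set into a training set and a test set can be sketched in plain Python. The helper name `train_test_split`, the 25% test fraction, and the stand-in records are assumptions made for illustration; the slides do not prescribe a particular split.

```python
import random

def train_test_split(records, test_fraction=0.25, seed=0):
    """Shuffle a copy of the records and return (training_set, test_set)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded for a repeatable split
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

records = list(range(100))  # stand-in for 100 rows of a large data set
training_set, test_set = train_test_split(records)

print(len(training_set), len(test_set))
```

A model is developed on `training_set` only and then evaluated on `test_set`. Performance that drops sharply on the held-out data is one sign of overfitting.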
Ethical Guidelines for Statistical Practice

 In a statistical study, unethical behavior can take a
variety of forms including:
• Improper sampling
• Inappropriate analysis of the data
• Development of misleading graphs
• Use of inappropriate summary statistics
• Biased interpretation of the statistical results
 One should strive to be fair, thorough, objective, and
neutral when collecting, analyzing, and presenting data.
 As a consumer of statistics, one should also be aware
of the possibility of unethical behavior by others.
Ethical Guidelines for Statistical Practice

 The American Statistical Association developed the report
“Ethical Guidelines for Statistical Practice”.
 It contains 67 guidelines organized into 8 topic areas:
• Professionalism
• Responsibilities to Funders, Clients, Employers
• Responsibilities in Publications and Testimony
• Responsibilities to Research Subjects
• Responsibilities to Research Team Colleagues
• Responsibilities to Other Statisticians/Practitioners
• Responsibilities Regarding Allegations of Misconduct
• Responsibilities of Employers Including
Organizations, Individuals, Attorneys, or Other
Clients
End of Chapter 1
