Introduction to
Statistics
     Sayeeda Jahan
     Spring 2021
                     Chapter 1
                  Data and Statistics
   Applications in Business and Economics
   Data
   Data Sources
   Descriptive Statistics
   Statistical Inference
    What Is Statistics?
•    The term statistics can refer to numerical facts such as
     averages, medians, percentages, and maximums that help
     us understand a variety of business and economic
     situations.
•    Statistics can also refer to the art and science of collecting,
     analyzing, presenting, and interpreting data.
                     Applications in
                 Business and Economics
   Accounting
    Public accounting firms use statistical sampling procedures
    when conducting audits for their clients.
   Finance
    Financial analysts use a variety of statistical information,
    including price-earnings ratios and dividend yields, to guide
    their investment recommendations.
   Marketing
    Electronic point-of-sale scanners at retail checkout counters are
    being used to collect data for a variety of marketing research
    applications.
   Production
    A variety of statistical quality control charts are used to
    monitor the output of a production process.
   Economics
    Economists use statistical information in making forecasts
    about the future of the economy or some aspect of it.
   Information Systems
    A variety of statistical information helps administrators
    assess the performance of computer networks.
                         Data
   Elements, Variables, and Observations
   Scales of Measurement
   Qualitative and Quantitative Data
   Cross-Sectional and Time Series Data
                   Data and Data Sets
   Data are the facts and figures collected, analyzed, and
    summarized for presentation and interpretation.
   All the data collected in a particular study are referred
    to as the data set for the study.
       Elements, Variables, and Observations
   The elements are the entities on which data are collected.
   A variable is a characteristic of interest for the elements.
   The set of measurements collected for a particular element
    is called an observation.
   A data set with n elements contains n observations.
   The total number of data values in a data set is the number
    of elements multiplied by the number of variables.
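As a minimal sketch of the counting rule above, the Python snippet below stores a small data set as a list of observations, one per element. The company names, variable names, and values are hypothetical placeholders chosen only for illustration.

```python
# Hypothetical data set: each dict holds the measurements for one element,
# so each dict is one observation. All names and values are made up.
data_set = [
    {"element": "Company A", "exchange": "NYSE", "annual_sales": 120.5, "pe_ratio": 18.2},
    {"element": "Company B", "exchange": "NASDAQ", "annual_sales": 87.3, "pe_ratio": 25.1},
    {"element": "Company C", "exchange": "NYSE", "annual_sales": 240.0, "pe_ratio": 12.7},
    {"element": "Company D", "exchange": "NASDAQ", "annual_sales": 55.9, "pe_ratio": 31.4},
]

# The variables are the characteristics recorded for every element.
variables = [key for key in data_set[0] if key != "element"]

n_elements = len(data_set)       # 4 elements
n_observations = len(data_set)   # n elements give n observations
n_variables = len(variables)     # 3 variables

# Total number of data values = elements x variables.
n_data_values = n_elements * n_variables

# Cross-check by counting every recorded value directly.
counted_values = sum(len(variables) for _ in data_set)

print(n_elements, n_variables, n_data_values)
```

With 4 elements and 3 variables, the data set holds 12 data values.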
           Data, Data Sets,
Elements, Variables, and Observations
                 Scales of Measurement
   Scales of measurement include:
     • Nominal
     • Ordinal
     • Interval
     • Ratio
   The scale determines the amount of information contained
    in the data.
   The scale indicates the data summarization and statistical
    analyses that are most appropriate.
               Scales of Measurement
   Nominal
    • Data are labels or names used to identify an
      attribute of the element.
    • A nonnumeric label or numeric code may be used.
    • Example:
        Students of a university are classified by the
        school in which they are enrolled using a
        nonnumeric label such as Business,
        Humanities, Education, and so on.
         Alternatively, a numeric code could be used for
         the school variable (e.g. 1 denotes Business, 2
         denotes Humanities, 3 denotes Education, and
         so on).
               Scales of Measurement
   Ordinal
    • The data have the properties of nominal data and
      the order or rank of the data is meaningful.
    • A nonnumeric label or a numeric code may be
      used.
    Example:
         Students of a university are classified by their
         class standing using a nonnumeric label such as
         Freshman, Sophomore, Junior, or Senior.
         Alternatively, a numeric code could be used for
         the class standing variable (e.g. 1 denotes
         Freshman, 2 denotes Sophomore, and so on).
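The nominal and ordinal examples above can be sketched in Python. The code values follow the slides (the Junior and Senior codes continue the "and so on" pattern), while the function names are hypothetical and exist only to show which comparisons each scale supports.

```python
# Numeric codes for a nominal variable (school). The numbers are only labels.
SCHOOL_CODES = {1: "Business", 2: "Humanities", 3: "Education"}

# Numeric codes for an ordinal variable (class standing). Here the order
# of the codes carries meaning.
STANDING_CODES = {1: "Freshman", 2: "Sophomore", 3: "Junior", 4: "Senior"}


def same_school(code_a, code_b):
    """Nominal data support equality checks only."""
    return code_a == code_b


def is_further_along(code_a, code_b):
    """Ordinal data also support rank comparisons."""
    return code_a > code_b


# A Junior (3) ranks above a Sophomore (2), so this comparison is meaningful.
print(is_further_along(3, 2))

# Comparing school codes by size (for example 3 > 1) would run without error,
# but the result would say nothing about the schools themselves.
print(same_school(1, 1), SCHOOL_CODES[1])
```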
                Scales of Measurement
   Interval
     • The data have the properties of ordinal data and
       the interval between observations is expressed in
       terms of a fixed unit of measure.
    • Interval data are always numeric.
     Example:
          Melissa has an SAT score of 1205, while Kevin
          has an SAT score of 1090. Melissa scored 115
          points more than Kevin.
               Scales of Measurement
   Ratio
    • The data have all the properties of interval data
      and the ratio of two values is meaningful.
    • Variables such as distance, height, weight, and
      time use the ratio scale.
    • This scale must contain a zero value that indicates
      that nothing exists for the variable at the zero
      point.
    • Ratio data are always numeric.
     Example:
    • Melissa’s college record shows 36 credit hours earned,
      while Kevin’s record shows 72 credit hours earned.
      Kevin has earned twice as many credit hours as Melissa.
    • The price of a book at a retail store is $200, while the
      price of the same book sold online is $100. The ratio
      property shows that the retail price is twice the online
      price.
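The arithmetic behind the interval and ratio examples can be checked with a short Python sketch. All numbers come from the slides; the variable names are illustrative.

```python
# Interval scale: differences between values are meaningful.
sat_melissa = 1205
sat_kevin = 1090
sat_difference = sat_melissa - sat_kevin          # 115 points

# Ratio scale: ratios are meaningful because zero means "none".
credits_melissa = 36
credits_kevin = 72
credit_ratio = credits_kevin / credits_melissa    # 2.0, twice as many hours

retail_price = 200   # dollars
online_price = 100   # dollars
price_ratio = retail_price / online_price         # 2.0, twice the online price

print(sat_difference, credit_ratio, price_ratio)
```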
         Qualitative and Quantitative Data
   Data can be further classified as being qualitative or
    quantitative.
   The statistical analysis that is appropriate depends on
    whether the data for the variable are qualitative or
    quantitative.
   In general, there are more alternatives for statistical
    analysis when the data are quantitative.
                    Qualitative Data
   Qualitative data (also known as Categorical data) are
    labels or names used to identify an attribute of each
    element.
   Qualitative data use either the nominal or ordinal
    scale of measurement.
   Qualitative data can be either numeric or nonnumeric.
   The statistical analyses available for qualitative data are
    rather limited.
                  Quantitative Data
   Quantitative data indicate either how many or how
    much.
    • Quantitative data that measure how many are
      discrete.
    • Quantitative data that measure how much are
      continuous because there is no separation between
      the possible values for the data.
   Quantitative data are always numeric.
   Ordinary arithmetic operations are meaningful only
    with quantitative data.
                  Scales of Measurement
   Data
     • Categorical
         – Numeric: Nominal, Ordinal
         – Nonnumeric: Nominal, Ordinal
     • Quantitative
         – Numeric: Interval, Ratio
       Cross-Sectional and Time Series Data
   Cross-sectional data are collected at the same or
    approximately the same point in time.
    • Example: data detailing the number of building
      permits issued in June 2020 in each of the districts of
      Dhaka.
   Time series data are collected over several time
    periods.
    • Example: data detailing the number of building
      permits issued in Dhaka City in the last 36 months.
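The distinction above can be sketched in Python by looking at how many time periods a data set spans. The district labels, months, and permit counts below are hypothetical placeholders, not real figures for Dhaka, and the `describe` helper is an illustrative simplification.

```python
# Each record is (unit, month, permits_issued). All values are hypothetical.
cross_sectional = [
    ("District A", "2020-06", 14),
    ("District B", "2020-06", 9),
    ("District C", "2020-06", 21),
]

time_series = [
    ("Dhaka City", "2020-04", 40),
    ("Dhaka City", "2020-05", 37),
    ("Dhaka City", "2020-06", 44),
]


def describe(records):
    """Label a data set by the number of distinct time periods it covers."""
    periods = {month for _, month, _ in records}
    return "cross-sectional" if len(periods) == 1 else "time series"


print(describe(cross_sectional))
print(describe(time_series))
```

The cross-sectional set varies the unit while holding the period fixed; the time series set holds the unit fixed and varies the period.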
           Time Series Data
[Figure: graph of time series data]
                        Data Sources
   Existing Sources
    • Data needed for a particular application might
      already exist within a firm. Detailed information
      is often kept on customers, suppliers, and
      employees, for example.
           – Internal company records, business database services, etc.
    • Substantial amounts of business and economic
      data are available from organizations that
      specialize in collecting and maintaining data.
           – Government agencies, industry associations, etc.
     • Government agencies are another important
       source of data.
     • Data are also available from a variety of industry
       associations and special-interest organizations.
                     Data Sources
   Internet
     • The Internet has become an important source of
       data.
     • Most government agencies, like the Bureau of the
       Census (www.census.gov), make their data
       available through a web site.
     • More and more companies are creating web sites
       and providing public access to them.
     • A number of companies now specialize in making
       information available over the Internet.
                          Data Sources
   Statistical Studies
    • Statistical studies can be classified as either
       experimental or observational.
    • In experimental studies, the variables of interest are first
       identified. Then one or more factors are controlled so
       that data can be obtained about how the factors
       influence the variables.
    • In observational (nonexperimental) studies, no attempt
       is made to control or influence the variables of interest;
       a survey is one example.
                           Data Sources
Data Available From Internal Company Records
Record               Some of the Data Available
Employee records     Name, address, social security number
Production records   Part number, quantity produced, direct labor cost, material cost
Inventory records    Part number, quantity in stock, reorder level, economic order quantity
Sales records        Product number, sales volume, sales volume by region
Credit records       Customer name, credit limit, accounts receivable balance
Customer profile     Age, gender, income, household size
                              Data Sources
Data Available From Selected Government Agencies
Government Agency            Web Address              Some of the Data Available
Census Bureau                www.census.gov           Population data, number of households, household income
Federal Reserve Board        www.federalreserve.gov   Data on money supply, exchange rates, discount rates
Office of Mgmt. & Budget     www.whitehouse.gov/omb   Data on revenue, expenditures, debt of federal government
Department of Commerce       www.doc.gov              Data on business activity, value of shipments, profit by industry
Bureau of Labor Statistics   www.bls.gov              Consumer spending, unemployment rate, hourly earnings, safety record
          Data Acquisition Considerations
   Time Requirement
     • Searching for information can be time-consuming.
     • Information might no longer be useful by the time
       it is available.
   Cost of Acquisition
     • Organizations often charge for information even
       when it is not their primary business activity.
   Data Errors
     • Using any data that happen to be available or
       that were acquired with little care can lead to poor
       and misleading information.
                      Descriptive Statistics
   Descriptive statistics are the tabular, graphical, and numerical methods
    used to summarize data.
   Most of the statistical information in newspapers, magazines, company
    reports, and other publications consists of data that are summarized
    and presented in a form that is easy to understand.
   Such summaries of data, which may be tabular, graphical, or
    numerical, are referred to as descriptive statistics.
         Example: Hudson Auto Repair
       The manager of Hudson Auto would like to have
a better understanding of the cost of parts used in the
engine tune-ups performed in the shop. She examines
50 customer invoices for tune-ups. The costs of parts,
rounded to the nearest dollar, are listed below.
   91    78   93   57    75   52    99   80   97   62
   71    69   72   89    66   75    79   75   72   76
   104   74   62   68    97   105   77   65   80   109
   85    97   88   68    83   68    71   69   67   74
   62    82   98   101   79   105   79   69   62   73
          Example: Hudson Auto Repair
   Tabular Summary (Frequencies and Percent
    Frequencies)
            Parts Cost ($)   Frequency   Percent Frequency
               50-59              2               4
               60-69             13              26
               70-79             16              32
               80-89              7              14
               90-99              7              14
              100-109             5              10
               Total             50             100
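The tabular summary above can be reproduced from the 50 invoice values with a short Python sketch. The data are the costs listed on the slide; the variable names and the use of `collections.Counter` are illustrative choices, not part of the original material.

```python
from collections import Counter

# Parts costs (dollars) from the 50 Hudson Auto tune-up invoices.
parts_costs = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]

n = len(parts_costs)

# Each class is 10 dollars wide, so the class lower limit is the cost
# rounded down to the nearest multiple of 10 (57 falls in class 50-59).
frequency = Counter((cost // 10) * 10 for cost in parts_costs)

# Percent frequency = 100 * class frequency / number of observations.
percent_frequency = {lower: 100 * count / n for lower, count in frequency.items()}

for lower in sorted(frequency):
    label = f"{lower}-{lower + 9}"
    print(f"{label:>8}  {frequency[lower]:>3}  {percent_frequency[lower]:>5.0f}")
```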
                     Example: Hudson Auto Repair
   Graphical Summary (Histogram)
[Figure: histogram of parts cost ($), in classes from 50 to 110 on the horizontal axis, against frequency (scale 2 to 18) on the vertical axis]
     Numerical Descriptive Statistics
• The most common numerical descriptive statistic
  is the average (or mean).
• The mean provides a measure of central tendency,
  or central location, of the data for a variable.
• Hudson’s mean cost of parts, based on the 50
  tune-ups studied, is $79 (found by summing the
  50 cost values and then dividing by 50).
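The mean calculation described above can be verified directly. This is a minimal Python sketch using the 50 cost values from the slide; the variable names are illustrative.

```python
# Parts costs (dollars) from the 50 Hudson Auto tune-up invoices.
parts_costs = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]

total_cost = sum(parts_costs)                # sum of the 50 values
mean_cost = total_cost / len(parts_costs)    # divide by n = 50

# The exact mean is 78.98, which the slide reports rounded to $79.
print(total_cost, mean_cost, round(mean_cost))
```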
                  Statistical Inference
 Population: The set of all elements of interest in a
  particular study.
 Sample: A subset of the population.
 Statistical inference: The process of using data obtained
  from a sample to make estimates and test hypotheses about
  the characteristics of a population.
 Census: Collecting data for the entire population.
 Sample survey: Collecting data for a sample.
     Example: Hudson Auto Repair
   1. The population consists of all tune-ups. The average
      cost of parts is unknown.
   2. A sample of 50 engine tune-ups is examined.
   3. The sample data provide a sample average cost of
      $79 per tune-up.
   4. The value of the sample average is used to make an
      estimate of the population average.
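The four steps above can be imitated in a small simulation. In practice the population average is unknown; the Python sketch below builds a synthetic population purely so that the sample estimate can be compared with a known value. The population size, the cost range, and the seed are all assumptions made for illustration, not Hudson Auto data.

```python
import random
import statistics

rng = random.Random(2021)  # fixed seed so the run is repeatable

# Step 1: a synthetic population of parts costs in dollars (illustrative only).
population = [rng.randint(50, 110) for _ in range(5000)]

# Step 2: examine a sample of 50 tune-ups, drawn without replacement.
sample = rng.sample(population, 50)

# Step 3: the sample data provide a sample average.
sample_mean = statistics.mean(sample)

# Step 4: the sample average serves as an estimate of the population average.
population_mean = statistics.mean(population)
estimation_error = sample_mean - population_mean

print(f"sample mean:      {sample_mean:.2f}")
print(f"population mean:  {population_mean:.2f}")
print(f"estimation error: {estimation_error:.2f}")
```

A different sample would give a somewhat different estimate, which is why inference also needs a statement of how precise the estimate is.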
                        Analytics
Analytics is the scientific process of transforming data
into insight for making better decisions.
Techniques:
   Descriptive analytics: The set of analytical
    techniques that describe what has happened in the past.
   Predictive analytics: The set of analytical techniques
    that use models constructed from past data to predict
    the future or to assess the impact of one variable on
    another.
   Prescriptive analytics: The set of analytical
    techniques that yield a best course of action.
            Big Data and Data Mining
Big data: A large and complex data set.
Three V’s of big data:
 Volume: Amount of available data
 Velocity: Speed at which data are collected and
  processed
 Variety: Different data types
                Data warehousing
Data warehousing is the process of capturing, storing,
and maintaining the data.
 Organizations obtain large amounts of data on a
  daily basis by means of magnetic card readers, bar
  code scanners, point of sale terminals, and touch
  screen monitors.
 Wal-Mart captures data on 20-30 million transactions
  per day.
 Visa processes 6,800 payment transactions per
  second.
                     Data Mining
   Methods for developing useful decision-making
    information from large databases.
   Using a combination of procedures from statistics,
    mathematics, and computer science, analysts “mine
    the data” to convert it into useful information.
   The most effective data mining systems use
    automated procedures to discover relationships in
    the data and predict future outcomes prompted by
    general and even vague queries by the user.
              Data Mining Applications
   The major applications of data mining have been
    made by companies with a strong consumer focus
    such as retail, financial, and communication firms.
   Data mining is used to identify related products that
    customers who have already purchased a specific
    product are also likely to purchase (and then pop-ups
    are used to draw attention to those related products).
   Data mining is also used to identify customers who
    should receive special discount offers based on their
    past purchasing volumes.
              Data Mining Requirements
   Statistical methods such as multiple regression,
    logistic regression, and correlation are heavily used.
   Also needed are computer science technologies
    involving artificial intelligence and machine learning.
   A significant investment in time and money is
    required as well.
            Data Mining Model Reliability
   Finding a statistical model that works well for a
    particular sample of data does not necessarily mean
    that it can be reliably applied to other data.
   With the enormous amount of data available, the
    data set can be partitioned into a training set (for
    model development) and a test set (for validating the
    model).
   There is, however, a danger of overfitting the model
    to the point that misleading associations and
    conclusions appear to exist.
   Careful interpretation of results and extensive testing
    are important.
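The partition into a training set and a test set mentioned above can be sketched in Python. The 80/20 split, the fixed seed, and the `train_test_split` helper are assumptions made for illustration, and the integer IDs stand in for real records.

```python
import random


def train_test_split(records, test_fraction=0.2, seed=0):
    """Randomly partition records into a training set and a test set."""
    rng = random.Random(seed)     # fixed seed keeps the split repeatable
    shuffled = list(records)      # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    test_set = shuffled[:n_test]
    training_set = shuffled[n_test:]
    return training_set, test_set


# Stand-in record IDs; a real data set would hold full observations.
records = list(range(100))
training_set, test_set = train_test_split(records)

# Every record lands in exactly one of the two sets.
print(len(training_set), len(test_set))
```

The model is developed on the training set only, and the held-out test set then indicates whether it performs comparably on data it has not seen, which helps expose overfitting.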
      Ethical Guidelines for Statistical Practice
   In a statistical study, unethical behavior can take a
    variety of forms including:
     • Improper sampling
     • Inappropriate analysis of the data
     • Development of misleading graphs
     • Use of inappropriate summary statistics
     • Biased interpretation of the statistical results
   One should strive to be fair, thorough, objective, and
    neutral when collecting, analyzing, and presenting data.
   As a consumer of statistics, one should also be aware
    of the possibility of unethical behavior by others.
      Ethical Guidelines for Statistical Practice
   The American Statistical Association developed the report
    “Ethical Guidelines for Statistical Practice”.
   It contains 67 guidelines organized into 8 topic areas:
     • Professionalism
     • Responsibilities to Funders, Clients, Employers
     • Responsibilities in Publications and Testimony
     • Responsibilities to Research Subjects
     • Responsibilities to Research Team Colleagues
     • Responsibilities to Other Statisticians/Practitioners
     • Responsibilities Regarding Allegations of Misconduct
     • Responsibilities of Employers Including
        Organizations, Individuals, Attorneys, or Other
        Clients
End of Chapter 1