SAMPLING DESIGN: BASIC CONCEPTS AND PROCEDURE
      The goal of sampling is to obtain individuals for a
study in such a way that accurate information about the
population can be obtained.
TWO TYPES OF SAMPLES

     1.    Probability Sample
     2.    Non-probability Sample

PROBABILITY SAMPLES

     -    Samples are obtained using some objective chance mechanism,
          thus involving randomization.
     -    They require a complete listing of the elements of the
          universe, called the sampling frame.
     -    The probabilities of selection are known.
     -    They are generally referred to as random samples.

NON-PROBABILITY SAMPLES

     -    Samples are obtained haphazardly, selected purposively, or
          taken as volunteers.
     -    The probabilities of selection are unknown.
     -    They should NOT be used for statistical inference.
     -    They result from the use of judgment sampling, accidental
          sampling, purposive sampling, and the like. [These are
          examples of non-probability sampling.]

BASIC SAMPLING TECHNIQUES OF PROBABILITY SAMPLING

SIMPLE RANDOM SAMPLING (SRS)

     -    The most basic method of drawing a probability sample.
     -    It assigns equal probabilities of selection to each possible
          sample.
     -    It results in a simple random sample.

PROCEDURES IN OBTAINING SRS

     1.    Assign a number to each item in the lot.
     2.    Consult the table of random numbers.
     3.    Preplan how to select a sequence of digits from the table so
           that no bias enters the selection process.
     4.    Select random numbers in the preplanned pattern.
     5.    Arrange the random numbers consecutively in numerical order.
     6.    Select as samples those items in the lot corresponding to
           the random numbers.

SYSTEMATIC RANDOM SAMPLING (SysRS)

     -    It is obtained by selecting every kth individual from the
          population.
     -    The first individual selected corresponds to a random number
          between 1 and k.

PROCEDURES IN OBTAINING SysRS

     1.    Decide on a method of assigning a unique serial number, from
           1 to N, to each one of the elements in the population.
     2.    Compute the sampling interval

               $k = \dfrac{N}{n} = \dfrac{\text{population size}}{\text{sample size}}$

     3.    Select a number i, from 1 to k, using a randomization
           mechanism. The element in the population assigned to this
           number is the first element of the sample. The other
           elements of the sample are those assigned to the numbers
           i + k, i + 2k, i + 3k, and so on until you get a sample of
           size n.

Example:
     We want to select a sample of 50 students from 500 students. Under
this method, every kth item is picked up from the sampling frame.
Solution:
     Here N = 500 and n = 50, so the sampling interval is
k = 500/50 = 10. We start the sample at a random number i between 1 and
k and take every kth unit subsequently. Suppose the random number i is
5; then we select units 5, 15, 25, 35, …, 495.

STRATIFIED RANDOM SAMPLING (StRS)

     -    It is obtained by separating the population into
          non-overlapping groups called strata and then obtaining a
          simple random sample from each stratum.
     -    The individuals within each stratum should be homogeneous (or
          similar) in some way.

Example:
     A sample of 50 students is to be drawn from a population
consisting of 500 students belonging to two institutions, A and B. The
number of students in institution A is 200 and in institution B is 300.
How will you draw the sample using proportional allocation?

Solution:
     There are two strata in this case.
     Given: N1 = 200, N2 = 300, N = 500, n = 50
     If n1 and n2 are the sample sizes, then under proportional
allocation

          $n_1 = n \times \dfrac{N_1}{N} = 50 \times \dfrac{200}{500} = 20$
          $n_2 = n \times \dfrac{N_2}{N} = 50 \times \dfrac{300}{500} = 30$

     The sample sizes are 20 from A and 30 from B. The units from each
institution are then selected by simple random sampling.

CLUSTER SAMPLING

     -    It is a way to randomly select participants from a list that
          is too large for simple random sampling.
     -    The clusters are constructed such that the sampling units are
          heterogeneous within each cluster and homogeneous among the
          clusters.

Example:
     The list of all the agricultural farms in a village or district
may not be easily available, but the list of villages or districts is
generally available. In this case, every farm is a sampling unit and
every village or district is a cluster.

MULTI-STAGE SAMPLING (MSS)

PROCEDURE IN OBTAINING MSS

     1.   Organize the sampling process into stages where the unit of
          analysis is systematically grouped.
     2.   Select a sampling technique for each stage.
     3.   Systematically apply the sampling technique to each stage
          until the unit of analysis has been selected.
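
The probability sampling procedures in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the seed, and the use of Python's pseudo-random generator in place of a printed table of random numbers are choices made for this sketch, not part of the notes. The systematic sample reuses the 500-student example with a start of 5, and the allocation reuses institutions A and B.

```python
import random


def simple_random_sample(N, n, rng):
    """Draw n distinct serial numbers from 1..N; every subset is equally likely."""
    return sorted(rng.sample(range(1, N + 1), n))


def systematic_sample(N, n, rng=None, start=None):
    """Take every kth serial number, beginning at a random start in 1..k."""
    k = N // n  # sampling interval; this sketch assumes n divides N evenly
    if start is None:
        start = rng.randint(1, k)  # random start between 1 and k, inclusive
    return [start + j * k for j in range(n)]


def proportional_allocation(strata_sizes, n):
    """Split a total sample size n across strata in proportion to stratum size."""
    N = sum(strata_sizes.values())
    # round() may not add up to exactly n for awkward proportions;
    # it is exact for the example used here.
    return {name: round(n * size / N) for name, size in strata_sizes.items()}


rng = random.Random(2024)  # arbitrary seed so the run is repeatable

srs = simple_random_sample(500, 50, rng)
sysrs = systematic_sample(500, 50, start=5)
alloc = proportional_allocation({"A": 200, "B": 300}, 50)

print(sysrs[:4], sysrs[-1])  # [5, 15, 25, 35] 495
print(alloc)                 # {'A': 20, 'B': 30}
```

A stratified sample would then call `simple_random_sample` once per stratum with the allocated sizes.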
BASIC SAMPLING TECHNIQUES OF NON-PROBABILITY SAMPLING

     -    Accidental Sampling
     -    Quota Sampling
     -    Convenience Sampling
     -    Purposive Sampling
     -    Judgment Sampling

CASES WHEREIN NON-PROBABILITY SAMPLING IS USEFUL

     -    Only a few are willing to be interviewed.
     -    There are extreme difficulties in locating or identifying
          subjects.
     -    Probability sampling is more expensive to implement.

Exercise: Identify the sample selection procedure used in each of the
following cases:

     1.   A tax auditor selects every 1,000th income tax return that is
          received.
     2.   12 people are randomly selected to serve as jurors from a
          jury pool of 150 people.
     3.   To select a sample household in a province, a sample of
          provinces was selected, then a sample of municipalities was
          chosen from each of the selected provinces, then a sample of
          barangays was chosen from each of the selected
          municipalities, and all households in the selected barangays
          were included.

SOURCES OF ERRORS IN SAMPLING

     1.   Non-sampling errors are errors that result from the survey
          process. These include:
          -   Non-responses
          -   Interviewer error
          -   Misrepresented answers
          -   Data entry errors
          -   Questionnaire design
          -   Wording of questions
          -   The order of the questions, words, and responses
     2.   Sampling error is the error that results from using a sample
          to estimate information regarding a population.

PRESENTATION OF DATA

     1.   Textual Presentation
     2.   Tabular Presentation
     3.   Graphical Presentation (bar graph, histogram, pie chart, etc.)

MEASURES OF CENTRAL TENDENCY

MEAN

     -    It is the sum of the data values divided by the number of
          data values.
     -    It is also called the average.
     -    It is appropriate only for data under the interval and ratio
          scales of measurement.

ADVANTAGES OF MEAN

     -    Simple to understand and easy to calculate.
     -    It is rigidly defined.
     -    It is least affected by fluctuations of sampling.
     -    It considers all the values in the series.

MEDIAN

     -    It is the “middle observation” when the data set is sorted
          (in either increasing or decreasing order).
     -    The median divides the distribution into two equal parts.

ADVANTAGES OF MEDIAN

     -    The median is not affected by the size of extreme values but
          by the number of observations.
     -    The median can be calculated even when the frequency
          distribution contains “open-ended” intervals.
     -    It can also be used to define the middle of a number of
          objects, properties, or quantities which are not really
          quantitative in nature.
     -    It can be easily interpreted.

MODE

     -    It is the most frequently occurring value in a list of data.
     -    It is sometimes called the nominal average.
     -    It is an appropriate measure of the average for data using
          the nominal scale of measurement.

ADVANTAGES OF MODE

     -    The mode is easy to understand.
     -    Like the median, it is not greatly affected by extreme
          values.
     -    Like the median, it can be computed even when the frequency
          distribution contains “open-ended” intervals.

Remember:
     Whenever you hear the word average, be aware that the word may not
always be referring to the mean. One average could be used to support
one position, while another average could be used to support a
different position.

MEASURES OF DISPERSION

     Since the data points in Figure 2 are more scattered than the data
points in Figure 1, the data set depicted in Figure 2 is more varied.

RANGE

     -    It is the difference between the largest and the smallest
          observations or items in a set of data.

STANDARD DEVIATION

     -    It is a measure of how far away items in a data set are from
          the mean.
     -    The larger the standard deviation, the more variation there
          is in the data set.

VARIANCE

     -    It takes all data points in a set into account and is
          calculated by averaging the squared deviations of the data
          points from the mean.

MEASURES OF RELATIVE POSITION

     Quantiles are statistics that describe various subdivisions of a
frequency distribution into equal proportions.

THREE SPECIAL QUANTILES

     1.   Quartiles
     2.   Deciles
     3.   Percentiles

QUARTILES
     Descriptive measures that split the ordered data into four
quarters.

DECILES
     Descriptive measures that split the ordered data into ten equal
parts.

PERCENTILES
     Descriptive measures that split the ordered data into 100 equal
parts.

Example: Interpretation using Quantiles

     1.   Jennifer just received the results of her SAT exam. Her SAT
          Mathematics score of 600 is in the 74th percentile. What does
          this mean?

          A percentile rank of 74% means that 74% of SAT Mathematics
          scores are less than or equal to 600 and 26% of the scores
          are greater. So, 26% of the students who took the exam scored
          better than Jennifer.

     2.   A test mark is calculated to be at the 84th percentile. What
          does this mean?
          84% of the people who wrote the test got the same mark or
          less than the test mark, and 16% of the people who wrote the
          test scored higher than the test mark.

     3.   Time taken to finish a test is 35 minutes. This time was the
          first quartile. What does this mean?

          25% of the learners finished the exam in 35 minutes or less,
          and 75% of the learners finished the exam in more than 35
          minutes.

PARAMETRIC STATISTICS

     -    Parametric statistical procedures are inferential procedures
          that rely on testing claims regarding parameters such as the
          population mean, the population standard deviation, or the
          population proportion.
     -    In some circumstances, the use of parametric procedures
          requires that certain requirements regarding the distribution
          of the population, such as normality, be satisfied.
     -    They assume underlying statistical distributions in the data.
          Therefore, several conditions of validity must be met so that
          the result of a parametric test is reliable.
     -    They apply to data in ratio scale, and some apply to data in
          interval scale.

TWO COMMON FORMS OF STATISTICAL INFERENCE

     1.   Estimation
     2.   Hypothesis Testing

ESTIMATING THE VALUE OF A PARAMETER

     In statistics, an estimate is used to approximate the value of an
unknown population parameter.

TWO TYPES OF ESTIMATION

     1.    POINT ESTIMATION – (single values that are used to infer
           parameters directly).
     2.    INTERVAL ESTIMATION – (also called a confidence interval for
           the parameter).

PARAMETER VS STATISTIC

     A parameter is a numerical characteristic of the population. Any
numerical characteristic of a population is called a parameter.

     A statistic is a numerical value that describes a sample, or a
number computed from the sample data.

WHAT PROPERTIES MAKE A GOOD POINT ESTIMATOR?

     1.   It is desirable that the sampling distribution of the
          estimator be centered around the true population parameter.
          An estimator with this property is called unbiased.
     2.   It is desirable that our chosen estimator have a small
          standard error in comparison with other estimators we might
          have chosen.

CONFIDENCE INTERVAL

     A confidence interval provides more information than a point
estimate, and it consists of an interval of numbers.

     The level of confidence represents the expected proportion of
intervals that will contain the parameter if a large number of
different samples is obtained.

     The level of confidence is denoted by $(1 - \alpha) \times 100\%$.

     Confidence interval estimates are of the form
point estimate ± margin of error.

MARGIN OF ERROR

     The margin of error of the estimate can be computed using this
formula (for a population mean with known population standard
deviation σ):

          $E = z_{\alpha/2} \cdot \dfrac{\sigma}{\sqrt{n}}$

     The margin of error of a confidence interval estimate of a
parameter depends on three factors:

     1.   Level of Confidence
     2.   Sample Size
     3.   Standard Deviation
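
To see how the three factors act on the margin of error, here is a minimal sketch. It assumes the usual z-based form for a mean with known population standard deviation, E = z_{α/2} · σ / √n. The function name `margin_of_error` and the numbers passed to it are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(conf_level, sigma, n):
    """z-based margin of error for a mean, assuming sigma is known."""
    alpha = 1 - conf_level
    z = NormalDist().inv_cdf(1 - alpha / 2)  # critical value z_{alpha/2}
    return z * sigma / sqrt(n)


# Hypothetical baseline: 95% confidence, sigma = 10, n = 100
base = margin_of_error(0.95, 10, 100)
higher_conf = margin_of_error(0.99, 10, 100)  # higher confidence -> wider
larger_n = margin_of_error(0.95, 10, 400)     # 4x the sample -> half as wide
larger_sd = margin_of_error(0.95, 20, 100)    # 2x the spread -> twice as wide

print(round(base, 2))         # 1.96
print(higher_conf > base)     # True
print(round(larger_n, 2))     # 0.98
print(round(larger_sd, 2))    # 3.92
```

Raising the level of confidence or the standard deviation widens the interval, while a larger sample narrows it.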
INTERPRETATION OF CONFIDENCE INTERVAL

A $(1 - \alpha) \times 100\%$ confidence interval indicates that, if we
obtained many simple random samples of size n from the population whose
mean, μ, is unknown, then approximately $(1 - \alpha) \times 100\%$ of
the intervals will contain μ.

In other words,
     “We are (insert level of confidence) confident that the population
mean is between (lower bound) and (upper bound).” This is an
abbreviated way of saying that the method is correct
$(1 - \alpha) \times 100\%$ of the time.

Example:
If we constructed a 90% confidence interval with a lower bound of 12
and an upper bound of 18, we would interpret the interval as follows:
“We are 90% confident that the population mean, μ, is between 12 and
18.”

Remember:
A 95% confidence interval does not mean that there is a 95% probability
that the interval contains the population mean. The population mean is
a fixed value; the 95% describes how often the method produces
intervals that contain it.

ESTIMATING THE VALUE OF A PARAMETER USING CONFIDENCE INTERVALS

     1.    Constructing confidence intervals about a population mean
           where the population standard deviation is known or
           unknown.
     2.    Constructing confidence intervals about a population
           proportion.
     3.    Constructing confidence intervals about a population
           standard deviation.

CONFIDENCE INTERVAL ABOUT POPULATION MEAN where population standard
deviation is KNOWN or UNKNOWN

Note:
If the sample size is large (n ≥ 30), then the sample standard
deviation can be used to estimate the population standard deviation.

How about if σ is known but n < 30? Use Case 1.

Example:
     How much do Filipinos sleep each night? Based on a random sample
of 1120 Filipinos 15 years of age or older, the mean amount of sleep
per night is 8.17 hours according to the Filipino Time Use Survey
conducted by the Bureau of Labor Statistics. Assuming the population
standard deviation for amount of sleep per night is 1.2 hours,
construct and interpret a 95% confidence interval for the mean amount
of sleep per night of Filipinos 15 years of age or older.

Solution:
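
A sketch of the computation for the sleep example (n = 1120, sample mean 8.17 hours, σ = 1.2 hours, 95% confidence). It assumes the z-based interval for a known population standard deviation, of the form point estimate ± margin of error, with critical value z_{0.025} ≈ 1.96:

```latex
\bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}
  = 8.17 \pm 1.96 \cdot \frac{1.2}{\sqrt{1120}}
  \approx 8.17 \pm 0.07
\quad\Longrightarrow\quad
\text{lower bound} \approx 8.10, \qquad \text{upper bound} \approx 8.24
```

Under these assumptions, the interpretation would read: “We are 95% confident that the mean amount of sleep per night of Filipinos 15 years of age or older is between 8.10 and 8.24 hours.”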
Example:
    A simple random sample of size n = 40 is drawn
from a population. The sample mean is found to be 20.1,
and the sample standard deviation is found to be 3.2.
Construct and interpret a 90% confidence interval
about the population mean.
Solution:
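
One way to carry out this computation is sketched below. Since n = 40 ≥ 30, the sketch follows the note that the sample standard deviation may stand in for the population standard deviation and uses a z critical value. A t-based interval with 39 degrees of freedom would be slightly wider. The function name `z_interval_mean` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_interval_mean(xbar, sd, n, conf_level):
    """Confidence interval for a mean: xbar +/- z_{alpha/2} * sd / sqrt(n)."""
    alpha = 1 - conf_level
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.645 for 90% confidence
    margin = z * sd / sqrt(n)
    return xbar - margin, xbar + margin


lower, upper = z_interval_mean(xbar=20.1, sd=3.2, n=40, conf_level=0.90)
print(f"{lower:.2f} to {upper:.2f}")  # 19.27 to 20.93
```

Under these assumptions, we are 90% confident that the population mean is between about 19.27 and 20.93.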
CONFIDENCE INTERVAL ABOUT POPULATION PROPORTION

     The point estimate for the population proportion is

          $\hat{p} = \dfrac{x}{n}$

where x is the number of individuals in the sample with the specified
characteristic and n is the sample size.

     Suppose a simple random sample of size n is taken from a
population. A $(1 - \alpha) \times 100\%$ confidence interval for p is
given by the following quantities:

          Lower bound: $\hat{p} - z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$
          Upper bound: $\hat{p} + z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$

Note:
     It must be the case that $np(1 - p) \ge 10$ and $n \le 0.05N$ to
construct this interval.

Example:
     In a poll conducted by the Research Center for the People and the
Press, a simple random sample of 1,505 Filipino adults was asked
whether they were in favor of tighter enforcement of government rules
on TV content during hours when children are most likely to be
watching.
     Of the 1,505 adults, 1,129 responded yes. Obtain a 95% confidence
interval for the proportion of Filipinos who are in favor of tighter
enforcement of government rules on TV content during hours when
children are most likely to be watching.

Solution:
     “We are 95% confident that the proportion of Filipinos who are in
favor of tighter enforcement of government rules on TV content during
hours when children are most likely to be watching is between 0.73 and
0.77.”

CONFIDENCE INTERVAL ABOUT POPULATION VARIANCE

     If a simple random sample of size n is taken from a normal
population with mean μ and standard deviation σ, then a
$(1 - \alpha) \times 100\%$ confidence interval about $\sigma^2$ is
given by

          Lower bound: $\dfrac{(n-1)s^2}{\chi^2_{\alpha/2}}$
          Upper bound: $\dfrac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$

with n – 1 degrees of freedom, where $\chi^2_{\alpha/2}$ and
$\chi^2_{1-\alpha/2}$ are the chi-square values with areas α/2 and
1 – α/2 to their right.

Remember:
     A confidence interval about the population variance or standard
deviation is not of the form “point estimate ± margin of error” because
the sampling distribution of the sample variance is not symmetric.

Example:
     A simple random sample of size n = 12 is drawn from a population
that is normally distributed. The sample variance is found to be
$s^2 = 23.7$. Construct a 90% confidence interval about the population
variance.

Solution:
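
The poll example (x = 1,129 yes out of n = 1,505, 95% confidence) and the variance example (n = 12, s² = 23.7, 90% confidence) can both be checked with a short sketch. It assumes the usual forms p̂ ± z_{α/2}·√(p̂(1 − p̂)/n) for a proportion and (n − 1)s²/χ² for a variance. The function names are hypothetical. The two chi-square critical values for 11 degrees of freedom (19.675 and 4.575) are typed in from a standard table, because the Python standard library has no chi-square quantile function.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(x, n, conf_level):
    """z-based interval for a proportion: p_hat +/- z * sqrt(p_hat(1-p_hat)/n)."""
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def variance_interval(s2, n, chi2_upper, chi2_lower):
    """Interval for a variance; the chi-square values come from a table.

    chi2_upper has area alpha/2 to its right, chi2_lower has area 1 - alpha/2.
    """
    return (n - 1) * s2 / chi2_upper, (n - 1) * s2 / chi2_lower


# Poll example: 1,129 of 1,505 adults said yes, 95% confidence
p_low, p_high = proportion_interval(1129, 1505, 0.95)
print(f"{p_low:.2f} to {p_high:.2f}")  # 0.73 to 0.77

# Variance example: n = 12, s^2 = 23.7, 90% confidence, 11 degrees of freedom
# Table values (assumed): chi2_{0.05} = 19.675 and chi2_{0.95} = 4.575
v_low, v_high = variance_interval(23.7, 12, chi2_upper=19.675, chi2_lower=4.575)
print(f"{v_low:.2f} to {v_high:.2f}")  # 13.25 to 56.98
```

The proportion interval agrees with the stated 0.73 to 0.77. Under the same assumptions, the population variance would lie between about 13.25 and 56.98 with 90% confidence.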
Exercises:
    1.   Jane wants to estimate the proportion of
         students on her campus who eat cauliflower.
         After surveying 20 students, she finds 2 who
         eat cauliflower. Obtain and interpret a 95%
         confidence interval for the proportion of
         students who eat cauliflower on Jane’s
         campus.
    2.   Alan wants to estimate the proportion of
         adults who walk to work. In a survey of 10
         adults, he finds 1 who walks to work. Obtain
         and interpret a 95% confidence interval for the
         proportion of adults who walk to work.
    3.   Suppose a sample of 30 Stats students is
         given an IQ test. If the sample has a standard
         deviation of 12.23 points, find a 90%
         confidence interval for the population
         standard deviation and interpret the result.
         [Use the space provided as your answer sheet.]
HYPOTHESIS TESTING

     -    Hypothesis testing is a procedure, based on sample evidence
          and probability, used to test claims regarding a
          characteristic of one or more populations.
     -    A hypothesis is a statement or claim regarding a
          characteristic of one or more populations.
     -    It is a preconceived idea, assumed to be true but has to be
          tested for its truth or falsity.

Example:

     -    The mean body temperature for patients admitted to elective
          surgery is not equal to 37.0 °C.
     -    A consumer advocate would like to know if the mean lifetime
          of a bulb is less than 500 hours.
     -    A real estate broker believes that because of changes in
          interest rates, as well as other economic factors, the mean
          price has increased since then.

PROCEDURES FOR HYPOTHESIS TESTING

     1.     State the null and alternative hypotheses.
     2.     Set the level of significance or alpha level (α).
     3.     Determine the test distribution to use.
     4.     Determine the critical region.
     5.     State the decision rule.
     6.     Calculate a test statistic.
     7.     Make a statistical decision.

1.        State the Null and Alternative Hypothesis

Null Hypothesis

     -    Denoted by 𝐻o.
     -    The statement being tested.
     -    Assumed true until evidence indicates otherwise.
     -    Must contain the condition of equality and must be written
          with the symbol =, ≤, or ≥.

Example:

     -    Students who eat breakfast and students who do not eat
          breakfast will perform the same on a math exam.
     -    Students who experience test anxiety prior to an English exam
          and students who do not will get the same scores.
     -    Motorists who talk on the phone while driving and motorists
          who do not will make the same number of errors on a driving
          course.

Alternative Hypothesis

     -    Denoted by Ha.
     -    Statement that must be true if the null hypothesis is false.
     -    Sometimes referred to as the research hypothesis.
     -    Must NOT contain the condition of equality and must be
          written with the symbol ≠, <, or >.

Example:

     -    Students who eat breakfast will perform better on a math exam
          than students who do not eat breakfast.
     -    Students who experience test anxiety prior to an English exam
          will get higher scores than students who do not experience
          test anxiety.
     -    Motorists who talk on the phone while driving will be more
          likely to make errors on a driving course than those who do
          not talk on the phone.

Remember:
     If you are conducting a research study and you want to use a
hypothesis test to support your claim, the claim must be stated in such
a way that it becomes the alternative hypothesis, so it cannot contain
the condition of equality.

TWO TYPES OF ALTERNATIVE TEST

     1.     One-Tailed Test
            -   Left-Tailed
            -   Right-Tailed
     2.     Two-Tailed Test

2.        Set the Level of Significance or Alpha Level (α)

     The level of significance, α, is the probability of making a
type I error.

TWO TYPES OF ERROR
[Let null = tao. TYPE I ERROR kapag ni-reject mo
‘yung tamang tao. TYPE II ERROR kapag in-accept
mo ‘yung maling tao]
Example:
             𝐻o:      The defendant is innocent.
             𝐻a:      The defendant is not innocent.
What happens to the defendant if the jury makes a type I
or a type II error?
             A type I error is like putting an innocent
              person in jail.
             A type II error is like letting a guilty person
              go free.
Remember:
     It is important to set α before we start the study
because the Type I error is the more ‘severe’ error to make.
The smaller α is, the smaller the region of rejection.

  3.    Determine the Test Distribution to Use

     Determine the best statistical test to use, based on the
objective and on the assumptions that are satisfied.

       LIST OF COMMON PARAMETRIC TESTS

       1.    One Sample z-Test
       2.    One Sample t-Test
       3.    One Sample Proportion Test
       4.    Independent Sample z-Test
       5.    Independent Sample t-Test
       6.    Two Sample Proportion Test
       7.    Paired Sample t-Test
       8.    Analysis of Variance (ANOVA) Test
       9.    Tukey Test (Post Hoc ANOVA)
       10.   Two Way ANOVA
       11.   Pearson Product Moment Correlation
       12.   Regression Analysis

[The tests listed above will be discussed in a later part of
the lecture (hopefully).]

  4.    Determine the Critical Region

             The rejection region or critical region is the
              set of all values of the test statistic that
              lead to the rejection of 𝐻o.
             The acceptance region is the set of all values of
              the test statistic that lead the researcher to
              retain 𝐻o.

  5.    State the Decision Rules

USING CONFIDENCE INTERVAL

Decision Rule: Reject the null hypothesis if the hypothesized
value of the parameter is NOT within the range specified by
the confidence interval.

USING p-value APPROACH

Decision Rule: Reject the null hypothesis if the computed
p-value is less than or equal to the set significance level;
otherwise do not reject the null hypothesis.

           Reject NULL if p < alpha level

[It is not advisable to reject the null when p is exactly
equal to the alpha level.]

USING TRADITIONAL METHOD

Decision Rule: Reject 𝐻o if the computed value of the test
statistic falls in the region of rejection.

[see the diagram in number 4]

  6.    Calculate Test Statistic

     Once you have determined the appropriate statistical test
in step 3, calculate the test statistic. The computed value is
then compared to the critical value.

Test statistic - a statistic computed from the sample
data that is especially sensitive to the differences
between 𝐻o and 𝐻a.
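
The traditional and p-value decision rules in step 5 can be sketched in a few lines of Python. This is only an illustration for a test statistic that follows the standard normal distribution. The function names and the value z = 2.10 are assumptions made for the sketch and do not come from the lecture.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def critical_values(alpha, tail):
    """Return the z value(s) that bound the rejection region."""
    if tail == "left":
        return (STD_NORMAL.inv_cdf(alpha),)
    if tail == "right":
        return (STD_NORMAL.inv_cdf(1 - alpha),)
    if tail == "two":
        z = STD_NORMAL.inv_cdf(1 - alpha / 2)
        return (-z, z)
    raise ValueError("tail must be 'left', 'right', or 'two'")


def p_value(z, tail):
    """Probability, under Ho, of a z at least as extreme as the observed one."""
    if tail == "left":
        return STD_NORMAL.cdf(z)
    if tail == "right":
        return 1 - STD_NORMAL.cdf(z)
    if tail == "two":
        return 2 * (1 - STD_NORMAL.cdf(abs(z)))
    raise ValueError("tail must be 'left', 'right', or 'two'")


def reject_traditional(z, alpha, tail):
    """Traditional method: reject Ho if z falls in the rejection region."""
    bounds = critical_values(alpha, tail)
    if tail == "left":
        return z < bounds[0]
    if tail == "right":
        return z > bounds[0]
    return z < bounds[0] or z > bounds[1]


def reject_p_value(z, alpha, tail):
    """p-value approach: reject Ho if the p-value is at most alpha."""
    return p_value(z, tail) <= alpha


z_observed = 2.10  # hypothetical test statistic, for illustration only
print("critical values:", critical_values(0.05, "two"))
print("p-value:", p_value(z_observed, "two"))
print("reject Ho (traditional):", reject_traditional(z_observed, 0.05, "two"))
print("reject Ho (p-value):", reject_p_value(z_observed, 0.05, "two"))
```

Both rules lead to the same decision for a given α, because the p-value is at most α exactly when the test statistic lies in the rejection region.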
 7.        Make Statistical Decision
             Fail to reject the null hypothesis / Retain the
              null hypothesis / There is not enough evidence
              to reject the null hypothesis.
             Reject the null hypothesis.
Remember:
    It is important to recognize that we NEVER accept
the null hypothesis. We are merely saying that the
sample evidence is not strong enough to warrant
rejection of the null hypothesis.

                NORMAL DISTRIBUTION

[Figure: normal curve]

    This graph is called the normal curve. It is a
bell-shaped curve that approximately describes many
phenomena that occur in nature, industry, and research.

           PROPERTIES OF NORMAL CURVE

      1.     The normal curve is bell-shaped and
             symmetric about the mean.
      2.     The mean, median and mode are equal.
      3.     The total area under the curve is equal to one.
      4.     The normal curve approaches, but never
             touches, the x-axis as it extends farther and
             farther away from the mean.

            TESTING NORMALITY OF DATA

    To determine whether the data follow a normal
distribution, we can use a graphical or a numerical
method.

                       GRAPHICAL

HISTOGRAM plots the observed values against their
frequency and gives a visual estimate of whether the
distribution is bell-shaped or not.

Q-Q PROBABILITY PLOTS display the observed values
against normally distributed data (represented by the
line).

[Figure: NORMAL Q-Q PLOT]

Remember:
   Graphical methods are typically not very useful
when the sample size is small.

                       NUMERICAL

    The following are the common statistical tests for
normality.

KOLMOGOROV-SMIRNOV TEST
          It was first derived by Kolmogorov (1933) and
           later modified and proposed as a test by
           Smirnov (1948). The test is non-parametric
           and entirely agnostic about (makes no
           assumption on) what the distribution actually
           is.
          This test has been shown to be less powerful
           than the other tests in most situations. It is
           included only because of its historical
           popularity. Some published sources go as far as
           saying, “The Kolmogorov-Smirnov test is only a
           historical curiosity. It should never be used.”
          Tied scores should not be present in the data.

LILLIEFORS TEST
          Adaptation of the Kolmogorov-Smirnov Test for
           the case when the mean and variance of the
           normal distribution are UNKNOWN.
          It is also used as a correction for the
           Kolmogorov-Smirnov Test: when the parameters
           of the 𝐶𝐷𝐹 are estimated from the sample, the
           uncorrected test becomes conservative and
           loses power.

ANDERSON-DARLING TEST
          It is a modified Kolmogorov-Smirnov test that
           gives more weight to the tails of the
           distribution.
          This test, developed by Anderson and
           Darling (1954), is popular among the tests
           that are based on EDF statistics.
           [EDF: empirical cumulative distribution function]

SHAPIRO-WILK TEST
          One of the MOST POPULAR TESTS for
           normality assumption diagnostics. It has good
           power properties [power: the sensitivity of a
           hypothesis test] and is based on the correlation
           between the given observations and the
           associated normal scores.
          The Shapiro-Wilk test statistic was derived by
           Shapiro and Wilk (1965).
          It does not work well if several values in the
           data set are the same, that is, if tied scores
           occur in the data.
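
The graphical and numerical ideas above can be illustrated with the Python standard library alone. The sketch below builds the points of a normal Q-Q plot and computes the Kolmogorov-Smirnov distance D against a normal distribution whose mean and standard deviation are estimated from the sample, which is the statistic the Lilliefors test uses. The function names, the plotting positions (i − 0.5)/n, and the sample values are assumptions for illustration. The sketch does not compute a p-value, since that requires the Lilliefors tables or statistical software.

```python
from statistics import NormalDist, mean, stdev


def qq_points(data):
    """Pair each theoretical normal quantile with the sorted observation."""
    xs = sorted(data)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    # Plotting positions (i - 0.5) / n are one common convention (assumed here).
    return [(fitted.inv_cdf((i - 0.5) / n), x) for i, x in enumerate(xs, start=1)]


def ks_distance(data):
    """Largest gap between the EDF and the fitted normal CDF."""
    xs = sorted(data)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = fitted.cdf(x)
        # The EDF jumps from (i - 1) / n to i / n at x; check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


# Hypothetical measurements, NOT the rivet-head data from the lecture.
sample = [6.62, 6.66, 6.68, 6.70, 6.72, 6.72, 6.74, 6.75, 6.76, 6.78, 6.81, 6.85]

for theoretical, observed in qq_points(sample):
    print(round(theoretical, 3), observed)
print("D =", round(ks_distance(sample), 4))
```

If the data are roughly normal, the printed pairs lie close to the line y = x and D is small.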
      HYPOTHESIS OF NORMALITY TEST

𝐻o:   The sample data follows a normal distribution.
𝐻a:   The sample data does not follow a normal
      distribution.

When we are testing normality:
          If P value > alpha, we fail to reject 𝐻o and the
           data are treated as normal.
          If P value ≤ alpha, we reject 𝐻o and the data are
           treated as NOT normal.

Example:
     Use a graphical and a numerical method to test the
normality of these data: diameters of 36 rivet heads in
1/100 of an inch.

[Figure: HISTOGRAM]

                  NUMERICAL METHOD

 Test Method          P-Value    Decision               Remarks
 Kolmogorov-Smirnov   < 0.000    Reject Ho              Not Normal
 Lilliefors           0.0571     Failed to reject Ho    Normal
 Anderson-Darling     0.2178     Failed to reject Ho    Normal
 Shapiro-Wilk         0.2804     Failed to reject Ho    Normal

          ONE SAMPLE HYPOTHESIS TEST

    The previous lecture showed some examples of
parametric tests, and one might observe that some were
ONE SAMPLE tests and others were INDEPENDENT
SAMPLE tests. [What is the difference?]
    In this lecture, we are more concerned with test
statistics that involve the POPULATION MEAN and the
POPULATION PROPORTION for one sample. [one group
compared to a standard group]

    TEST CONCERNING THE POPULATION MEAN

    The ONE-SAMPLE Z-TEST and the ONE-SAMPLE T-TEST
are used to compare the mean of one sample to a known
standard (theoretical/hypothetical) mean (𝜇0).

ASSUMPTIONS

     1.   The sample is obtained using simple random
          sampling or from a randomized experiment.
     2.   The population from which the data is
          sampled is normally distributed.

HYPOTHESES

Note: 𝜇0 is a specified value of the population mean.

          A two-tailed test is one that rejects the null
           hypothesis if the sample statistic is
           significantly higher or lower than the
           assumed value of the population parameter.
          If the null hypothesis is rejected in a
           two-tailed test, the conclusion is simply that
           the population mean doesn’t equal the assumed
           value. It doesn’t matter if the true value is
           likely to be more or less than the assumed
           value.
          In a one-tailed test, there is only one rejection
           region, and the null hypothesis is rejected
           only if the value of the sample statistic falls
           into that single rejection region.

Tabulated z-values for the common choices of α
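
Critical z-values for the common choices of α can be regenerated from the standard normal distribution. The sketch below uses only the Python standard library. The function name and the particular α values (0.10, 0.05, 0.01) are assumptions about what such a table usually contains, not a copy of the lecture's table.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1


def z_table(alphas=(0.10, 0.05, 0.01)):
    """Return (alpha, one-tailed critical z, two-tailed critical z) rows."""
    rows = []
    for alpha in alphas:
        one_tailed = std_normal.inv_cdf(1 - alpha)      # all of alpha in one tail
        two_tailed = std_normal.inv_cdf(1 - alpha / 2)  # alpha split over both tails
        rows.append((alpha, round(one_tailed, 3), round(two_tailed, 3)))
    return rows


for alpha, one_tailed, two_tailed in z_table():
    print(f"alpha = {alpha:<5} one-tailed: ±{one_tailed}   two-tailed: ±{two_tailed}")
```

For a left-tailed test the one-tailed value is used with a negative sign, and for a right-tailed test with a positive sign.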
                  ONE SAMPLE z-TEST

CASE 1: Testing means of a normal population with
known σ
Test Statistic:
          z = (x̄ − 𝜇0) / (σ / √n)

CASE 2: Large sample tests for means with unknown σ
[If σ is unknown and n > 30, use the z-test but replace
σ with s]
Test Statistic:
          z = (x̄ − 𝜇0) / (s / √n)
Note:

                  ONE SAMPLE t-TEST

CASE 3: Small sample tests for means with unknown σ
[If σ is unknown and n < 30, use the t-test and replace
σ by s]
Test Statistic:
          t = (x̄ − 𝜇0) / (s / √n), with n − 1 degrees of freedom
Rejection Region:

Example:
   Does an average box of cereal contain more than
368 grams of cereal? A random sample of 36 boxes
showed x̄ = 372.5 and s = 15. Test at the α = 0.01 level.
Solution:

Example:
   Does an average box of cereal contain more than
368 grams of cereal? A random sample of 25 boxes
showed x̄ = 372.5. The company has specified σ to be
15 grams. Test at the α = 0.05 level.
Solution:

Example:
   Does an average box of cereal contain less than 368
grams of cereal? A random sample of 25 boxes showed
x̄ = 372.5 and s = 15. Test at the α = 0.01 level.
Solution:
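
The solutions to the three cereal examples are not reproduced in the text. The sketch below shows one way the computations might be carried out, assuming the usual statistic (x̄ − 𝜇0) / (sd / √n) for all three cases. The function and variable names are hypothetical. The t critical value 2.492 (24 degrees of freedom, α = 0.01, one tail) is taken from a standard t table, because the Python standard library has no t distribution.

```python
from math import sqrt
from statistics import NormalDist


def mean_test_statistic(xbar, mu0, sd, n):
    """(xbar - mu0) / (sd / sqrt(n)); sd is sigma if known, otherwise s."""
    return (xbar - mu0) / (sd / sqrt(n))


z_dist = NormalDist()

# Large sample, sigma unknown: n = 36, s = 15, right-tailed, alpha = 0.01
z_large = mean_test_statistic(372.5, 368, 15, 36)
reject_large = z_large > z_dist.inv_cdf(1 - 0.01)

# Sigma known: n = 25, sigma = 15, right-tailed, alpha = 0.05
z_known = mean_test_statistic(372.5, 368, 15, 25)
reject_known = z_known > z_dist.inv_cdf(1 - 0.05)

# Small sample, sigma unknown: n = 25, s = 15, left-tailed, alpha = 0.01
T_CRITICAL_DF24 = 2.492  # standard t-table value, one tail, alpha = 0.01
t_small = mean_test_statistic(372.5, 368, 15, 25)
reject_small = t_small < -T_CRITICAL_DF24

print("z (n = 36):", round(z_large, 3), "reject Ho:", reject_large)
print("z (n = 25, sigma known):", round(z_known, 3), "reject Ho:", reject_known)
print("t (n = 25, left-tailed):", round(t_small, 3), "reject Ho:", reject_small)
```

Under these assumptions none of the three test statistics falls in its rejection region, so in each case the null hypothesis is not rejected.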
Example:
Exercise:
    Does an average box of cereal contain 368 grams of
cereal? A random sample of 25 boxes showed x̄ = 372.5.
The company has specified σ to be 15 grams. Test at the
α = 0.05 level.

                                                               TEST A CLAIM ABOUT A PROPORTION
                                                              We can test a claim about a proportion, percentage,
                                                          or probability, as illustrated in these examples:
                                                                    Based on a sample survey, fewer than ¼ of
                                                                     all college graduates smoke.
                                                                    The percentage of physicians leaving the
                                                                     country is equal to 15%.
                                                                    If a driver is fatally injured in a car crash,
                                                                     there is a 0.35 probability that the driver was
                                                                     legally impaired.
                                                                    ONE SAMPLE PROPORTION TEST
                                                              The One-Sample Proportion Test is used to assess
                                                          whether a population proportion (P1) is significantly
                                                          different from a hypothesized value (P0). The
                                                          hypotheses may be stated in terms of the proportions,
                                                          their difference, their ratio, or their odds ratio, but all
                                                          four hypotheses result in the same test statistics.
                                                          ASSUMPTIONS:
                                                               1.    The conditions for a binomial experiment are
                                                                     satisfied. That is, we have a fixed number of
                                                                     independent        trials   having     constant
                                                                     probabilities, and each trial has two outcome
                                                                     categories, which we classify as “success”
                                                                     and “failure”.
                                                               2.    The conditions 𝑛𝑝𝑜 ≥ 5 and 𝑛(1 − 𝑝𝑜) ≥ 5 are
                                                                     both satisfied, so the binomial distribution of
                                                                     sample proportions can be approximated by a
                                                                     normal distribution with
                                                                             µ = np and σ = √(np(1 − p))
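
The second assumption can be checked mechanically before running the test. The sketch below is only an illustration: the function name, the `minimum` parameter, and the sample sizes are hypothetical, and the rule of 5 is the one stated in the assumption above.

```python
from math import sqrt


def normal_approximation(n, p0, minimum=5):
    """Check n*p0 >= 5 and n*(1 - p0) >= 5; return (ok, mu, sigma)."""
    expected_successes = n * p0
    expected_failures = n * (1 - p0)
    ok = expected_successes >= minimum and expected_failures >= minimum
    mu = n * p0                       # mean of the binomial count
    sigma = sqrt(n * p0 * (1 - p0))   # standard deviation of the binomial count
    return ok, mu, sigma


# Hypothetical inputs, for illustration only.
print(normal_approximation(50, 0.20))  # both expected counts are at least 5
print(normal_approximation(20, 0.10))  # expected successes fall below 5
```

When the check fails, the normal approximation is not appropriate and an exact binomial calculation is the usual alternative.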
                                                          HYPOTHESES
 CONNECTION TO CONFIDENCE INTERVALS

Note: 𝑝𝑜 is a specified value of population proportion.

REJECTION REGION

TEST STATISTIC
          z = (p̂ − 𝑝𝑜) / √(𝑝𝑜(1 − 𝑝𝑜) / n)

Note:
    When conducting a test of a claim about a
population proportion p, be careful to identify
correctly the sample proportion p̂.

    1.   The sample proportion p̂ is sometimes given
         directly (e.g., “10% of the observed sports cars
         are red” is expressed as p̂ = 0.10).
    2.   In other cases, we may need to calculate the
         sample proportion by using
              p̂ = x / n,
         where x is the number of successes and n is the
         sample size.

Example: Given “96 surveyed households have cable TV and
54 do not,” we can first find the sample size n to be 96 +
54 = 150, then calculate the value of the sample
proportion of households with cable TV as follows:
              p̂ = 96 / 150 = 0.64

Example:
    250 housewives were randomly selected and asked
whether they prefer purchasing fish from supermarkets
or from wet (public) markets. If 114 of them preferred
supermarkets, is there evidence at the 5% level of
significance to suggest that the proportion of
housewives throughout the city who prefer
supermarkets exceeds 40%?

Solution:
We first need to check that n𝑝𝑜 ≥ 5 and n(1 − 𝑝𝑜) ≥ 5 to
determine whether the binomial distribution can be
approximated by the normal distribution. Here
n𝑝𝑜 = 250(0.40) = 100 and n(1 − 𝑝𝑜) = 250(0.60) = 150.
The assumption is satisfied.

Exercises:

1.   Kate Flower, President of Kate and Edith
     Cake Company, says that the mean number of
     cakes sold daily is 1,500. An employee wants
     to test the accuracy of Kate's claim. A random
     sample of 36 days shows that the mean daily
     sales were 1,450 cakes. Use a level of
     significance of 0.01 and assume σ = 120
     cakes. What should the employee conclude?
2.   Juanita Lopez, a production supervisor at a
     chemical company, wants to be sure that the
     Super-Duper can is filled with an average of
     16 oz of product. If the mean volume is
     significantly less than 16 oz, customers will
     likely complain, prompting undesirable
     publicity. The physical size of the can doesn’t
     allow a mean volume significantly above 16
     oz. A random sample of 36 cans shows a
     sample mean of 15.7 oz. Assuming σ is 0.2 oz,
     conduct a hypothesis test with α = 0.01.

3.   We want to compare fasting serum cholesterol
     levels of Filipino women to those of
     American women. Assume the cholesterol
     levels in 20 to 39 year old women in the
     United States are normally distributed with 𝜇 =
     90 mg/dl. Blood tests performed on 19
     female Filipinos in this age range rendered a
     sample mean cholesterol level of 181.52
     mg/dl and a standard deviation of 40 mg/dl.
     Conduct a test of hypothesis to determine
     whether Filipino women have a lower average
     cholesterol level than their American
     counterparts. Use alpha = 0.05.

4.   In a study of air-bag effectiveness, it was
     found that in 821 crashes of midsize cars
     equipped with air bags, 46 of the crashes
     resulted in hospitalization of the drivers. Use a
     0.01 level of significance to test the claim that
     the air-bag hospitalization rate is lower than
     the 7.8% rate for crashes of midsize cars
     equipped with automatic safety belts.

5.   Suppose that a teacher at a school claims
     that the average weight of the student population
     is greater than 140 lb, and we want to test
     the truth of this claim. We have a random
     sample of the weights of 6 students from the
     student population. Use a 0.10 level of
     significance.
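
Returning to the housewives example from the proportion test discussion (114 of 250 prefer supermarkets, 𝑝𝑜 = 0.40, α = 0.05, right-tailed), the test can be carried through numerically. The sketch below assumes the usual normal-approximation statistic z = (p̂ − 𝑝𝑜) / √(𝑝𝑜(1 − 𝑝𝑜) / n). The function and variable names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_proportion_z(successes, n, p0):
    """z statistic for Ho: p = p0 using the normal approximation."""
    p_hat = successes / n                    # sample proportion
    standard_error = sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / standard_error


alpha = 0.05
z = one_sample_proportion_z(114, 250, 0.40)
critical = NormalDist().inv_cdf(1 - alpha)   # right-tailed critical value
p_value = 1 - NormalDist().cdf(z)            # right-tail probability
reject = z > critical

print("z =", round(z, 3))
print("critical value =", round(critical, 3))
print("p-value =", round(p_value, 4))
print("reject Ho:", reject)
```

Under these assumptions the statistic exceeds the critical value, which suggests that the proportion of housewives who prefer supermarkets is greater than 40% at the 5% level.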
 CONSTRUCTING QUESTIONNAIRES
     STEPS IN CONSTRUCTING QUESTIONNAIRE

     1.   Purpose
     2.   Pre-existing Questionnaire
     3.   Domains and Types of Questions
     4.   Consider the Audience
     5.   Write Questions
     6.   Ordering
     7.   Cover Letter, Instructions, and Layout
     8.   Pretest and Validation