ENGINEERING DATA ANALYSIS
MODULE OVERVIEW
Welcome Learners! This module will introduce you to
the data analysis process, the types of data, and the
methods of collecting data through observation,
surveys, and experiments.
This module is organized into three lessons as follows:
     o Lesson 1: Methods of Data Collection
     o Lesson 2: Planning and Conducting Survey
     o Lesson 3: Planning and Conducting an
       Experiment: Introduction to Design of
       Experiments
                  MODULE OUTCOME
      At the completion of the module, you should be able
to:
       o Identify the appropriate data collection method for
         a particular study.
       o Identify the type of statistical design appropriate
         for a particular study.
Learning Outcome:
     o Describe the data analysis process.
     o Distinguish observational studies from experimental studies.
     o Determine which type of design to use in a particular study.
Time Frame: 2 hours
Introduction
        A primary goal of statistical studies is to collect data that can then be used to
make informed decisions. It should come as no surprise that the ability to make good
decisions depends on the quality of the information available. This lesson introduces
the data analysis process, types of data, and the different methods of data collection.
        Abstraction
1.1 Data Analysis Process
      Statistics involves collecting, summarizing, and analyzing data. All three tasks are
critical. Without summarization and analysis, raw data are of little value, and even
sophisticated analyses can’t produce meaningful information from data that were not
collected in a sensible way.
Statistical studies are undertaken to answer questions about our world.
For instance:
    1. Is a new flu vaccine effective in preventing illness?
    2. Is the use of bicycle helmets on the rise?
    3. Are injuries that result from bicycle accidents less severe for riders who wear
        helmets than for those who do not?
    4. How many credit cards do college students have?
    5. Do engineering students pay more for textbooks than do education students?
Data collection and analysis allow researchers to answer such questions. The process
can be organized into the following six steps:
   1. Understanding the nature of the problem. Effective data analysis requires
      an understanding of the research problem: the goal of the research and the
      questions we hope to answer. It is important to have a clear direction before
      gathering data to ensure that questions of interest will be answered using the
      data collected.
   2. Deciding what to measure and how to measure it. The next step in the
      process is deciding what information is needed to answer the questions of
      interest.
   Example 1: In a study of the relationship between students’
achievement in the courses Calculus 1 and English, you would need to collect
data on test scores in Calculus 1 and in English.
     Example 2: In a study of the relationship between preferred learning style
and intelligence of first-year engineering students, how would you define
learning style and measure it, and what measure of intelligence would you use?
    It is important to carefully define the variables to be studied and to develop
   appropriate methods for determining their values.
   3. Data collection. The data collection step is crucial. The researcher must first
   decide whether an existing data source is adequate or whether new data must be
   collected. Even if a decision is made to use existing data, it is important to
   understand how the data were collected and for what purpose, so that any
   resulting limitations are also fully understood and judged to be acceptable. If new
   data are to be collected, a careful plan must be developed, because the type of
   analysis that is appropriate and the subsequent conclusions that can be drawn
   depend on how the data are collected.
   4. Data summarization and preliminary analysis. After the data are collected,
   the next step usually involves a preliminary analysis that includes summarizing
   the data graphically and numerically. This initial analysis provides insight into
   important characteristics of the data and can provide guidance in selecting
   appropriate methods for further analysis.
   5. Formal data analysis. The data analysis step requires the researcher to select
   and apply statistical methods.
   6. Interpretation of results. Several questions should be addressed in this final
      step. Some examples are:
       a. What can we learn from the data?
       b. What conclusions can be drawn from the analysis?
       c. How can our results guide future research?
   Illustration 1. The Admission Director of a university is interested in
   learning why some applicants who were accepted for the first semester of
   SY 2019-2020 failed to enrol at the university. The population of interest
   to the director consists of all accepted applicants who did not enrol in the
   first semester of SY 2019-2020. Because this population is large and it
   may be difficult to contact all the individuals, the director might decide to
   collect data from only 100 selected students.
   From Illustration 1, deciding how to select the 100 students and what data should
   be collected from each student are steps 2 and 3 in the data analysis process.
   These 100 students constitute a sample.
     Definition 1.
     The entire collection of individuals or objects about which information is
     desired is called the population of interest. A sample is a subset of the
     population, selected for study.
        Methods for organizing and summarizing data, such as the use of tables,
graphs, or numerical summaries, make up the branch of statistics called descriptive
statistics.
       The second major branch of statistics, inferential statistics, involves
generalizing from a sample to the population from which it
was selected.
  Definition 2.
  Descriptive statistics is the branch of statistics that includes methods for
  organizing and summarizing data.
  Inferential statistics is the branch of statistics that involves generalizing from a
  sample to the population from which the sample was selected and assessing the
  reliability of such generalizations.
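To make the two branches concrete, here is a minimal sketch in Python. The data values are hypothetical (they are not from this module), and the standard error shown is only a first, rough indication of reliability that assumes the sample was selected at random.

```python
from math import sqrt
from statistics import mean, median, stdev

# Hypothetical sample: number of credit cards held by 10 college students.
cards = [2, 3, 3, 4, 5, 1, 0, 2, 3, 2]

# Descriptive statistics: organize and summarize the sample itself.
sample_mean = mean(cards)
sample_median = median(cards)
sample_sd = stdev(cards)  # sample standard deviation (divides by n - 1)

# A first step toward inferential statistics: the standard error of the mean
# indicates how much the sample mean would vary from one random sample to another.
standard_error = sample_sd / sqrt(len(cards))

print(f"mean={sample_mean}, median={sample_median}, sd={sample_sd:.2f}")
print(f"standard error of the mean={standard_error:.2f}")
```

The first three quantities only describe the 10 students in the sample. The standard error is what begins to connect the sample to the population it came from.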
1.2 Types of Data
       The individuals or objects in any particular population typically possess many
characteristics that might be studied.
A variable is any characteristic whose value may change from one individual or
object to another.
Illustration 2. Consider a group of students currently enrolled in a Calculus
course. One characteristic of the students in the population is the brand of
calculator owned (Casio, Sharp, Hewlett-Packard, and so on). Another
characteristic is the number of textbooks purchased that semester, and yet
another is the distance from the university to each student’s permanent
residence.
For example, calculator brand is a variable, and so are number of textbooks
purchased and distance to the university.
Data result from making observations either on a single variable or simultaneously on
two or more variables.
   Definition 3.
   A data set consisting of observations on a single characteristic is a univariate
   data set.
   A univariate data set is categorical (or qualitative) if the individual
   observations are categorical responses.
   A univariate data set is numerical (or quantitative) if each observation is a
   number.
Illustration 3. In Illustration 2, calculator brand is a categorical variable, because
each student’s response to the query, “What brand of calculator do you own?” is
a category. The collection of responses from all these students forms a
categorical data set.
The other two variables, number of textbooks purchased and distance to the
university, are both numerical in nature. Determining the value of such a numerical
variable (by counting or measuring) for each student results in a numerical data set.
Bivariate data result from obtaining a category or value for each of two
different characteristics.
Multivariate data result from obtaining a category or value for each of two or more
attributes (so bivariate data are a special case of multivariate data).
Illustration 4. Both height (in inches) and weight (in pounds) might be
recorded for each student in a class. The resulting data set is called a bivariate data
set. If the researcher is interested in determining height, weight, age, and
systolic blood pressure for each student in the class, the resulting data set is
called a multivariate data set.
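One way to picture these data sets is to look at how they might be stored. The sketch below is only an illustration in Python; all values and variable names are hypothetical.

```python
# Univariate categorical data: one category per student.
calculator_brand = ["Casio", "Sharp", "Casio", "Hewlett-Packard"]

# Univariate numerical data: one number per student.
textbooks_purchased = [4, 6, 5, 3]

# Bivariate data: a (height in inches, weight in pounds) pair per student.
height_weight = [(64, 120), (70, 165), (67, 150)]

# Multivariate data: several variables recorded for each student.
student_records = [
    {"height_in": 64, "weight_lb": 120, "age": 19, "systolic_bp": 112},
    {"height_in": 70, "weight_lb": 165, "age": 20, "systolic_bp": 121},
]

# Number of variables observed on each individual in the multivariate data set.
variables_per_student = len(student_records[0])
print(variables_per_student)
```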
1.2.1 Types of Numerical Data
There are two different types of numerical data: discrete and continuous.
Illustration 5. Suppose the following data are available:
                   a.   Time a student takes to finish answering each item in a test.
                   b.   Weight of children aged 1 to 5 years.
                   c.   Number of days absent from class in a semester.
                   d.   Number of defective products.
Discrete data usually arise when observations are determined by counting, so
data c and d are discrete data. The rest (a and b) are obtained by measuring time and
weight, so they are continuous data.
 Definition 4.
 A numerical variable results in discrete data if the possible values of the
 variable correspond to isolated points on the number line.
 A numerical variable results in continuous data if the set of possible values
 forms an entire interval on the number line.
1.3 Observation and Experimentation
Data collection is a vital step in the data analysis process. It is important to keep in
mind the questions we hope to answer on the basis of the resulting data. Sometimes the
researcher is interested in answering questions about characteristics of a single
existing population or in comparing two or more well-defined populations. To
accomplish this, a sample is selected from each population under consideration, and
the sample information is used to gain insight into characteristics of those populations.
Illustration 6.   A safety engineer is studying industry workers to determine
whether gender and attitude toward safety are related.
This study is observational in nature. The researcher wants to observe
characteristics of workers in an industry, and then use the resulting information
to draw conclusions.
        Sometimes the questions you are trying to answer deal with the effect of
certain explanatory variables on some response and cannot be answered using data
from an observational study.
Such questions are often of the form:
     What happens when ... ? or,
     What is the effect of ... ?
Illustration 7.
 A professor may wonder what would happen to test scores if the required
laboratory time for a chemistry course were increased from 3 hours to 6 hours
per week. To answer such questions, the researcher conducts an experiment to
collect relevant data. The value of some response variable (test score in the
chemistry course) is recorded under different experimental conditions (3-hour lab and
6-hour lab).
        In an experiment, the researcher manipulates one or more explanatory
variables, also sometimes called factors, to create the experimental conditions.
 Definition 5.
 A study is an observational study if the investigator observes characteristics of
 a sample selected from one or more existing populations.
 A study is an experiment if the investigator observes how a response variable
 behaves when one or more explanatory variables, also called factors, are
 manipulated.
The goal of an observational study is usually to draw conclusions about the
corresponding population or about differences between two or more populations.
The goal of an experiment is to determine the effect of the manipulated explanatory
variables (factors) on the response variable.
A well-designed experiment can result in data that provide evidence for a cause-and-
effect relationship. This is an important difference between an observational study
and an experiment. In an observational study, it is impossible to draw clear cause-and-
effect conclusions because we cannot rule out the possibility that the observed effect
is due to some variable other than the explanatory variable being studied. Such
variables are called confounding variables.
   Definition 6.
   A confounding variable is one that is related to both group membership and
   the response variable of interest in a research study.
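The following sketch shows how a confounding variable can produce an apparent effect in an observational study. The data are invented for illustration: suppose students chose their own laboratory time, test scores depend only on prior preparation, and well-prepared students tended to choose the 6-hour lab.

```python
from statistics import mean

# Hypothetical observational records: (lab_hours, prior_preparation, test_score).
# By construction, the score depends ONLY on prior preparation.
records = (
    [(6, "strong", 85)] * 8 + [(6, "weak", 65)] * 2
    + [(3, "strong", 85)] * 2 + [(3, "weak", 65)] * 8
)

def mean_score(hours, preparation=None):
    """Mean test score for one lab group, optionally within one preparation level."""
    scores = [score for h, prep, score in records
              if h == hours and (preparation is None or prep == preparation)]
    return mean(scores)

# Comparing the two groups directly suggests that the longer lab "helps".
overall_gap = mean_score(6) - mean_score(3)

# Comparing students with the same preparation shows no difference at all.
gap_strong = mean_score(6, "strong") - mean_score(3, "strong")
gap_weak = mean_score(6, "weak") - mean_score(3, "weak")

print(overall_gap, gap_strong, gap_weak)
```

Here prior preparation is related both to group membership (which lab a student chose) and to the response (test score), so it fits Definition 6. The 12-point gap between groups comes entirely from preparation and not from laboratory time.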
1.4 Sampling
       Many studies are conducted in order to generalize from a sample to the
corresponding population. As a result, it is important that the sample be
representative of the population. To be reasonably sure of this, we must carefully
consider the way in which the sample is selected. Even when the sample is selected
properly, there may be uncertainty about whether the sample represents the population
from which it was selected.
   Bias in sampling is the tendency for samples to differ from the corresponding
population in some systematic way. The most common types of bias encountered in
sampling situations are selection bias, measurement or response bias, and
nonresponse bias.
    o Selection Bias. Tendency for samples to differ from the corresponding
      population as a result of systematic exclusion of some part of the population.
    o Measurement or Response Bias. Tendency for samples to differ from the
      corresponding population because the method of observation tends to produce
      values that differ from the true value. This problem often is due to the specific
      wording of questions in a survey, the manner in which the respondent answers
      the survey questions, and the fashion in which an interviewer phrases
      questions during the interview.
    o Survey Nonresponse Bias. Tendency for samples to differ from the
      corresponding population because data are not obtained from all individuals
      selected for inclusion in the sample.
   Note: Bias is introduced by the way in which a sample is selected or by the way
   in which the data are collected from the sample. Increasing the size of the sample,
   although possibly desirable for other reasons, does nothing to reduce bias if the
   method of selecting the sample is flawed or if the nonresponse rate remains high.
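The point of this note can be checked with a small simulation. The sketch below uses an invented population and an invented flawed selection method (only the lower half of the population can be reached), so treat it as an illustration and not as a general proof.

```python
import random
from statistics import mean

rng = random.Random(7)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 individuals with values 0 to 9999.
population = list(range(10_000))
true_mean = mean(population)

# Assumed flawed selection method: the upper half of the population is
# systematically excluded, which is a form of selection bias.
reachable = [x for x in population if x < 5_000]

errors = {}
for n in (100, 1_000, 4_000):
    estimate = mean(rng.sample(reachable, n))
    errors[n] = estimate - true_mean
    print(f"n={n:>5}: estimate={estimate:8.1f}, error={errors[n]:9.1f}")
```

A larger sample makes the estimate more stable, but the estimate settles near the mean of the reachable half and stays far below the true population mean.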
   1.4.1 Sampling Methods
            o Random Sampling. A simple random sample of size n is a sample
that is selected from a population in a way that ensures that every different possible
sample of the desired size has the same chance of being selected. When selecting a
random sample, researchers can choose to do the sampling with or without
replacement.
                    Sampling with replacement. After each successive item is
 selected for the sample, the item is "replaced" back into the population and may
therefore be selected again at a later stage. In practice, sampling with replacement is
rarely used.
                   Sampling without replacement. After being included in the
sample, an individual or object would not be considered for further selection.
          o Stratified Random Sampling. A sampling method in which the entire
population is first divided into a set of non-overlapping subgroups, called strata. In
stratified random sampling, separate simple random samples are independently selected
from each subgroup.
           o Cluster Sampling. This involves dividing the population of interest
into non-overlapping subgroups, called clusters. Clusters are then selected at random,
and all individuals in the selected clusters are included in the sample.
Note: Be careful not to confuse clustering and stratification. Even though both of
these sampling strategies involve dividing the population into subgroups, both the
way in which the subgroups are sampled and the optimal strategy for creating the
subgroups are different. In stratified sampling, we sample from every stratum,
whereas in cluster sampling, we include only selected whole clusters in the sample.
Because of this difference, to increase the chance of obtaining a sample that is
representative of the population, we want to create homogeneous groups for strata
and heterogeneous (reflecting the variability in the population) groups for clusters.
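            o Systematic Sampling. A procedure that can be used when
it is possible to view the population of interest as consisting of a list or some other
sequential arrangement. A value k is chosen, one of the first k individuals is selected
at random as the starting point, and then every kth individual after that is included
in the sample.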
           o Convenience Sampling. Using an easily available or convenient group
to form a sample. Results from such samples are rarely informative, and it is a
mistake to try to generalize from a convenience sample to any larger population.
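The sampling methods above can be sketched with Python's standard random module. The population, strata, clusters, and sample sizes below are hypothetical and chosen only to keep the example small.

```python
import random

rng = random.Random(2024)  # fixed seed so the sketch is repeatable

# Hypothetical population: 60 student IDs, S01 to S60, kept in list order.
students = [f"S{i:02d}" for i in range(1, 61)]

# Simple random sampling without replacement: no student can appear twice.
srs = rng.sample(students, 12)

# Sampling with replacement: the same student may be drawn more than once.
with_replacement = rng.choices(students, k=12)

# Stratified random sampling: a separate simple random sample from EVERY stratum.
# Assumed strata: three year levels of 20 students each.
strata = {f"Year {y + 1}": students[20 * y: 20 * (y + 1)] for y in range(3)}
stratified = {name: rng.sample(members, 4) for name, members in strata.items()}

# Cluster sampling: select whole clusters at random and keep everyone in them.
# Assumed clusters: six class sections of 10 students each.
clusters = {f"Section {c + 1}": students[10 * c: 10 * (c + 1)] for c in range(6)}
chosen_sections = rng.sample(sorted(clusters), 2)
cluster_sample = [s for name in chosen_sections for s in clusters[name]]

# Systematic sampling: random start among the first k, then every k-th student.
k = 5
start = rng.randrange(k)
systematic = students[start::k]

print("Simple random:", srs)
print("Stratified:", stratified)
print("Cluster:", chosen_sections, len(cluster_sample), "students")
print("Systematic:", systematic)
```

Notice that the stratified sample contains students from every year level, while the cluster sample contains every student from only two sections. Convenience sampling is left out on purpose, because it involves no random selection at all.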
         Application: Exercise #1
    1. As part of a curriculum review, a certain engineering department would like
       to select a simple random sample of 20 of last year’s 140 graduates to obtain
       information on how graduates perceived the value of the curriculum.
       Describe two different methods that might be used to select the sample.
    2. Based on a study of 1570 students in a university between the ages of 18 and
       21, researchers at a medical school concluded that there was an association
       between academic performance in English and in Mathematics courses. Describe
       the sample and the population of interest for this study.
    3. For each of the situations described, state whether the sampling procedure is
       simple random sampling, stratified random sampling, cluster sampling,
       systematic sampling, or convenience sampling.
       a. All fourth-year students at a university are enrolled in 1 of 12 sections of a
       research course. To select a sample of fourth-year students at this university, a
       researcher selects four sections of the research course at random from the 12
       sections and all students in the four selected sections are included in the
       sample.
       b. To obtain a sample of students, faculty, and staff at a university, a
       researcher randomly selects 50 faculty members from a list of faculty, 100
       students from a list of students, and 30 staff members from a list of staff.
       c. A university researcher obtains a sample of students at his university by
       using the 85 students enrolled in his Math 111 class.
       d. To obtain a sample of the seniors at a particular high school, a researcher
       writes the name of each senior on a slip of paper, places the slips in a box and
       mixes them, and then selects 10 slips. The students whose names are on the
       selected slips of paper are included in the sample.
       e. To obtain a sample of those attending a basketball game, a researcher
       selects the 24th person through the door. Then, every 50th person after that is
       also included in the sample.
        Closure
           Well done! You have just finished Lesson 1 of this module. Should there
be any parts of the lesson that need clarification, please ask your tutor during
your face-to-face or online interactions.
        Now if you are ready, please proceed to Lesson 2 of this module which will
discuss the most widely used data collection procedure, the survey. Information from
surveys impacts nearly every facet of our daily lives. Planning and conducting surveys
will be the focus of our next lesson.
Learning Outcome:
           o Distinguish survey data collection techniques.
           o Identify problems associated with surveys.
Time Frame: 2 hours
Introduction:
        Many observational studies attempt to measure personal opinion or attitudes
using responses to a survey. In such studies, both the sampling method and the design
of the survey itself are critical to obtaining reliable information.
        This lesson presents the sampling designs for surveys and the issues
associated with surveys as a method of obtaining data.
        Abstraction:
2.1 Survey Basics
         Designing an observational study to compare two populations on the basis of
some easily measured characteristic is relatively straightforward, with attention
focusing on choosing a reasonable method of sample selection. However, many
observational studies attempt to measure personal opinion or attitudes using responses
to a survey. In such studies, both the sampling method and the design of the survey
itself are critical to obtaining reliable information.
Definition 1. A survey is a voluntary encounter between strangers in which an
interviewer seeks information from a respondent by engaging in a special type of
conversation. This conversation might take place in person, over the telephone, or
even in the form of a written questionnaire, and it is quite different from usual social
conversations.
Roles and Responsibilities of Interviewer and Respondents
        o The interviewer gets to decide what is relevant to the conversation and may
          ask questions, possibly personal or even embarrassing questions. The
          respondent, in turn, may refuse to participate in the conversation and may
          refuse to answer any particular question. But having agreed to participate in
          the survey, the respondent is responsible for answering the questions
          truthfully.
The Respondent’s Tasks
        Task 1: Comprehension. Comprehension is the single most important task
facing the respondent, and fortunately it is the characteristic of a survey question that
is most easily controlled by the question writer. Understandable directions and
questions are characterized by (1) a vocabulary appropriate to the population of
interest, (2) simple sentence structure, and (3) little or no ambiguity.
               o Vocabulary is often a problem. As a rule, it is best to use the simplest
                 possible word that can be used without sacrificing clear meaning.
               o Simple sentence structure also makes it easier for the respondent to
                 understand the question.
               o Ambiguity can arise from the placement of questions as well as
                 from their phrasing. One way to find out whether or not a question is
                 ambiguous is to field-test the question and to ask the respondents if
                 they were unsure how to answer it.
        Task 2: Retrieval from Memory. Retrieving relevant information from
memory to answer the question is not always an easy task, and it is not a problem
limited to questions of fact.
        Task 3: Reporting the Response. The task of formulating and reporting a
response can be influenced by the social aspects of the survey conversation. In
general, if a respondent agrees to take a survey, he or she will be motivated to answer
truthfully. Therefore, if the questions are not too difficult (taxing the respondent’s
knowledge or memory) and if there are not too many questions (taxing the
respondent’s patience), the answers to questions will be reasonably accurate.
Three things to consider in constructing surveys and writing survey questions:
1. Questions should be understandable by the individuals in the population being
surveyed. Vocabulary should be at an appropriate level, and sentence structure should
be simple.
2. Questions should, as much as possible, recognize that human memory is fickle.
Questions that are specific will aid the respondent by providing better memory cues.
3. Questions should not create opportunities for the respondent to feel threatened or
embarrassed.
       In a perfect survey, the target population would be the same as the sampled
population. This type of survey rarely happens. There are always difficulties in
obtaining a sampling frame or being able to identify all elements within the target
population.
2.2 Data Collection Techniques for Survey
      Having chosen a particular sample survey, how does one actually collect the
data?
The most commonly used methods of data collection in sample surveys are:
1. Interviews
   a. Personal Interview. The procedure usually requires the interviewer
      to ask prepared questions and to record the respondent’s answers. The
      primary advantage of these interviews is that people will usually respond
      when confronted in person. In addition, the interviewer can note specific
      reactions and eliminate misunderstandings about the questions asked.
   b. Telephone interview. Surveys conducted through telephone interviews are
      frequently less expensive than personal interviews, owing to the
      elimination of travel expenses. The investigator can also monitor the
      interviews to be certain that the specified interview procedure is being
      followed.
2. Self-administered questionnaire. These questionnaires usually are mailed to
   the individuals included in the sample, although other distribution methods
   can be used. The questionnaire must be carefully constructed if it is to
   encourage participation by the respondents. It must undergo validity and
   reliability testing.
3. Direct observation. Direct observation is used in many surveys that do not
   involve measurements on people.
     Application: Exercise #2
1. An experimenter wants to estimate the average water consumption per family
   in a city. Discuss the relative merits of choosing individual families, dwelling
   units (single-family houses, apartment buildings, etc.), and city subdivisions as
   sampling units.
2. As part of a curriculum review, the civil engineering department would like to
   select a simple random sample of 20 of last year’s 140 graduates to obtain
   information on how graduates perceived the value of the curriculum. Describe
   two different methods that might be used to select the sample.
3. For the given situation, decide what sampling method you would use. Provide
   an explanation of why you selected a particular method of sampling.
   The major state university in Region A is attempting to lobby the state
   legislature for a bill that would allow the university to charge a higher tuition
   rate than the other universities in the country. To provide a justification, the
   university plans to conduct a mail survey of its alumni to collect information
   concerning their current employment status. The university grants a wide
   variety of different degrees and wants to make sure that information is
   obtained about graduates from each of the degree types. A 5% sample of
   alumni is considered sufficient.
       Closure
       Congratulations! You have successfully completed the tasks and activities for
Lesson 2. It is expected that you have gained insights about planning and conducting
a survey as a data collection method.
       Now if you are ready, please proceed to Lesson 3 of this module which will
discuss planning and conducting an experiment.
Learning Outcome:
       o Understand the concept of experimental design.
       o Distinguish the methods of experimental design.
Time Frame: 2 hours
Introduction:
        Sometimes the questions you are trying to answer deal with the effect of
certain explanatory variables on some response. Such questions are often of the form,
"What happens when . . . ?" or "What is the effect of . . . ?" Experiments provide a way
to collect data to answer these types of questions.
       This lesson presents the key concepts and methods of experimental design.
          Activity
        Suppose in an experiment, the researchers decide to use two room
temperature settings, 18°C and 24°C. Further suppose that there are 10 sections of
first-semester Calculus 1 that have agreed to participate in the study. The experiment
is designed in this way:
        Set the room temperature to 18°C in five of the rooms and to 24°C in the other
five rooms on test day, and then compare the exam scores for the 18°C group and the
24°C group. Suppose that the average exam score for the students in the 18°C group
was noticeably higher than the average for the 24°C group.
          Analysis
        Based on the information given in the activity, could you conclude that the
increased temperature resulted in a lower average score? Yes or No.
If no, are there any other factors that affect or are related to the exam scores? Can you
enumerate them?
         Abstraction:
    3.1 Concepts of Experimental Design
   Suppose an engineer is considering two different workstation designs
and wants to know whether the choice of design affects work performance.
Experiments provide a way to collect data to answer these types of questions.
Before we describe the concepts of experimental design, the following terms are
defined:
 Definition 1.
 An experiment is a study in which one or more explanatory variables are
 manipulated in order to observe the effect on a response variable.
 An experimental condition is any particular combination of values for the
 explanatory variables. Experimental conditions are also called treatments.
 An experimental unit is the smallest unit to which a treatment is applied.
 The explanatory variables are those variables that have values that are controlled
 by the experimenter. They are also called independent variables or factors.
 The response variable is a variable that is not controlled by the experimenter and
 that is measured as part of the experiment. It is also called the dependent variable.
       In the language of experimental design, treatments are assigned at random to
experimental units, and replication means that each treatment is applied to more than
one experimental unit.
Illustration 1. Suppose we are interested in determining the effect of room
temperature on performance on a first-year Calculus 1 exam. In this case, the
explanatory variable is room temperature (it can be manipulated by the
experimenter). The response variable is exam performance (the variable that
is not controlled by the experimenter and that will be measured).
 In general, we can identify the explanatory variables and the response variable easily
if we can describe the purpose of the experiment in the following terms:
The purpose is to assess the effect of [explanatory variables] on [response variable].
 A well-designed experiment requires more than just manipulating the explanatory
 variables; the design must also eliminate other possible explanations, or the
 experimental results will not be conclusive (Peck, R., Olsen, C. and Devore, J.,
 2012).
       In designing an experiment our goal is to determine the effects of the
explanatory variables on the chosen response variable. To do this, we must take into
consideration any extraneous variables that, although not of interest in the current
study, might also affect the response variable.
 Definition 3.
 An extraneous variable is one that is not one of the explanatory variables in the
 study but is thought to affect the response variable.
        A well-designed experiment copes with the potential effects of extraneous
variables by using random assignment to experimental conditions and sometimes also
by incorporating direct control and/or blocking into the design of the experiment.
Illustration 2. In Illustration 1, the calculus exam example, the textbook used is
an extraneous variable because part of the differences in test results might be
attributed to this variable. We could control this variable directly, by requiring
that all sections use the same textbook. Then any observed differences
between temperature groups could not be explained by the use of different
textbooks. The extraneous variable time of day might also be directly
controlled in this way by having all sections meet at the same time.
        The effects of some extraneous variables can be filtered out by a process
known as blocking. Extraneous variables that are addressed through blocking are
called blocking variables. Blocking creates groups (called blocks) that are similar
with respect to blocking variables; then all treatments are tried in each block.
Illustration 3. In Illustration 1, we might use instructor as a blocking variable.
If five instructors are each teaching two sections of calculus, we would make
sure that for each instructor, one section was part of the 20° group and the
other section was part of the 27° group. With this design, if
we see a difference in exam scores for the two temperature groups, the
extraneous variable instructor can be ruled out as a possible explanation,
because all five instructors’ students were present in each temperature group.
(Had we controlled the instructor variable by choosing to have only one
instructor, that would be an example of direct control.)
        If one instructor taught all the 20° sections and another taught all the 27°
sections, we would be unable to distinguish the effect of temperature from the effect
of the instructor. In this situation, the two variables (temperature and instructor) are
said to be confounded.
 Definition 4.
 Two variables are confounded if their effects on the response variable cannot be
 distinguished from one another.
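
The difference between a confounded design and a blocked design can be made concrete with a small, noise-free calculation. The sketch below is an illustration only: the baseline score, the temperature effect, and the instructor effects are hypothetical numbers chosen for the example, and only two instructors are used to keep it short.

```python
# Hypothetical, noise-free model: score = BASE + temperature effect + instructor effect.
BASE = 75.0
TEMP_EFFECT = {20: 0.0, 27: -3.0}          # assumed: the warmer room lowers scores by 3 points
INSTRUCTOR_EFFECT = {"A": 4.0, "B": 0.0}   # assumed: instructor A's sections score 4 points higher


def mean_score(sections, temp):
    """Average score of the sections held at the given temperature."""
    scores = [BASE + TEMP_EFFECT[t] + INSTRUCTOR_EFFECT[i] for i, t in sections if t == temp]
    return sum(scores) / len(scores)


def temperature_difference(sections):
    """Observed difference: mean score at 27 degrees minus mean score at 20 degrees."""
    return mean_score(sections, 27) - mean_score(sections, 20)


# Confounded: instructor A teaches only 20-degree sections, B only 27-degree sections.
confounded = [("A", 20), ("A", 20), ("B", 27), ("B", 27)]
# Blocked: each instructor teaches one section at each temperature.
blocked = [("A", 20), ("A", 27), ("B", 20), ("B", 27)]

print(temperature_difference(confounded))  # -7.0: temperature and instructor effects mixed
print(temperature_difference(blocked))     # -3.0: the temperature effect alone
```

In the confounded layout the observed gap of 7 points cannot be split into its temperature and instructor parts from the data alone. In the blocked layout the instructor effect appears equally in both temperature groups and cancels out.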
        In Illustration 1 (the calculus exam), one of the factors related to exam scores is
student ability, which cannot be controlled by the experimenter and which would be
difficult to use as a blocking variable. Extraneous variables of this kind are handled by
random assignment to experimental groups.
       Random assignment can be effective only if the number of subjects or
observations in each experimental condition (treatment) is large enough for each
experimental group to reliably reflect variability in the population.
Replication is the design strategy of making multiple observations for each
experimental condition. Together, replication and random assignment allow the
researcher to be reasonably confident of comparable experimental groups.
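
A short simulation can show why group size matters for random assignment. The sketch below assumes hypothetical student ability scores drawn from a normal distribution with mean 70 and standard deviation 10 (these values, the function name, and the seed are assumptions for illustration). It forms two groups at random many times and reports how far apart the two groups' average ability tends to be.

```python
import random
import statistics


def average_imbalance(group_size, repetitions=200, seed=1):
    """Average absolute gap in mean ability between two randomly formed groups."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(repetitions):
        # Hypothetical ability scores for 2 * group_size students.
        abilities = [rng.gauss(70, 10) for _ in range(2 * group_size)]
        rng.shuffle(abilities)              # random assignment to two groups
        group_a = abilities[:group_size]
        group_b = abilities[group_size:]
        gaps.append(abs(statistics.mean(group_a) - statistics.mean(group_b)))
    return statistics.mean(gaps)


small = average_imbalance(group_size=5)
large = average_imbalance(group_size=100)
print(f"n = 5 per group:   average gap in mean ability = {small:.2f}")
print(f"n = 100 per group: average gap in mean ability = {large:.2f}")
```

With only a few students per group, chance alone can leave one group noticeably stronger than the other. With many students per group, the gap shrinks, which is what makes the groups comparable.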
  Definition 5.
  Random Assignment. Random assignment (of subjects to treatments or of
  treatments to trials) is used to ensure that the experiment does not
  systematically favor one experimental condition (treatment) over another.
  Blocking. Using extraneous variables to create groups (blocks) that are
  similar. All experimental conditions (treatments) are then tried in each block.
  Direct Control. Holding extraneous variables constant so that their effects
  are not confounded with those of the experimental conditions (treatments).
  Replication. Ensuring that there is an adequate number of observations for
  each experimental condition.
Experimental designs in which experimental units are assigned at random to
treatments or in which treatments are assigned at random to trials are called
completely randomized designs. When blocking is used, the design is called a
randomized block design.
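
The two designs just named can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a prescribed procedure: the function names, the section labels, the instructor labels A to E, and the seed are all hypothetical, and the example reuses the two temperature groups from Illustration 3.

```python
import random


def completely_randomized(units, treatments, rng):
    """Assign units to treatments at random, keeping group sizes as equal as possible."""
    shuffled = list(units)
    rng.shuffle(shuffled)
    return {unit: treatments[i % len(treatments)] for i, unit in enumerate(shuffled)}


def randomized_block(blocks, treatments, rng):
    """Within each block, assign the treatments to that block's units at random."""
    assignment = {}
    for block_units in blocks.values():
        assignment.update(completely_randomized(block_units, treatments, rng))
    return assignment


rng = random.Random(2024)            # fixed seed so the example is reproducible
treatments = ["20 degrees", "27 degrees"]

# Completely randomized design: ten sections, five per temperature.
sections = [f"section {k}" for k in range(1, 11)]
crd = completely_randomized(sections, treatments, rng)

# Randomized block design: each instructor (block) teaches two sections.
blocks = {name: [f"{name}-1", f"{name}-2"] for name in ["A", "B", "C", "D", "E"]}
rbd = randomized_block(blocks, treatments, rng)

for unit, treatment in sorted(rbd.items()):
    print(unit, "->", treatment)
```

In the completely randomized design, chance alone decides which sections go to which temperature. In the randomized block design, the randomization is carried out separately inside each instructor's pair of sections, so every instructor contributes one section to each temperature group.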
       Figure 1 is a diagram highlighting the important features of a common
experimental design: an experiment that is based on random
assignment of experimental units to one of two treatments. The diagram can be easily
adapted for an experiment with more than two treatments.
    Figure 1. Diagram of an experiment with random assignment of
    experimental units to two treatments
    Source: Peck, R., Olsen, C. and Devore, J.L. (2012): Introduction to Statistics and Data
    Analysis.
3.2 Use of Control Group
Many experiments compare a group that receives a particular treatment to a control
group that receives no treatment. The use of a control group allows the experimenter
to assess how the response variable behaves when the treatment is not used. This
provides a baseline against which the treatment groups can be compared to determine
whether the treatment had an effect.
Illustration 4. Suppose that a mechanical engineer wants to know whether a
gasoline additive increases fuel efficiency (kilometres per liter). Such an
experiment might use a single car (to eliminate car-to-car variability) and a
sequence of trials in which 1 liter of gas is put in an empty tank, the car is driven
around a racetrack at a constant speed, and the distance travelled on the liter
of gas is recorded. To determine whether the additive increases gas mileage, it
would be necessary to include a control group of trials in which distance
travelled was measured when gasoline without the additive was used. The trials
would be assigned at random to one of the two experimental conditions (additive
or no additive).
        Even though this experiment consists of a sequence of trials all with the same
car, random assignment of trials to experimental conditions is still important because
there will always be uncontrolled variability. For example, temperature or other
environmental conditions might change over the sequence of trials, the physical
condition of the car might change slightly from one trial to another, and so on.
        Random assignment of experimental conditions to trials will tend to even out
the effects of these uncontrollable factors.
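
The evening-out effect can be illustrated with a simple simulation of the fuel additive trials. All numbers here are assumptions made for the sketch: a baseline of 12 km per liter, a steady drift of 0.05 km per liter per trial (standing in for changing environmental conditions), 20 trials, and an additive that has no real effect.

```python
import random
import statistics

N_TRIALS = 20
DRIFT_PER_TRIAL = 0.05      # assumed: km per liter gained per trial as conditions change
TRUE_ADDITIVE_EFFECT = 0.0  # assumed: the additive does nothing


def estimated_effect(conditions):
    """Mean distance with additive minus mean distance without, for one trial sequence."""
    with_additive, without = [], []
    for trial, condition in enumerate(conditions):
        distance = 12.0 + DRIFT_PER_TRIAL * trial
        if condition == "additive":
            with_additive.append(distance + TRUE_ADDITIVE_EFFECT)
        else:
            without.append(distance)
    return statistics.mean(with_additive) - statistics.mean(without)


# Systematic order: all control trials first, then all additive trials.
systematic = ["no additive"] * (N_TRIALS // 2) + ["additive"] * (N_TRIALS // 2)
print(f"systematic order: estimated effect = {estimated_effect(systematic):.2f}")

# Random assignment of conditions to trials, repeated many times.
rng = random.Random(7)
estimates = []
for _ in range(1000):
    order = systematic[:]
    rng.shuffle(order)
    estimates.append(estimated_effect(order))
print(f"random assignment: average estimated effect = {statistics.mean(estimates):.2f}")
```

When all additive trials are run last, the drift shows up as an apparent gain of about 0.5 km per liter even though the additive does nothing. Under random assignment, the drift is spread across both conditions, so the estimates center on zero.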
3.3 The Use of Placebo
      In experiments that use human subjects, use of a control group may not be
enough to determine whether a treatment really does have an effect. People
sometimes respond merely to the power of suggestion.
Illustration 5. Suppose a study is conducted to determine whether a particular
herbal supplement is effective in promoting weight loss. The study compares an
experimental group that takes the herbal supplement with a control
group that takes nothing. It is possible that those who take the herbal
supplement and believe that they are taking something that will help them to
lose weight may be more motivated and may unconsciously change their eating
behaviour or activity level, resulting in weight loss.
        If an experiment is to enable researchers to determine whether a treatment
really has an effect, comparing a treatment group to a control group may not be
enough. To address the problem, many experiments use what is called a placebo.
  Definition
  A placebo is something that is identical (in appearance, taste, feel, etc.) to the
  treatment received by the treatment group, except that it contains no active
  ingredients.
        As long as the subjects did not know whether they were taking the placebo,
the placebo group would provide a better basis for comparison and would allow the
researchers to determine whether the treatment had any real effect over and above the
“placebo effect.”
         Application: Exercise #3
1. The head of the quality control department at a printing company would like to
   carry out an experiment to determine which of three different glues results in the
   greatest binding strength. Although they are not of interest in the current
   investigation, other factors thought to affect binding strength are the number of
   pages in the book and whether the book is being bound as a paperback or a
   hardback.
   a. What is the response variable in this experiment?
   b. What explanatory variable will determine the experimental conditions?
   c. What two extraneous variables are mentioned in the problem description? Are
   there other extraneous variables that should be considered?
 2. A study of college students showed a temporary gain of up to 9 IQ points after
    listening to Mozart’s music. This conclusion, dubbed the Mozart effect, has
    since been criticized by a number of researchers who have been unable to
    confirm the result in similar studies. Suppose that you wanted to see whether
    there is a Mozart effect for students at your school.
    a. Describe how you might design an experiment for this purpose.
    b. Does your experimental design include direct control of any extraneous
    variables? Explain.
    c. Does your experimental design use blocking? Explain why you did or did not
    include blocking in your design.
    d. What role does random assignment play in your design?
       Closure
        Congratulations! You have successfully completed the tasks and activities for
Lesson 3. It is expected that you are now knowledgeable about obtaining data through
surveys and experiments.
        You are almost done with this module. The module summary and assessment
will follow.
                              SUMMARY
o Data collection and analysis process:
      Understanding the nature of the problem.
      Deciding what to measure and how to measure it.
      Data Collection
      Data summarization and preliminary analysis
      Formal data analysis
      Interpretation of results
o The entire collection of individuals or objects about which
  information is desired is called the population of interest.
o A sample is a subset of the population, selected for study.
o A data set consisting of observations on a single characteristic is a
   univariate data set.
o A univariate data set is categorical (or qualitative) if the
  individual observations are categorical responses.
o A univariate data set is numerical (or quantitative) if each
  observation is a number.
o Data collection techniques for survey
            Interviews
            Self-administered questionnaire
            Direct Observation
 o An experiment is a study in which one or more explanatory
   variables are manipulated in order to observe the effect on a
   response variable.
 o An experimental condition is any particular combination of
   values for the explanatory variables. Experimental conditions are
   also called treatments.
 o An experimental unit is the smallest unit to which a treatment is
   applied.
  o The explanatory variables are those variables that have values
    that are controlled by the experimenter. They are also called
    independent variables or factors.
 o The response variable is a variable that is not controlled by the
   experimenter and that is measured as part of the experiment. It is
   also called the dependent variable.
                                  ASSESSMENT
1.   Two surveys were conducted to measure the effectiveness of an advertising
     campaign for a low-fat brand of ice cream. In one of the surveys, the
     interviewers visited the home and asked whether the low-fat brand ice cream
     was purchased. In the other survey, the interviewers asked the person to show
     them the ice cream container when the interviewee stated he or she had
     purchased low-fat ice cream.
     a. Do you think the two types of surveys will yield similar results on the
     percentage of households using the product?
     b. What types of biases may be introduced into each of the surveys?
2.   The “A” City school district is planning a survey of 300 of its 15,000 parents
     or guardians who have students currently enrolled. They want to assess the
     parents’ opinion about mandatory drug testing of all students participating in
     any extracurricular activities, not just . An alphabetical listing of all parents or
     guardians is available for selecting the sample. In each of the following
     descriptions of the method of selecting the 300 participants in the survey,
     identify the type of sampling method used (simple random sampling, stratified
     sampling, or cluster sampling).
     a. Each name is randomly assigned a number. The names with numbers 1
        through 300 are selected for the survey.
     b. The schools are divided into three groups according to the grade levels
        taught at the school: Grades 6–7, 8–9, and 10–12. Three separate sampling
        frames are constructed, one for each group. A simple random sample of 100
        parents or guardians is selected from each group.
     c. The school district is also concerned that the parent or guardian’s opinion
        may differ depending on the age and sex of the student. Each name is
        randomly assigned a number. The names with numbers 1 through 300 are
        selected for the survey. The parent is asked to fill out a separate survey for
        each of their currently enrolled children.
3.   The major private university in the region is attempting to lobby the
     Commission on Higher Education to allow the university to charge a
     higher tuition rate than the other universities in the country. To provide a
     justification, the university plans to conduct a mail survey of its alumni to
     collect information concerning their current employment status. The university
     grants a wide variety of different degrees and wants to make sure that
     information is obtained about graduates from each of the degree types. A 5%
     sample of alumni is considered sufficient.
      Decide what sampling method you would use. Provide an explanation of why
     you selected a particular method of sampling.
4.   An experiment is planned to compare three types of schools (public,
     private-nonparochial, and parochial) with respect to the mathematical
     problem-solving abilities of freshman engineering students. The researcher selects two
     large cities in each of six provinces of the Davao Region for the study. In each
     province, the researcher randomly selects one school of each of the three types
     and randomly selects a single freshman class within each school. The scores
     on a standardized test are recorded for each of 20 students in each classroom.
     The researcher is concerned about differences in family income levels among
     the 30 schools, so she obtains the family income for each of the students who
     participated in the study.
     a. Identify the important features of the design.
     b. Identify each of the following components of the experimental design.
         i. factors
         ii. factor levels
         iii. blocks
         iv. experimental unit
         v. measurement unit
         vi. replications
         vii. treatments
                                      References
Broto, A.S. (2007). Simplified Approach to Inferential Statistics (1st ed.). National .
               Philippines.
Carambas, Z.U. (2011). Basic Probability and Statistics. Valencia Educational
               Supply, Baguio City.
Peck, R., Olsen, C., & Devore, J.L. (2012). Introduction to Statistics and Data
               Analysis (4th ed.). Brooks/Cole, Cengage Learning, 20 Channel
               Center Street, Boston, MA 02210, USA.
Ott, R.L., & Longnecker, M. (2010). An Introduction to Statistical Methods and Data
               Analysis (6th ed.). Brooks/Cole, Cengage Learning, CA, USA.
Roussas, G. (2003). Introduction to Probability and Statistical Inference.
               Elsevier Science, USA.
Walpole, R.E., & Myers, R.H. (1993). Probability and Statistics for Engineers and
               Scientists (5th ed.). Macmillan Publishing Company, New York.