Factorial Design Notes and Examples
• Many applications of response surface methodology are based on fitting one of the following
models:
• Common applications of $2^k$ factorial designs (and the fractional factorial designs in Section 5
of the course notes) include the following:
• We will first analyze each $2^k$ design as a fixed effects design. We will also generalize the
fixed effects results to the regression model approach, in which the model contains regression
coefficients $\beta_0, \beta_1, \beta_2, \ldots$ as in (3) and (4).
• Before analyzing the data, you must determine if the design was completely randomized or
if blocking was used. Your answer to this question will indicate the appropriate analysis.
Initially, we will assume the design was completely randomized.
• Because a $2^2$ design has only 4 runs, several ($n$) replications are taken.
• Notationally, we use lowercase letters a, b, ab, and (1) to indicate the sum of the responses
for all replications at each of the corresponding levels of A and B.
– If the lower case letter appears, then that factor is at its high (+1) level.
– If the lower case letter does not appear, then that factor is at its low (−1) level.
Factor Level        Coded Levels     Replicate                 Sum of n
Combination           A     B        1     2    ···   n        Replicates
A low,  B low        −1    −1       xxx   xxx   ···  xxx       (1) = y11·
A high, B low        +1    −1       xxx   xxx   ···  xxx       a   = y21·
A low,  B high       −1    +1       xxx   xxx   ···  xxx       b   = y12·
A high, B high       +1    +1       xxx   xxx   ···  xxx       ab  = y22·
• We will use the notation $A^+$ and $A^-$ to represent the set of observations with factor A at its
high (+1) and its low (−1) levels, respectively. The same notation applies to $B^+$ and $B^-$ for
factor B.
• $\bar{y}_{A^+}$ and $\bar{y}_{A^-}$ are the means of all observations when A = +1 and A = −1, respectively.
• $\bar{y}_{B^+}$ and $\bar{y}_{B^-}$ are the means of all observations when B = +1 and B = −1, respectively.
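• The average effect of a factor is the average change in the response produced by a change
in the level of that factor, averaged over the levels of the other factor.
\[ A = \bar{y}_{A^+} - \bar{y}_{A^-} = \frac{ab + a}{2n} - \frac{b + (1)}{2n} = \frac{1}{2n}\,[ab + a - b - (1)] \]
\[ B = \bar{y}_{B^+} - \bar{y}_{B^-} = \frac{ab + b}{2n} - \frac{a + (1)}{2n} = \frac{1}{2n}\,[ab - a + b - (1)] \]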
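• The interaction effect between Factors A and B, denoted AB, is one half of the difference between (i)
the average change in response when the levels of Factor A are changed given Factor B is at
its high level and (ii) the average change in response when the levels of Factor A are changed
given Factor B is at its low level. Here $\bar{y}_{A^+B^+}$ denotes the mean of the $n$ observations with both
factors at their high levels (and similarly for the other cells), and the factor 1/2 puts AB on the
same scale as the main effects:
\[ AB = \frac{1}{2}\left[(\bar{y}_{A^+B^+} - \bar{y}_{A^-B^+}) - (\bar{y}_{A^+B^-} - \bar{y}_{A^-B^-})\right] = \frac{1}{2}\left[\frac{ab - b}{n} - \frac{a - (1)}{n}\right] = \frac{ab - a - b + (1)}{2n} \]
Note: The result is the same if we switch the roles of A and B in the definition:
\[ AB = \frac{1}{2}\left[(\bar{y}_{A^+B^+} - \bar{y}_{A^+B^-}) - (\bar{y}_{A^-B^+} - \bar{y}_{A^-B^-})\right] = \frac{1}{2}\left[\frac{ab - a}{n} - \frac{b - (1)}{n}\right] = \frac{ab - a - b + (1)}{2n} \]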
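• Note that when estimating the effects for A, B, and AB, the following contrasts of the treatment totals are used:
\[ \Gamma_A = ab + a - b - (1), \qquad \Gamma_B = ab - a + b - (1), \qquad \Gamma_{AB} = ab - a - b + (1) \]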
• $\Gamma_A$, $\Gamma_B$, and $\Gamma_{AB}$ are used to estimate A, B, and AB, and they are orthogonal contrasts.
– With the totals ordered as (ab, a, b, (1)), the coefficient vectors for the contrasts are
[1 1 −1 −1] for A, [1 −1 1 −1] for B, and [1 −1 −1 1] for AB. The dot product of any two
of these vectors is 0, which is why they are called orthogonal contrasts.
• Because there are two levels for both factors, the degrees of freedom associated with each sum
of squares is 1. Thus, $MS_A = SS_A$, $MS_B = SS_B$, and $MS_{AB} = SS_{AB}$.
• Because there are $n$ replicates for each of the four A*B treatment combinations, there are
$4(n - 1)$ degrees of freedom for error for the four-parameter interaction model in (4).
• It is common to list the treatment combinations in standard order: (1), a, b, and ab. Many
references use a shortened notation (− or +) to denote the low (−1) and high (+1) levels of
a factor.
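The orthogonality of the three contrast coefficient vectors noted above can be checked numerically. The following is a minimal Python sketch, assuming the totals are ordered (ab, a, b, (1)); the names `contrasts`, `dot`, `pairwise`, and `coefficient_sums` are illustrative and not part of the notes.

```python
# Contrast coefficients, with the treatment totals ordered (ab, a, b, (1)).
contrasts = {
    "A":  [1,  1, -1, -1],
    "B":  [1, -1,  1, -1],
    "AB": [1, -1, -1,  1],
}


def dot(u, v):
    """Dot product of two equal-length coefficient vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


names = list(contrasts)

# Dot product of every distinct pair of contrast vectors.
pairwise = {
    (p, q): dot(contrasts[p], contrasts[q])
    for i, p in enumerate(names)
    for q in names[i + 1:]
}

# A contrast must also have coefficients that sum to zero.
coefficient_sums = {name: sum(c) for name, c in contrasts.items()}

print(pairwise)
print(coefficient_sums)
```

Every pairwise dot product and every coefficient sum should come out as zero, which is what makes the three sums of squares additive.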
Example: An engineer designs a $2^2$ design with $n = 4$ replicates to study the effects of bit size (A)
and cutting speed (B) on routing notches in a printed circuit board.
 A   B   AB     Replicates                    Totals
 −   −   +      18.2   18.9   12.9   14.4     (1) =  64.4
 +   −   −      27.2   24.0   22.4   22.5     a   =  96.1
 −   +   −      15.9   14.5   15.1   14.2     b   =  59.7
 +   +   +      41.0   43.9   36.3   39.9     ab  = 161.1
Note: the signs in the AB column are the signs that result when multiplying the A and B columns.
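• The estimates of the fixed effects are:
\[ A = \frac{\Gamma_A}{2n} = \frac{ab + a - b - (1)}{2n} = \frac{161.1 + 96.1 - 59.7 - 64.4}{8} = \frac{133.1}{8} = 16.6375 \]
\[ B = \frac{\Gamma_B}{2n} = \frac{ab - a + b - (1)}{2n} = \frac{161.1 - 96.1 + 59.7 - 64.4}{8} = \frac{60.3}{8} = 7.5375 \]
\[ AB = \frac{\Gamma_{AB}}{2n} = \frac{ab - a - b + (1)}{2n} = \frac{161.1 - 96.1 - 59.7 + 64.4}{8} = \frac{69.7}{8} = 8.7125 \]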
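• The sums of squares are $\Gamma^2/(4n)$ for each effect, with $4n = 16$:
\[ SS_A = \frac{133.1^2}{16} = 1107.2256 \qquad SS_B = \frac{60.3^2}{16} = 227.2556 \qquad SS_{AB} = \frac{69.7^2}{16} = 303.6306 \]
\[ SS_T = \sum_{i=1}^{2}\sum_{j=1}^{2}\sum_{k=1}^{n} y_{ijk}^2 - \frac{y_{\cdot\cdot\cdot}^2}{4n} = 10796.69 - \frac{381.3^2}{16} = 1709.8344 \]
\[ SS_E = SS_T - SS_A - SS_B - SS_{AB} = 71.7225 \]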
• Sums of squares can also be calculated using the formulas for a two-factor factorial design.
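The hand calculations for the routing example can be reproduced with a short script. This is a minimal Python sketch rather than the course's own code; it assumes the effect formula contrast/(2n) and the sum of squares formula contrast²/(4n) used above, and the names `data`, `gamma`, `effects`, and `ss` are illustrative.

```python
# Replicate data from the routing example, keyed by treatment combination.
data = {
    "(1)": [18.2, 18.9, 12.9, 14.4],
    "a":   [27.2, 24.0, 22.4, 22.5],
    "b":   [15.9, 14.5, 15.1, 14.2],
    "ab":  [41.0, 43.9, 36.3, 39.9],
}
n = 4  # replicates per treatment combination

# Treatment totals (1), a, b, ab.
totals = {k: sum(v) for k, v in data.items()}
one, a, b, ab = (totals[k] for k in ("(1)", "a", "b", "ab"))

# Contrasts, effect estimates, and single-df sums of squares.
gamma = {
    "A":  ab + a - b - one,
    "B":  ab - a + b - one,
    "AB": ab - a - b + one,
}
effects = {k: g / (2 * n) for k, g in gamma.items()}
ss = {k: g ** 2 / (4 * n) for k, g in gamma.items()}

# Total and error sums of squares.
all_obs = [y for ys in data.values() for y in ys]
ss_total = sum(y ** 2 for y in all_obs) - sum(all_obs) ** 2 / (4 * n)
ss_error = ss_total - sum(ss.values())

for name in gamma:
    print(name, round(effects[name], 4), round(ss[name], 4))
print("SST", round(ss_total, 4), "SSE", round(ss_error, 4))
```

The error sum of squares has 4(n − 1) = 12 degrees of freedom here, so the mean square error is `ss_error / 12`.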
The Regression Model
• If both factors in the $2^2$ design are quantitative (say, $x_1$ and $x_2$), we can fit the first-order
regression model
\[ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon \]
or we can fit the regression model with interaction:
\[ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2 + \epsilon \]
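• The least squares estimates $[\,b_0\ \ b_1\ \ b_2\ \ b_{12}\,]' = (X'X)^{-1}X'y$ are directly related to the estimated
effects A, B, and AB from the fixed effects analysis:
\[ b_0 = \frac{ab + a + b + (1)}{4n} \quad\text{or}\quad b_0 = \bar{y} \]
\[ b_1 = \frac{\Gamma_A}{4n} = \frac{ab + a - b - (1)}{4n} \quad\text{or}\quad b_1 = A/2 \]
\[ b_2 = \frac{\Gamma_B}{4n} = \frac{ab + b - a - (1)}{4n} \quad\text{or}\quad b_2 = B/2 \]
\[ b_{12} = \frac{\Gamma_{AB}}{4n} = \frac{ab + (1) - a - b}{4n} \quad\text{or}\quad b_{12} = AB/2 \]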
• For the previous example:
\[ b_0 = \bar{y} = 381.3/16 = 23.83125 \]
\[ b_1 = A/2 = 16.6375/2 = 8.31875 \]
\[ b_2 = B/2 = 7.5375/2 = 3.76875 \]
\[ b_{12} = AB/2 = 8.7125/2 = 4.35625 \]
• Therefore, the fitted regression equation is
\[ \hat{y} = 23.83125 + 8.31875\,x_1 + 3.76875\,x_2 + 4.35625\,x_1 x_2 \]
where (x1 , x2 ) are the coded levels of factors A and B.
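The link between the least squares estimates and the effect estimates can be illustrated by building the design matrix for the routing example. This is a minimal Python sketch under one stated assumption: because the coded columns are orthogonal, X'X is diagonal, so the normal equations are solved by simple division (the script checks this before relying on it). The names `cells`, `xtx`, `xty`, and `b` are illustrative.

```python
# Routing example: coded levels (x1, x2) and the four replicates per cell.
cells = [
    (-1, -1, [18.2, 18.9, 12.9, 14.4]),
    ( 1, -1, [27.2, 24.0, 22.4, 22.5]),
    (-1,  1, [15.9, 14.5, 15.1, 14.2]),
    ( 1,  1, [41.0, 43.9, 36.3, 39.9]),
]

# Design matrix rows [1, x1, x2, x1*x2] and the response vector.
X, y = [], []
for x1, x2, reps in cells:
    for obs in reps:
        X.append([1, x1, x2, x1 * x2])
        y.append(obs)

p = 4  # number of regression coefficients
xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
xty = [sum(row[i] * obs for row, obs in zip(X, y)) for i in range(p)]

# Assumption check: orthogonal columns give a diagonal X'X (here 16 * I).
assert all(xtx[i][j] == 0 for i in range(p) for j in range(p) if i != j)

# With a diagonal X'X, (X'X)^{-1} X'y reduces to elementwise division.
b = [xty[i] / xtx[i][i] for i in range(p)]
print([round(v, 5) for v in b])
```

The last three coefficients are half of the effect estimates A, B, and AB, and the first is the grand mean.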
• For a $2^3$ design with $n$ replicates, each estimated effect is the difference between two means:
the first mean is the average of all data corresponding to the + rows in an effect column and
the second mean is the average of all data corresponding to the − rows in that effect column.
\[ A = \bar{y}_{A^+} - \bar{y}_{A^-} = \frac{a + ab + ac + abc}{4n} - \frac{(1) + b + c + bc}{4n} = \frac{1}{4n}\,[a + ab + ac + abc - (1) - b - c - bc] \]
\[ B = \bar{y}_{B^+} - \bar{y}_{B^-} = \frac{b + ab + bc + abc}{4n} - \frac{(1) + a + c + ac}{4n} = \frac{1}{4n}\,[b + ab + bc + abc - (1) - a - c - ac] \]
\[ C = \bar{y}_{C^+} - \bar{y}_{C^-} = \frac{c + ac + bc + abc}{4n} - \frac{(1) + a + b + ab}{4n} = \frac{1}{4n}\,[c + ac + bc + abc - (1) - a - b - ab] \]
• Let $\Gamma$ = the contrast sum in the numerator for any of the effects. Then the sum of squares
associated with that effect is $SS = \Gamma^2/(8n)$.
Geometric Representation for a $2^3$ Design
[Figure not recovered; the only surviving panel label is "C effect".]
Estimation of Two-Factor Interaction Effects
The Regression Model
• If all three factors in the $2^3$ design are quantitative (say, $x_1$, $x_2$, and $x_3$), we can fit the
regression model
• The least squares estimates (with the exception of b0 ) are 1/2 of the estimated effects from
the fixed effects analysis. That is,
• Because all of the contrasts associated with each of the effects are orthogonal, the least squares
estimates remain unchanged for any model containing a subset of terms in (6).
 A    B    C     Replicates        Treatment
 x1   x2   x3                      Sums
 −    −    −     22   31   25      (1) =  78
 +    −    −     32   43   29      a   = 104
 −    +    −     35   34   50      b   = 119
 +    +    −     55   47   46      ab  = 148
 −    −    +     44   45   38      c   = 127
 +    −    +     40   37   36      ac  = 113
 −    +    +     60   50   54      bc  = 164
 +    +    +     39   41   47      abc = 127
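Analyze the data (with lack-of-fit tests) assuming the following 4 models:
• (Model 1): A fixed effects additive model.
• (Model 2): A first-order regression model.
• (Model 3): A fixed effects model with all two-factor interactions.
• (Model 4): A regression model with all two-factor crossproduct (interaction) terms.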
• We will first estimate effects and sums of squares using the formulas, then use SAS to perform
the analysis. Recall:
(1)     a     b     ab    c     ac    bc    abc
 78    104   119   148   127   113   164   127

Model
Fixed Effects →    I     A    B    C    AB     AC     BC     ABC       Treatment
Regression    →    Int   x1   x2   x3   x1x2   x1x3   x2x3   x1x2x3    Sums
                   +     −    −    −    +      +      +      −         (1) =  78
                   +     +    −    −    −      −      +      +         a   = 104
                   +     −    +    −    −      +      −      +         b   = 119
                   +     +    +    −    +      −      −      −         ab  = 148
                   +     −    −    +    +      −      −      +         c   = 127
                   +     +    −    +    −      +      −      −         ac  = 113
                   +     −    +    +    −      −      +      −         bc  = 164
                   +     +    +    +    +      +      +      +         abc = 127
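• The sums of squares are calculated using $SS = \Gamma_{\text{effect}}^2/(8n)$, with $8n = 24$:
\[ SS_A = \frac{4^2}{24} = 0.6667 \qquad SS_B = \frac{(136)^2}{24} = 770.6667 \qquad SS_C = \frac{82^2}{24} = 280.1667 \]
\[ SS_{AB} = \frac{(-20)^2}{24} = 16.6667 \qquad SS_{AC} = \frac{(-106)^2}{24} = 468.1667 \]
\[ SS_{BC} = \frac{(-34)^2}{24} = 48.1667 \qquad SS_{ABC} = \frac{(-26)^2}{24} = 28.1667 \]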
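The contrasts behind these sums of squares follow mechanically from the table of signs. The sketch below, in Python, rebuilds each sign from the treatment label (a lowercase letter present means that factor is high) and applies contrast/(4n) for the effect and contrast²/(8n) for the sum of squares. The helper `sign` and the dictionary names are illustrative, not taken from the notes.

```python
# Treatment totals for the 2^3 example (n = 3 replicates per combination).
totals = {
    "(1)": 78, "a": 104, "b": 119, "ab": 148,
    "c": 127, "ac": 113, "bc": 164, "abc": 127,
}
n = 3


def sign(effect, treatment):
    """Sign (+1 or -1) of `effect`, e.g. "AC", in the row for `treatment`.

    A factor is at its high level when its lowercase letter appears in the
    treatment label; the sign of an interaction is the product of the signs
    of the factors it contains.
    """
    s = 1
    for factor in effect:
        s *= 1 if factor.lower() in treatment else -1
    return s


effect_names = ["A", "B", "C", "AB", "AC", "BC", "ABC"]

# Contrast = signed sum of the treatment totals.
gamma = {
    e: sum(sign(e, t) * total for t, total in totals.items())
    for e in effect_names
}
effects = {e: g / (4 * n) for e, g in gamma.items()}
ss = {e: g ** 2 / (8 * n) for e, g in gamma.items()}

for e in effect_names:
    print(f"{e:>3}  contrast={gamma[e]:>5}  effect={effects[e]:8.4f}  SS={ss[e]:9.4f}")
```

Half of each printed effect is the corresponding regression coefficient when the factors are coded as ±1.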
• Fixed effects additive model (Model 1):
• Note the effect estimates in the SAS output match the formula calculations.
Note that the parameter estimates are 1/2 of those from the fixed effects in Model 1.
• For Models 1 and 2, there are 16 df for pure error and 20 df for total error. Thus, the
df for lack-of-fit = 20 − 16 = 4. This means we can add at most 4 additional terms to the
model (such as interaction terms).
• The residuals in the Residual vs Predicted Value plot (page 50) are not randomly scattered
about 0 for several (x1 , x2 , x3 ) combinations. This suggests a lack-of-fit problem.
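The lack-of-fit partition for the first-order model can be sketched directly from the replicate data. This Python example assumes the usual decomposition (residual sum of squares = pure error + lack of fit) and computes only the F statistic; a p-value would require an F distribution routine, which the standard library does not provide. All variable names are illustrative.

```python
# Replicate data for the 2^3 example, keyed by coded levels (x1, x2, x3).
data = {
    (-1, -1, -1): [22, 31, 25],
    ( 1, -1, -1): [32, 43, 29],
    (-1,  1, -1): [35, 34, 50],
    ( 1,  1, -1): [55, 47, 46],
    (-1, -1,  1): [44, 45, 38],
    ( 1, -1,  1): [40, 37, 36],
    (-1,  1,  1): [60, 50, 54],
    ( 1,  1,  1): [39, 41, 47],
}
N = sum(len(ys) for ys in data.values())            # 24 observations
grand_total = sum(sum(ys) for ys in data.values())

# Corrected total sum of squares.
ss_total = sum(y ** 2 for ys in data.values() for y in ys) - grand_total ** 2 / N

# Sum of squares explained by the three main effects: contrast^2 / N each.
ss_main = 0.0
for j in range(3):
    contrast = sum(x[j] * sum(ys) for x, ys in data.items())
    ss_main += contrast ** 2 / N

# Pure error: variation of the replicates about their own cell mean.
ss_pure = sum(
    sum((y - sum(ys) / len(ys)) ** 2 for y in ys) for ys in data.values()
)
df_pure = sum(len(ys) - 1 for ys in data.values())  # 8 cells * 2 = 16

# Residual of the first-order model (intercept plus three slopes).
ss_resid = ss_total - ss_main
df_resid = N - 4                                    # 20

# Lack of fit is what remains after removing pure error.
ss_lof = ss_resid - ss_pure
df_lof = df_resid - df_pure                         # 4
f_lof = (ss_lof / df_lof) / (ss_pure / df_pure)

print(df_pure, round(ss_pure, 4), df_lof, round(ss_lof, 4), round(f_lof, 4))
```

The lack-of-fit sum of squares equals the total of the four omitted interaction sums of squares, which is why it carries 4 degrees of freedom.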
MODEL 1: ADDITIVE FIXED EFFECTS MODEL and MODEL 2: FIRST ORDER REGRESSION MODEL
[SAS ANOVA and parameter estimate tables not recovered; only the pure error row survives.]
Source        DF    Sum of Squares    Mean Square
Pure Error    16    482.66667         30.16667
MODEL 1: ADDITIVE FIXED EFFECTS MODEL and MODEL 3: INTERACTION FIXED EFFECTS MODEL
The GLM Procedure: means of Y by factor level

Level of A     N     Mean          Std Dev
   -1         12     40.6666667    11.7808267
    1         12     41.0000000     7.1858447

Level of B     N     Mean          Std Dev
   -1         12     35.1666667     7.46912838
    1         12     46.5000000     8.03967435

Level of C     N     Mean          Std Dev
   -1         12     37.4166667    10.5093753
    1         12     44.2500000     7.3870279

Level of A   Level of B     N     Mean          Std Dev
   -1           -1          6     34.1666667     9.7039511
   -1            1          6     47.1666667    10.4769588
    1           -1          6     36.1666667     5.1153364
    1            1          6     45.8333333     5.6005952

Level of A   Level of C     N     Mean          Std Dev
   -1           -1          6     32.8333333     9.82683401
   -1            1          6     48.5000000     7.84219357
    1           -1          6     42.0000000     9.79795897
    1            1          6     40.0000000     3.89871774

Level of B   Level of C     N     Mean          Std Dev
   -1           -1          6     30.3333333     7.25718035
   -1            1          6     40.0000000     3.74165739
    1           -1          6     44.5000000     8.36062199
    1            1          6     48.5000000     7.91833316
• Now let’s add the three two-factor interactions to get Models 3 and 4.
\[ y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i} + \beta_{12} x_{1i} x_{2i} + \beta_{13} x_{1i} x_{3i} + \beta_{23} x_{2i} x_{3i} + \epsilon_i \]
Note that the parameter estimates are 1/2 of those from the fixed effects in Model 3.
• The residuals are randomly scattered about 0. This suggests there is no lack-of-fit problem.
The lack-of-fit test (p-value= ) supports this.
MODEL 3: INTERACTION FIXED EFFECTS MODEL
The GLM Procedure
Dependent Variable: Y

Source    DF    Type III SS     Mean Square     F Value    Pr > F
A*B        1     16.6666667      16.6666667       0.55     0.4666
C          1    280.1666667     280.1666667       9.32     0.0072
A*C        1    468.1666667     468.1666667      15.58     0.0010
B*C        1     48.1666667      48.1666667       1.60     0.2226
[Rows for A and B not recovered.]

Parameter    Estimate       Standard Error    t Value    Pr > |t|
A*B          -1.6666667     2.23789408         -0.74     0.4666
A*C          -8.8333333     2.23789408         -3.95     0.0010
B*C          -2.8333333     2.23789408         -1.27     0.2226
[Rows for the intercept and main effects not recovered.]

Regression model with two-factor interactions (REG output)

Pure Error         16     482.66667    30.16667
Root MSE            5.48170     R-Square    0.7562
Dependent Mean     40.83333     Adj R-Sq    0.6702
Coeff Var          13.42457

Parameter Estimates
Variable     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|    Variance Inflation
Intercept     1        40.83333             1.11895         36.49      <.0001         0
X1X2          1        -0.83333             1.11895         -0.74      0.4666         1.00000
X1X3          1        -4.41667             1.11895         -3.95      0.0010         1.00000
X2X3          1        -1.41667             1.11895         -1.27      0.2226         1.00000
[Rows for X1, X2, and X3 not recovered.]
MODEL 2: FIRST ORDER REGRESSION MODEL
The REG Procedure, Model: MODEL1, Dependent Variable: Y
[Figure (page 50): "Fit Diagnostics for Y" panel. Surviving axis labels: Residual, RStudent, Predicted Value, Quantile, Observation, Cook's D, Percent, Proportion Less, Fit–Mean and Residual.]
Observations 24    Parameters 4    Error DF 20    MSE 52.192    R-Square 0.5018    Adj R-Square 0.4271

MODEL 4: INTERACTION REGRESSION MODEL
[Figure (page 50): "Fit Diagnostics for Y" panel with the same axis labels.]
Observations 24    Parameters 7    [remaining fit statistics not recovered]
SAS Code for the $2^3$ Design Example
4.3 Analyzing Unreplicated Experiments
• To test hypotheses in an unreplicated $2^k$ design ($n = 1$), it is necessary to “pool” interaction
terms (especially higher-order interaction terms), and use the MSE after pooling as an estimate
of the random error variance $\sigma^2$.
• The problem is to determine which interaction terms should be pooled together. The following
three steps are recommended:
1. Estimate all effects for the full-factorial interaction model.
2. Make a normal probability plot of the estimated effects (excluding the intercept), and
label the “outlier” effects. Higher-order interactions which are not outliers can be pooled
to form the MSE.
3. Run the ANOVA using this pooled error term.
• Warning: When a higher-order interaction exists, it is inappropriate to pool that interaction
with the other interactions because it will inflate the MSE.
• Some comments on the normal probability plot of the $2^k - 1$ estimates for either the fixed
effects or regression model:
– If the true value of an effect is zero, then its estimate is normally distributed about 0.
That is, the estimate is $N(0, \sigma^2/c)$, where for $n = 1$ the constant is $c = 2^{k-2}$ for a fixed
effects estimate and $c = 2^k$ for a regression coefficient. When plotted, all of the effects
which are not significantly different from zero should lie along a straight line on the
normal probability plot.
– If the true value of an effect is not zero, then its estimate is normally distributed about
its mean, which we will call $\beta$. That is, the estimate is $N(\beta, \sigma^2/c)$ with the same constant $c$.
Then, in the normal probability plot, all of the non-zero effects will be plotted away from
the line formed by the zero-mean effects.
Analyze the data from this unreplicated experiment from Design and Analysis of Experiments, by
D. Montgomery (8th ed., p.298).
A 2**4 DESIGN -- ESTIMATION OF EFFECTS

Source    DF    Sum of Squares    Mean Square    F Value    Pr > F
[ANOVA rows not recovered.]

Parameter                Estimate    Standard Error    t Value    Pr > |t|
A         TIME             4.50           .               .           .
B         CONC             0.50           .               .           .
C         PRESSURE         2.00           .               .           .
D         TEMP             3.25           .               .           .
A*B       TIME*CONC       -0.75           .               .           .
A*C       TIME*PRES       -4.25           .               .           .
A*D       TIME*TEMP        4.00           .               .           .
B*C       CONC*PRES        0.25           .               .           .
B*D       CONC*TEMP        0.00           .               .           .
C*D       PRES*TEMP        0.00           .               .           .
A*B*C     TIME*C*P         1.00           .               .           .
A*B*D     TIME*C*T         0.75           .               .           .
A*C*D     TIME*P*T        -0.25           .               .           .
B*C*D     C*P*TEMP        -0.75           .               .           .
A*B*C*D   T*C*P*T          1.00           .               .           .
                         ^^^^^^^^^^^
                         Make a NPP of these estimates
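The 15 estimates and the coordinates for their normal probability plot can be reproduced outside SAS. The Python sketch below uses the 16 yields from the DATA step in the SAS listing, in the order they are read (TIME changing fastest, TEMP slowest). The plotting positions (i − 0.5)/m are an assumption; SAS procedures may use a different convention, which shifts the quantiles slightly without changing the ordering of the points.

```python
from itertools import combinations
from statistics import NormalDist

# Yields in the order read by the DATA step: TIME (A) changes fastest,
# then CONC (B), then PRESSURE (C), and TEMP (D) changes slowest.
yields = [12, 18, 13, 16, 17, 15, 20, 15, 10, 25, 13, 24, 19, 21, 17, 23]
factors = "ABCD"

# Coded level of each factor for run i: bit j of i gives factor j's level.
runs = []
for i, y in enumerate(yields):
    levels = {f: (1 if (i >> j) & 1 else -1) for j, f in enumerate(factors)}
    runs.append((levels, y))

# Effect estimate = contrast / 2^(k-1) = contrast / 8 for k = 4 and n = 1.
effects = {}
for size in range(1, len(factors) + 1):
    for combo in combinations(factors, size):
        contrast = 0
        for levels, y in runs:
            s = 1
            for f in combo:
                s *= levels[f]
            contrast += s * y
        effects["".join(combo)] = contrast / 8

# Normal probability plot coordinates: sorted estimates against normal
# quantiles at the assumed plotting positions (i - 0.5) / m.
ordered = sorted(effects.items(), key=lambda kv: kv[1])
m = len(ordered)
npp = [
    (name, est, NormalDist().inv_cdf((i + 0.5) / m))
    for i, (name, est) in enumerate(ordered)
]

for name, est, q in npp:
    print(f"{name:>4}  {est:6.2f}  {q:7.3f}")
```

Points that fall well off the straight line through the bulk of the estimates are the candidates to label as “outlier” effects.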
DM 'LOG; CLEAR; OUT; CLEAR;';
ODS LISTING;
* ODS PRINTER PDF file='C:\COURSES\ST578\SAS\TWO4.PDF';
OPTIONS PS=54 LS=78 NODATE NONUMBER;
DATA IN;
DO TEMP = -1 TO 1 BY 2;
DO PRESSURE = -1 TO 1 BY 2;
DO CONC = -1 TO 1 BY 2;
DO TIME = -1 TO 1 BY 2;
INPUT YIELD @@; OUTPUT;
END; END; END; END;
LINES;
12 18 13 16 17 15 20 15 10 25 13 24 19 21 17 23
;
**********************************************************;
*** PART I: DETERMINE THE ESTIMATES OF THE 15 EFFECTS ***;
**********************************************************;
**************************************************************************;
*** PART II: MAKE A NORMAL PROBABILITY PLOT OF THE ESTIMATED EFFECTS ***;
**************************************************************************;
[Figure: histogram of EFFECTS (horizontal axis: Count) and normal probability plot of EFFECTS versus Normal Quantiles.]
Analysis I: Pooling high order interactions
• After pooling all 3-factor and 4-factor interactions, we have 5 df for the $MS_E$.
• The ANOVA indicates significant A, C, AC, D, and AD effects. These match the highlighted
points on the normal probability plot of effects.
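The pooled error term in Analysis I can be sketched numerically. This Python example assumes each single-df sum of squares is contrast²/16 for the unreplicated 2^4 design and pools the four 3-factor interactions and the 4-factor interaction. It reports F statistics only; judging significance still requires comparing them with an F(1, 5) reference distribution. The names are illustrative.

```python
from itertools import combinations

# Yields in standard order (A = TIME changes fastest, D = TEMP slowest).
yields = [12, 18, 13, 16, 17, 15, 20, 15, 10, 25, 13, 24, 19, 21, 17, 23]
factors = "ABCD"
N = len(yields)  # 16 runs, one observation each


def contrast(combo):
    """Signed sum of the yields for the effect named by `combo`."""
    total = 0
    for i, y in enumerate(yields):
        s = 1
        for f in combo:
            s *= 1 if (i >> factors.index(f)) & 1 else -1
        total += s * y
    return total


# Single-df sum of squares for each of the 15 effects.
ss = {
    "".join(c): contrast(c) ** 2 / N
    for size in range(1, 5)
    for c in combinations(factors, size)
}

# Pool all 3-factor and 4-factor interactions into the error term.
pooled = [name for name in ss if len(name) >= 3]
df_error = len(pooled)
mse = sum(ss[name] for name in pooled) / df_error

# F statistics for the main effects and two-factor interactions.
f_stats = {name: ss[name] / mse for name in ss if len(name) <= 2}

print("df error:", df_error, "MSE:", round(mse, 4))
for name, f in sorted(f_stats.items(), key=lambda kv: -kv[1]):
    print(f"{name:>2}  SS={ss[name]:7.2f}  F={f:7.2f}")
```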
******************************************************************;
*** PART III: RUN ANOVA WITH POOLED HIGHER ORDER INTERACTIONS ***;
******************************************************************;
Analysis II: Pooling terms involving factor B = concentration (CONC)
• The ANOVA indicates significant A, C, AC, D, and AD effects. These match the highlighted
points on the normal probability plot of effects.
• After factor B is removed, we still retain balance and orthogonality. We now have a $2^3$ design
with $n = 2$ replicates for each combination of factor levels for A, C, and D.
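Collapsing over concentration can be illustrated by grouping the 16 runs by the levels of A, C, and D and treating the two runs in each group as replicates. This is a minimal Python sketch of that projection, assuming the run order of the DATA step (TIME fastest, then CONC, PRESSURE, TEMP); the resulting within-cell variation is the error term when the full A, C, D factorial model is fit. The names `cells`, `ss_error`, and `mse` are illustrative.

```python
from collections import defaultdict

# Yields in standard order: bit 0 = A (TIME), bit 1 = B (CONC),
# bit 2 = C (PRESSURE), bit 3 = D (TEMP).
yields = [12, 18, 13, 16, 17, 15, 20, 15, 10, 25, 13, 24, 19, 21, 17, 23]

# Group runs by (A, C, D), ignoring B, so each cell holds two "replicates".
cells = defaultdict(list)
for i, y in enumerate(yields):
    a = 1 if i & 1 else -1
    c = 1 if (i >> 2) & 1 else -1
    d = 1 if (i >> 3) & 1 else -1
    cells[(a, c, d)].append(y)

# Within-cell (error) sum of squares and its degrees of freedom.
ss_error = sum(
    sum((y - sum(ys) / len(ys)) ** 2 for y in ys) for ys in cells.values()
)
df_error = sum(len(ys) - 1 for ys in cells.values())
mse = ss_error / df_error

print(len(cells), "cells;", "SSE =", ss_error, "df =", df_error, "MSE =", mse)
```

The error sum of squares obtained this way equals the total of the sums of squares for the eight effects that involve B.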
**************************************************************;
*** RUN ANOVA WITH CONCENTRATION REMOVED FROM THE ANALYSIS ***;
**************************************************************;
RUN;