Types of Main Effects in Factorial ANOVA Models
Questions about types of main effects (or types of sums of squares) come up more often than many of us would like. Why do tests of main effects sometimes differ between software packages? What do "Type II" and "Type III" mean? What are the various ways that a main effect can be defined in the context of an interaction? Unfortunately, the answers are not always straightforward. I have written this document to provide an explanation that students can refer to as needed, when my impromptu verbal asides leave something to be desired.
In the examples below, I use R code. For background about R, see my brief intro to R or the videos and/or documents I have shared from my longer R Workshop.
This document assumes that you are already comfortable with the basic principles of linear regression. Most notably, you should understand how to interpret regression slopes from numeric and binary variables, how to interpret such slopes when there are multiple predictors in the same model, and how to interpret the terms in a regression model with an interaction term. For a refresher on those concepts, see my regression videos, including the series about interactions (2023) and the tutorial on main effects vs. interaction (2020). I also have a video series about ANOVA (2023) that covers all of the content on this page in more detail.
What is ANOVA?
ANOVA is a very general term for a variety of models that partition the variance of one variable (the response variable or dependent variable) into portions associated with other variables, combinations of variables, and sources of noise or error. Here, I am focusing on between-subjects, fixed-effects models involving categorical predictors and their interactions.
Conditional Associations and Interactions
Suppose you have two binary predictors and you would like to estimate their interaction. You can compute the mean of your response variable for each of the four combinations of the levels of the predictors. From these four means, it is easy to characterize the association between one predictor and the outcome for a given value of the other predictor as the difference between the two means for that given value. It is also easy to characterize the interaction as the difference between these two differences.
For example, suppose I have conducted an experiment to test the effects of caffeine and background noise on math test performance. I randomly assign students to drink either regular coffee or decaf, and also to take a math test in a silent room or a room with distracting background noise. My three variables are math test score (numeric), caffeine (binary: yes or no), and noisy environment (binary: yes or no). The conditional effect of caffeine in the silent condition is simply the mean test score in the silent-caffeine condition minus the mean test score in the silent-decaf condition. The conditional effect of caffeine in the noisy condition is the mean test score in the noisy-caffeine condition minus the mean test score in the noisy-decaf condition. The interaction is the difference between those two conditional effects.
Main Effects
The highest-order interaction in any given model and its lowest-order conditional associations are relatively easy to define, and there is usually one prevailing definition. Main effects are not so simple.
In the above example, how should I define the overall main effect of caffeine? Is it the unweighted average (midpoint) of the conditional effects in the noisy and silent conditions? What if the noisy and silent conditions have different numbers of participants -- should the main effect then be the average of the conditional effects weighted by the number of participants in those conditions? Or should it be the slope associated with caffeine in a model that adjusts for the noise manipulation but omits the interaction term entirely, thereby minimizing the sum of squared residuals under the constraint that the conditional effects must be equal? Or should I define the main effect as the difference between the mean in the caffeine condition and the mean in the decaf condition, not accounting for the noise manipulation at all? All four of these values could be referred to as "the main effect of caffeine" depending on the context.
ANOVA can be thought of as a particular strategy for summarizing the results of one or more regression models. In order to better understand the different kinds of main effects, it helps to think in terms of these regression models.
Introducing an Example
I will set up a concrete example. You can run this code in R yourself to see the results, but you do not need to do so. I have included the relevant bits of output here.
dat = data.frame(
caffeine = c(rep(0,15),rep(1,9)),                 # 0 = decaf, 1 = caffeine
noise = c(rep(0,5),rep(1,10),rep(0,5),rep(1,4)),  # 0 = silent, 1 = noisy
score = c(47,50,50,50,53, 33,36,37,37,39,41,43,43,44,47,  # decaf: silent, then noisy
85,88,90,92,95, 46,50,50,54))                             # caffeine: silent, then noisy
The above code generates a small data frame containing three vectors representing the variables from the conceptual example I described above. For caffeine and noise, 0 means "no" and 1 means "yes." Twenty-four hypothetical students took the math test, each under one of the four sets of conditions. For the sake of a clear illustration, the effect sizes are unrealistically large, and disproportionately many participants ended up in the decaf-noise condition.
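Before fitting any models, it may help to verify the difference-of-differences logic from the previous section by computing the four cell means directly. (This snippet is just an illustrative check; it is not part of the analyses below.)
with(dat, tapply(score, list(caffeine, noise), mean))
The cell means are 50 (silent-decaf), 40 (noisy-decaf), 90 (silent-caffeine), and 50 (noisy-caffeine). The conditional effect of caffeine is 90 - 50 = 40 in silence and 50 - 40 = 10 in noise, so the interaction is 10 - 40 = -30. These three values will reappear as regression slopes below.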
summary(lm(score~caffeine*noise,data=dat))
The above line displays a summary of a regression model testing the interaction between caffeine and noise. Here is the table of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 50.000 1.661 30.096 < 2e-16 ***
caffeine 40.000 2.349 17.025 2.28e-13 ***
noise -10.000 2.035 -4.915 8.36e-05 ***
caffeine:noise -30.000 3.217 -9.325 1.01e-08 ***
Caffeine's slope is 40, indicating that the conditional effect of caffeine in the silence condition was 40 points. In other words, caffeine-drinkers in a silent room scored 40 points higher than decaf-drinkers in a silent room. The interaction term's slope is -30, indicating that the conditional effect of caffeine in the noise condition was 30 points lower than in the silence condition. If we computed a "silence" variable identical to the "noise" variable but with the 1s and 0s flipped, we could refit the model, and caffeine's slope would be 10 instead of 40.
dat$silence = 1-dat$noise
summary(lm(score~caffeine*silence,data=dat))
Here is the table of coefficients showing the conditional effect of caffeine in the noise condition to be 10 points.
Estimate Std. Error t value Pr(>|t|)
(Intercept) 40.000 1.175 34.050 < 2e-16 ***
caffeine 10.000 2.198 4.550 0.000195 ***
silence 10.000 2.035 4.915 8.36e-05 ***
caffeine:silence 30.000 3.217 9.325 1.01e-08 ***
Main Effects in Regression Models
Suppose I wanted to describe an overall main effect of caffeine. There are several ways I could do so.
First, I could simply compare the overall mean in the decaf condition to the overall mean in the caffeine condition, ignoring the noise manipulation entirely.
summary(lm(score~caffeine,data=dat))
Estimate Std. Error t value Pr(>|t|)
(Intercept) 43.333 3.553 12.20 2.90e-11 ***
caffeine 28.889 5.801 4.98 5.53e-05 ***
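As a quick illustrative check, this slope is nothing more than the difference between the two raw group means:
with(dat, mean(score[caffeine==1]) - mean(score[caffeine==0]))
This returns 28.889, matching the slope above (72.222 - 43.333).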
That approach could be called a "main effect," but in this case it is not especially useful. If I am describing this main effect of caffeine in a context where I also intend to talk about the interaction with the noise manipulation, I should not just ignore the noise manipulation. Instead, I can fit a regression model with both predictors.
summary(lm(score~caffeine+noise,data=dat))
Estimate Std. Error t value Pr(>|t|)
(Intercept) 58.000 3.211 18.065 2.84e-14 ***
caffeine 24.000 3.622 6.626 1.47e-06 ***
noise -22.000 3.557 -6.185 3.89e-06 ***
The main effect of caffeine, adjusting for the noise manipulation, was 24 points. This is, in my view, the most intuitive kind of main effect. It is traditionally called "Type II," but that name is needlessly confusing. This kind of main effect can be interpreted in the same way as any other slope from a regression model with multiple predictors. It can also be thought of as a "best guess" of the effect of caffeine given the provisional constraint that the effect must be the same in the noise condition and the silence condition. (These ideas are covered in greater depth in my tutorial on main effects vs. interaction in a regression model, 2020-12-18).
I am building these models knowing that there is a large interaction: the effect of caffeine in the noise condition (b = 10; see previous section) differs considerably from the effect of caffeine in the silence condition (b = 40). A good estimate of an "overall main effect" of caffeine should probably fall between these two conditional effects. Indeed, the slope of caffeine adjusting for noise is 24, which is clearly between 10 and 40. (This kind of slope will always fall between the two conditional slopes. The slope of caffeine ignoring noise completely will not always do so, which is part of the reason I brushed that approach aside above.) But perhaps I would like to be more explicit about how I am generating a compromise between the two conditional effects. Perhaps I would like to specify where the "main effect" should fall between them.
In the previous section, I created two versions of the noise variable in order to obtain estimates of the effect of caffeine in the noise condition and the silence condition. The key difference between them was the reference category. I was able to interpret caffeine's slope at the point where the noise/silence variable was zero. I can use that same basic strategy to estimate caffeine's slope anywhere between the noise condition and the silence condition, as well.
To do this, I will create new versions of the caffeine and noise variables: centered versions and contrast-coded versions. To center these variables, I simply subtract the mean from each. (Be careful about missing data. I would not want to define a centered variable using a mean that included participants who would not be in my final model, say, because they have missing test scores. In this example there is no missing data at all.) As a result, the mean of "caffeine_centered" should be precisely zero. The values in the decaf condition (-.375) will be closer to zero than the values in the caffeine condition (.625) because there are more total values in the decaf condition.
dat$caffeine_centered = dat$caffeine - mean(dat$caffeine)
dat$noise_centered = dat$noise - mean(dat$noise)
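A quick illustrative check of the coding:
unique(dat$caffeine_centered)
mean(dat$caffeine_centered)
The first line returns -0.375 (decaf) and 0.625 (caffeine), and the second returns zero.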
To contrast code them, I subtract .5 from each, so that the values will be coded as -.5 and .5. (Traditionally, contrast coding uses the values -1 and 1 instead; that rescaling would change the coefficients but leave the hypothesis tests unchanged, as shown below.) As a result, zero falls precisely between the two conditions.
dat$caffeine_contrast = dat$caffeine - .5
dat$noise_contrast = dat$noise - .5
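As an aside, here is an illustrative check of the claim that the traditional -1/1 coding would not change the tests. Rescaling the codes rescales the coefficients (the main-effect slopes are halved and the interaction slope is quartered), but every t and p value is identical to the contrast-coded model fit below.
dat$caffeine_pm1 = 2*dat$caffeine - 1
dat$noise_pm1 = 2*dat$noise - 1
summary(lm(score~caffeine_pm1*noise_pm1,data=dat))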
I can fit the regression model with the interaction using the centered versions of the predictors.
summary(lm(score~caffeine_centered*noise_centered,data=dat))
Estimate Std. Error t value Pr(>|t|)
(Intercept) 52.6042 0.7766 67.738 < 2e-16 ***
caffeine_centered 22.5000 1.6130 13.949 9.12e-12 ***
noise_centered -21.2500 1.5781 -13.465 1.73e-11 ***
caffeine_centered:noise_centered -30.0000 3.2171 -9.325 1.01e-08 ***
The resulting slope for caffeine is 22.5. This is the average of the slope in the noise condition (b = 10) and the slope in the silence condition (b = 40), weighted by their sample sizes (14 in the noise condition, 10 in the silence condition). 40*10/24 + 10*14/24 = 22.5. The main effect of caffeine on math test scores was 22.5 points, on average, given that there were more people in the noise condition than the silence condition.
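To verify this with an illustrative one-liner:
weighted.mean(c(40, 10), w = c(10, 14))
This returns 22.5: the silence-condition effect (40) weighted by the 10 people in silence, and the noise-condition effect (10) weighted by the 14 people in noise.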
I can also fit the regression model with the interaction using the contrast-coded versions of the predictors.
summary(lm(score~caffeine_contrast*noise_contrast,data=dat))
Estimate Std. Error t value Pr(>|t|)
(Intercept) 57.5000 0.8043 71.492 < 2e-16 ***
caffeine_contrast 25.0000 1.6086 15.542 1.25e-12 ***
noise_contrast -25.0000 1.6086 -15.542 1.25e-12 ***
caffeine_contrast:noise_contrast -30.0000 3.2171 -9.325 1.01e-08 ***
The resulting slope for caffeine is 25. This is the midpoint between the slope in the noise condition (b = 10) and the slope in the silence condition (b = 40). (10+40)/2 = 25. The main effect of caffeine, estimated directly between the noise and silence conditions, was 25 points.
Main Effects Framed in ANOVA Terms: "Type II"
Quite often, it is good enough to explain results by describing the specific regression models they come from, but many statistical analysis programs provide tools to present results in ANOVA terms as well. I often recommend using the "Anova" function from the "car" package. (Do not use the lowercase-a "anova" function from base R for this purpose.)
library(car)
Anova(lm(score~caffeine*noise,data=dat),type="II")
This function uses "Type II" by default, but you can always specify it as I have above to avoid confusion.
Sum Sq Df F value Pr(>F)
caffeine 3085.7 1 223.602 2.549e-12 ***
noise 2688.9 1 194.847 8.999e-12 ***
caffeine:noise 1200.0 1 86.957 1.011e-08 ***
Residuals 276.0 20
It is reasonable to think of these main effects as being similar to those from the previous section's regression model that omitted the interaction term (copied below). However, you may notice that the p-values are not identical. Here, the main effect of caffeine has an F value of 223.602. The square root of that value is 14.95.
summary(lm(score~caffeine+noise,data=dat))
Estimate Std. Error t value Pr(>|t|)
(Intercept) 58.000 3.211 18.065 2.84e-14 ***
caffeine 24.000 3.622 6.626 1.47e-06 ***
noise -22.000 3.557 -6.185 3.89e-06 ***
In this regression model, the main effect of caffeine has a t value of 6.63, not 14.95. What if I plug this model into the Anova function instead?
Anova(lm(score~caffeine+noise,data=dat),type="II")
Sum Sq Df F value Pr(>F)
caffeine 3085.7 1 43.902 1.467e-06 ***
noise 2688.9 1 38.257 3.894e-06 ***
Residuals 1476.0 21
The sum of squares for caffeine is precisely the same as in the model with the interaction, but the F value is not. Indeed, the square root of this F value is 6.63. This F-test is actually equivalent to the t-test in the regression summary above.
In other words, the "Anova" function with the interaction is using the same definition of a "main effect" as the regression model without the interaction, but comparing the variance explained by that effect to a smaller estimate of error variance. Specifically, it is using the error variance from the regression model with the interaction included.
summary(lm(score~caffeine*noise,data=dat))$sigma^2
This line of code retrieves the mean squared error from the full regression model with the interaction term. An F value is a ratio of variances. In this case, we are interested in the variance between the caffeine and decaf groups divided by the error variance. The sum of squares for caffeine is 3085.7; dividing it by the one degree of freedom does not change it. I can retrieve this 3085.7 value and divide it by the mean squared error from the full regression model:
Anova(lm(score~caffeine+noise,data=dat))[1,1] /
summary(lm(score~caffeine*noise,data=dat))$sigma^2
The result is 223.6, the same as the F value in the first ANOVA table in this section. This provides a test of the main effect of caffeine adjusting for noise, just like the regression model without the interaction term. The difference is that the regression model treats the variance that would be explained by the interaction as error variance (after all, it doesn't "know" that you are also planning to test an interaction).
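An illustrative check makes this concrete: the residual sum of squares from the model without the interaction equals the residual sum of squares from the full model plus the interaction sum of squares (1476 = 276 + 1200).
deviance(lm(score~caffeine+noise,data=dat))
deviance(lm(score~caffeine*noise,data=dat))
The first line returns 1476 and the second returns 276.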
To recap, we can treat the interaction as part of the error variance when testing the main effects adjusting for one another:
summary(lm(score~caffeine+noise,data=dat))
We can also partition out the variance that would be explained by the interaction when testing these same main effects:
Anova(lm(score~caffeine*noise,data=dat))
Which is better? The first option can be described as a single regression model, so it strikes me as more parsimonious. It is also more conservative (except in fringe cases). However, the interaction is not really "unexplained" variance, since this whole procedure implies that we do know about it, so treating it as error variance could be seen as overly conservative. In the end, both approaches are reasonable. It is worth noting that I designed this example to call attention to the differences between these methods. This interaction is way bigger than most real ones, and the error variance in this model is tiny. Most of the time, these two ways of testing "Type II" main effects are nearly identical, because the interaction usually represents only a very small portion of the "unexplained" variance.
In short, "Type II" main effects can be thought of as effects of each predictor adjusting for the other, like a regression model with both predictors but without the interaction.
Main Effects Framed in ANOVA Terms: "Type III"
As I wrote earlier, reasonable definitions of main effects typically locate them between the relevant conditional effects. Sometimes it makes sense to be explicit about precisely where the main effect should be estimated, such as halfway between the conditional effects or closer to the conditional effect with the larger sample size. "Type III" main effects are designed to accomplish this.
If I plug the overall regression model into the "Anova" function and specify "Type III," I will not get main effects, but I will get a clue about what "Type III" means for this function.
summary(lm(score~caffeine*noise,data=dat))
Estimate Std. Error t value Pr(>|t|)
(Intercept) 50.000 1.661 30.096 < 2e-16 ***
caffeine 40.000 2.349 17.025 2.28e-13 ***
noise -10.000 2.035 -4.915 8.36e-05 ***
caffeine:noise -30.000 3.217 -9.325 1.01e-08 ***
Anova(lm(score~caffeine*noise,data=dat),type="III")
Sum Sq Df F value Pr(>F)
(Intercept) 12500.0 1 905.797 < 2.2e-16 ***
caffeine 4000.0 1 289.855 2.280e-13 ***
noise 333.3 1 24.155 8.364e-05 ***
caffeine:noise 1200.0 1 86.957 1.011e-08 ***
Residuals 276.0 20
These sets of hypothesis tests are equivalent. The conditional effect of caffeine in the silence condition is 40 points, with a t value of 17.03 and an F value of 17.03^2 = 290. "Type III" (for this function) simply yields F-tests of the very same terms that are explicitly included in the regression model. In this case, those are conditional effects, not main effects.
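To see this equivalence across all four terms, here is an illustrative check: squaring the t values from the regression summary reproduces the Type III F column term for term.
summary(lm(score~caffeine*noise,data=dat))$coefficients[,"t value"]^2
This returns approximately 905.8, 289.9, 24.2, and 87.0, matching the F values above.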
I can take advantage of this feature to force the Anova function to estimate the effect of caffeine at the mean of the noise variable:
Anova(lm(score~caffeine_centered*noise_centered,data=dat),type="III")
Sum Sq Df F value Pr(>F)
(Intercept) 63321 1 4588.472 < 2.2e-16 ***
caffeine_centered 2685 1 194.571 9.116e-12 ***
noise_centered 2502 1 181.316 1.730e-11 ***
caffeine_centered:noise_centered 1200 1 86.957 1.011e-08 ***
Residuals 276 20
This corresponds to the main effect of caffeine on math test scores, on average, given that there were more people in the noise condition than the silence condition (see "Main Effects in Regression Models" section above; b = 22.5).
I can also estimate the effect of caffeine halfway between the noise condition and the silence condition:
Anova(lm(score~caffeine_contrast*noise_contrast,data=dat),type="III")
Sum Sq Df F value Pr(>F)
(Intercept) 70533 1 5111.111 < 2.2e-16 ***
caffeine_contrast 3333 1 241.546 1.249e-12 ***
noise_contrast 3333 1 241.546 1.249e-12 ***
caffeine_contrast:noise_contrast 1200 1 86.957 1.011e-08 ***
Residuals 276 20
This corresponds to the main effect of caffeine, estimated directly between the noise and silence conditions (see "Main Effects in Regression Models" section above; b = 25).
What does SPSS do?
Although it is useful to know about different types of main effects in general, the topic often comes up in the context of comparing one or more of the above approaches to the SPSS univariate ANOVA function (UNIANOVA, available via Analyze / General Linear Model / Univariate).
This SPSS function defaults to "Type III" main effects with contrast coding, the last approach described in the section above.
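For reference, here is a minimal sketch of one way to mimic that SPSS default in R, assuming the predictors are stored as 0/1 as in this example. Sum-to-zero contrasts use -1/1 rather than -.5/.5, so the coefficients are scaled differently, but the F-tests should match the contrast-coded Type III table in the previous section.
dat$caffeineF = factor(dat$caffeine)
dat$noiseF = factor(dat$noise)
contrasts(dat$caffeineF) = contr.sum(2)
contrasts(dat$noiseF) = contr.sum(2)
Anova(lm(score~caffeineF*noiseF,data=dat),type="III")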
Summary and Conclusion
"Type II" main effects are tested adjusting for one another. In other words, this approach is similar to fitting one regression model with only main effects, saving the interaction for a separate model. This is my preferred approach, and also the default for the "Anova" function from the "car" package. "Type III" main effects are tested in the same regression model as the interaction, meaning that they are actually conditional effects ("simple effects"). Some implementations automatically contrast code or center your predictors to estimate these conditional effects at useful points (the average of the two conditions, possibly weighted by sample size). The default in SPSS is "Type III" with contrast coding. The "Anova" function from the "car" package has no such default, so the user must set up the desired main effects in the regression model passed to the function.
With all this talk of "Type II" and "Type III," you may wonder what "Type I" is. "Type I" tests terms in a fixed order, so that one main effect is tested on its own, the second main effect is tested adjusting for the first, and so forth. I recommend never using this approach, as most psychologists will find it highly counterintuitive.
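If you would like to see what "Type I" looks like anyway, base R's lowercase "anova" function produces these sequential tests. (This is purely illustrative; again, I do not recommend it.)
anova(lm(score~caffeine*noise,data=dat))
In the resulting table, caffeine's sum of squares is computed ignoring noise entirely (so it differs from the Type II value), noise's sum of squares is computed adjusting for caffeine (matching its Type II value), and the interaction is tested last.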
The terms "Type II" and "Type III" are a nuisance. They confuse people and clarify little. If you are comfortable with linear regression models, you can always describe those models directly and avoid referring to "Type II" or "Type III" at all. However, it is often helpful to use the "Anova" function from the "car" package to get reasonable ("Type II") main effects without having to set up several regression models.
In this document, I have relied on an example featuring a two-way interaction and two main effects. These ideas generalize to more complex models. For example, when you have a three-way interaction, there are multiple ways to define overall two-way interactions, corresponding to the methods for defining overall main effects above. Also, all of the methods (except arguably "Type III" with contrast coding) generalize to interactions involving numeric predictors as well.