Many experimentalists who are trying to make the leap from ANOVA to linear mixed-effects models (LMEMs) in R struggle with the coding of categorical predictors. It is unexpectedly complicated, and the defaults provided in R turn out to be wholly inappropriate for factorial experiments. Indeed, using those defaults with factorial experiments can lead researchers to draw erroneous conclusions from their data.

To keep things simple, we’ll start with situations where design factors have no more than two levels before moving on to designs where factors have three or more levels.

Below, I briefly explain the difference between a **simple effect** and a **main effect**, and between a **simple interaction** and a **main interaction** in a three-way design. If you already understand these distinctions, just skip to the next section.

In an \(A{\times}B\) design, the simple effect of \(A\) is the effect of \(A\) **controlling** for \(B\), while the main effect of \(A\) is the effect of \(A\) **ignoring** \(B\). Another way of looking at this is to consider the cell means (\(\bar{Y}_{11}\), \(\bar{Y}_{12}\), \(\bar{Y}_{21}\), and \(\bar{Y}_{22}\)) and marginal means (\(\bar{Y}_{1.}\), \(\bar{Y}_{2.}\), \(\bar{Y}_{.1}\), and \(\bar{Y}_{.2}\)) in a factorial design. (The dot subscript tells you to “ignore” the dimension containing the dot; e.g., \(\bar{Y}_{.1}\) tells you to take the mean of the first column ignoring the row variable.) To test the main effect of A is to test the null hypothesis that \(\bar{Y}_{1.}=\bar{Y}_{2.}\). To test a simple effect of \(A\)—the effect of \(A\) at a particular level of \(B\)—would be, for instance, to test the null hypothesis that \(\bar{Y}_{11}=\bar{Y}_{21}\).

 | \(B_1\) | \(B_2\) | |
---|---|---|---|
\(A_1\) | \(\bar{Y}_{11}\) | \(\bar{Y}_{12}\) | \(\bar{Y}_{1.}\) |
\(A_2\) | \(\bar{Y}_{21}\) | \(\bar{Y}_{22}\) | \(\bar{Y}_{2.}\) |
 | \(\bar{Y}_{.1}\) | \(\bar{Y}_{.2}\) | |

The distinction between **simple interactions** and **main interactions** has the same logic: the simple interaction of \(AB\) in an \(ABC\) design is the interaction of \(AB\) at a particular level of \(C\); the main interaction of \(AB\) is the interaction **ignoring** \(C\). The latter is what we are usually talking about when we talk about lower-order interactions in a three-way design. It is also what we are given in the output from standard ANOVA procedures, e.g., the `aov()` function in R, SPSS, SAS, etc.

Generally, the choice of a coding scheme impacts the interpretation of:

- the intercept term; and
- the tests for all but the highest-order effects and interactions in a factorial design.

It can also influence the interpretation and estimation of random effects in a mixed-effects model. If you have a design with only a single two-level factor, and are using a maximal random-effects structure, the choice of coding scheme doesn’t really matter.

There are many possible coding schemes (see `?contr.treatment` for more information). The most relevant ones are **treatment**, **sum**, and **deviation**. Sum and deviation coding can be seen as special cases of **effect** coding; by effect coding, people generally mean codes that sum to zero.

For a two-level factor, you would use the following codes:

Scheme | \(A_1\) | \(A_2\) |
---|---|---|
Treatment (dummy) | \(0\) | \(1\) |
Sum | \(-1\) | \(1\) |
Deviation | \(-\frac{1}{2}\) | \(\frac{1}{2}\) |

The default in R is to use treatment coding for any variable defined as a `factor` in the model (see `?factor` and `?contrasts` for information). To see why this is not ideal for factorial designs, consider a \(2 \times 2 \times 2\) factorial design with factors \(A\), \(B\), and \(C\). We will just consider a fully between-subjects design with only one observation per subject, as this allows us to use the simplest possible error structure. We would fit such a model using `lm()`:

`lm(Y ~ A * B * C)`
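As a runnable sketch (made-up data; the seed, variable names, and levels are all arbitrary), here is what the default fit looks like. With treatment coding, the coefficient labeled `AA2` is the simple effect of \(A\) at the baseline levels \(B_1\) and \(C_1\), not the main effect of \(A\):

```r
## hypothetical 2x2x2 between-subjects data, one observation per cell
set.seed(1)  # arbitrary seed; illustrative data only
dat <- expand.grid(A = factor(c("A1", "A2")),
                   B = factor(c("B1", "B2")),
                   C = factor(c("C1", "C2")))
dat$Y <- rnorm(nrow(dat))

## R's default treatment coding is used automatically for factors
mod <- lm(Y ~ A * B * C, data = dat)
coef(mod)[["AA2"]]  # equals Y at (A2, B1, C1) minus Y at (A1, B1, C1)
```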

The table below provides the interpretation for various effects in the model under the three different coding schemes. Note that \(Y\) is the dependent variable, and the dots in the subscripts mean to “ignore” the corresponding dimension. Thus, \(\bar{Y}_{.1.}\) is the mean at level \(B_1\) (ignoring factors \(A\) and \(C\)) and \(\bar{Y}_{...}\) is the “grand mean” (ignoring all factors).

term | treatment | sum | deviation |
---|---|---|---|
\(\mu\) | \(\bar{Y}_{111}\) | \(\bar{Y}_{...}\) | \(\bar{Y}_{...}\) |
\(A\) | \(\bar{Y}_{211} - \bar{Y}_{111}\) | \(\frac{\bar{Y}_{2..} - \bar{Y}_{1..}}{2}\) | \(\bar{Y}_{2..} - \bar{Y}_{1..}\) |
\(B\) | \(\bar{Y}_{121} - \bar{Y}_{111}\) | \(\frac{\bar{Y}_{.2.} - \bar{Y}_{.1.}}{2}\) | \(\bar{Y}_{.2.} - \bar{Y}_{.1.}\) |
\(C\) | \(\bar{Y}_{112} - \bar{Y}_{111}\) | \(\frac{\bar{Y}_{..2} - \bar{Y}_{..1}}{2}\) | \(\bar{Y}_{..2} - \bar{Y}_{..1}\) |
\(AB\) | \((\bar{Y}_{221} - \bar{Y}_{121}) - (\bar{Y}_{211} - \bar{Y}_{111})\) | \(\frac{(\bar{Y}_{22.} - \bar{Y}_{12.}) - (\bar{Y}_{21.} - \bar{Y}_{11.})}{4}\) | \((\bar{Y}_{22.} - \bar{Y}_{12.}) - (\bar{Y}_{21.} - \bar{Y}_{11.})\) |
\(AC\) | \((\bar{Y}_{212} - \bar{Y}_{211}) - (\bar{Y}_{112} - \bar{Y}_{111})\) | \(\frac{(\bar{Y}_{2.2} - \bar{Y}_{1.2}) - (\bar{Y}_{2.1} - \bar{Y}_{1.1})}{4}\) | \((\bar{Y}_{2.2} - \bar{Y}_{1.2}) - (\bar{Y}_{2.1} - \bar{Y}_{1.1})\) |
\(BC\) | \((\bar{Y}_{122} - \bar{Y}_{112}) - (\bar{Y}_{121} - \bar{Y}_{111})\) | \(\frac{(\bar{Y}_{.22} - \bar{Y}_{.12}) - (\bar{Y}_{.21} - \bar{Y}_{.11})}{4}\) | \((\bar{Y}_{.22} - \bar{Y}_{.12}) - (\bar{Y}_{.21} - \bar{Y}_{.11})\) |

For the three way \(A \times B \times C\) interaction:

scheme | interpretation |
---|---|
treatment | \(\left[\left(\bar{Y}_{221} - \bar{Y}_{121}\right) - \left(\bar{Y}_{211} - \bar{Y}_{111}\right)\right] - \left[\left(\bar{Y}_{222} - \bar{Y}_{122}\right) - \left(\bar{Y}_{212} - \bar{Y}_{112}\right)\right]\) |
sum | \(\frac{\left[\left(\bar{Y}_{221} - \bar{Y}_{121}\right) - \left(\bar{Y}_{211} - \bar{Y}_{111}\right)\right] - \left[\left(\bar{Y}_{222} - \bar{Y}_{122}\right) - \left(\bar{Y}_{212} - \bar{Y}_{112}\right)\right]}{8}\) |
deviation | \(\left[\left(\bar{Y}_{221} - \bar{Y}_{121}\right) - \left(\bar{Y}_{211} - \bar{Y}_{111}\right)\right] - \left[\left(\bar{Y}_{222} - \bar{Y}_{122}\right) - \left(\bar{Y}_{212} - \bar{Y}_{112}\right)\right]\) |

Note that the inferential tests of \(A \times B \times C\) will all have the same outcome, despite the parameter estimate for sum coding being one-eighth of that for the other schemes. For all lower-order effects, sum and deviation coding will give different parameter estimates but identical inferential outcomes. Both of these schemes provide identical tests of the canonical main effects and main interactions for a three-way ANOVA. In contrast, treatment (dummy) coding will provide inferential tests of simple effects and simple interactions. So, if what you are interested in getting are the “canonical” tests from ANOVA, use sum or deviation coding.
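You can verify this numerically. The sketch below (made-up balanced data; the seed and names are arbitrary) fits the same model under sum and deviation coding: the t statistics are identical, and the sum-coded three-way estimate is one-eighth the deviation-coded one:

```r
## balanced 2x2x2 between-subjects data, several subjects per cell
set.seed(42)  # arbitrary seed; illustrative data only
d <- expand.grid(A = factor(c("A1", "A2")), B = factor(c("B1", "B2")),
                 C = factor(c("C1", "C2")), subj = 1:5)
d$Y <- rnorm(nrow(d))

d_sum <- d_dev <- d
for (f in c("A", "B", "C")) {
  contrasts(d_sum[[f]]) <- c(-1, 1)      # sum coding
  contrasts(d_dev[[f]]) <- c(-0.5, 0.5)  # deviation coding
}
m_sum <- lm(Y ~ A * B * C, data = d_sum)
m_dev <- lm(Y ~ A * B * C, data = d_dev)

## identical t statistics for every term...
all.equal(unname(summary(m_sum)$coefficients[, "t value"]),
          unname(summary(m_dev)$coefficients[, "t value"]))
## ...but the estimates differ in scale (the ratio below is 8)
coef(m_dev)[["A1:B1:C1"]] / coef(m_sum)[["A1:B1:C1"]]
```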

A factor with \(k\) levels requires \(k-1\) predictor variables. Each predictor contrasts a particular “target” level of the factor with a level that you (arbitrarily) choose as the “baseline” level. For instance, for a three-level factor \(A\) with \(A1\) chosen as the baseline, you’d have two predictor variables, one of which compares \(A2\) to \(A1\) and the other of which compares \(A3\) to \(A1\).

For treatment (dummy) coding, the target level is set to 1 and all other levels to 0.

For sum coding, the codes for each predictor must sum to zero, so the target level is given the value 1, the baseline level the value -1, and any other level the value 0.

For deviation coding, the codes must also sum to zero. Deviation coding is recommended whenever you are trying to draw ANOVA-style inferences. Under this scheme, the target level gets the value \(\frac{k-1}{k}\) while any non-target level gets the value \(-\frac{1}{k}\).

**Fun fact**: Mean-centering treatment codes (on balanced data) will give you deviation codes.
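You can check this with R’s built-in contrast functions; `scale()` with `scale = FALSE` simply subtracts each column’s mean:

```r
## treatment codes for a three-level factor
tr <- contr.treatment(3)                        # rows: (0,0), (1,0), (0,1)
dev <- scale(tr, center = TRUE, scale = FALSE)  # subtract each column mean (1/3)
dev  # rows: (-1/3,-1/3), (2/3,-1/3), (-1/3,2/3) -- the deviation codes
```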

**Treatment (dummy) coding:**

level | A2v1 | A3v1 |
---|---|---|
A1 | 0 | 0 |
A2 | 1 | 0 |
A3 | 0 | 1 |

**Sum coding:**

level | A2v1 | A3v1 |
---|---|---|
A1 | -1 | -1 |
A2 | 1 | 0 |
A3 | 0 | 1 |

**Deviation coding:**

level | A2v1 | A3v1 |
---|---|---|
A1 | \(-\frac{1}{3}\) | \(-\frac{1}{3}\) |
A2 | \(\frac{2}{3}\) | \(-\frac{1}{3}\) |
A3 | \(-\frac{1}{3}\) | \(\frac{2}{3}\) |

The same schemes extend to a five-level factor:

**Treatment (dummy) coding:**

level | A2v1 | A3v1 | A4v1 | A5v1 |
---|---|---|---|---|
A1 | 0 | 0 | 0 | 0 |
A2 | 1 | 0 | 0 | 0 |
A3 | 0 | 1 | 0 | 0 |
A4 | 0 | 0 | 1 | 0 |
A5 | 0 | 0 | 0 | 1 |

**Sum coding:**

level | A2v1 | A3v1 | A4v1 | A5v1 |
---|---|---|---|---|
A1 | -1 | -1 | -1 | -1 |
A2 | 1 | 0 | 0 | 0 |
A3 | 0 | 1 | 0 | 0 |
A4 | 0 | 0 | 1 | 0 |
A5 | 0 | 0 | 0 | 1 |

**Deviation coding:**

level | A2v1 | A3v1 | A4v1 | A5v1 |
---|---|---|---|---|
A1 | \(-\frac{1}{5}\) | \(-\frac{1}{5}\) | \(-\frac{1}{5}\) | \(-\frac{1}{5}\) |
A2 | \(\frac{4}{5}\) | \(-\frac{1}{5}\) | \(-\frac{1}{5}\) | \(-\frac{1}{5}\) |
A3 | \(-\frac{1}{5}\) | \(\frac{4}{5}\) | \(-\frac{1}{5}\) | \(-\frac{1}{5}\) |
A4 | \(-\frac{1}{5}\) | \(-\frac{1}{5}\) | \(\frac{4}{5}\) | \(-\frac{1}{5}\) |
A5 | \(-\frac{1}{5}\) | \(-\frac{1}{5}\) | \(-\frac{1}{5}\) | \(\frac{4}{5}\) |

You can change contrasts for any variable of type `factor` using the `contrasts()` or `C()` function. So why bother creating your own numerical predictor variables? It turns out that variables of type `factor` behave in strange ways in linear models. (You’ll soon discover this if you try dropping a main effect from a model with an interaction in order to perform a significance test.) So, take control of your own destiny and make your own numerical predictor variables. There are other ways, but here’s one approach that gets the job done. (Note: this approach uses functions from the **tidyverse** package.)
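For completeness, here is a quick sketch of the `contrasts()` route (the factor and its levels are illustrative). One caveat worth knowing: `contr.sum()` puts the \(-1\) codes on the *last* level, not the first, so its “baseline” differs from the tables above:

```r
## assign sum codes directly to a factor
f <- factor(rep(c("A1", "A2", "A3"), each = 2))
contrasts(f) <- contr.sum(3)
contrasts(f)  # A3, not A1, gets the (-1, -1) row here
```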

Let’s assume that your data are contained in a table `dat` like the one below.

```
## make an example table
library("tidyverse")

dat <- tibble(Y = rnorm(12),
              A = rep(paste0("A", 1:3), each = 4))
```

Y | A |
---|---|
-0.52 | A1 |
0.60 | A1 |
-0.18 | A1 |
1.08 | A1 |
-0.04 | A2 |
-1.47 | A2 |
-0.07 | A2 |
-0.57 | A2 |
1.49 | A3 |
1.66 | A3 |
-0.33 | A3 |
1.65 | A3 |

The `mutate()` / `if_else()` / `case_when()` approach for a three-level factor:

```
dat_treat <- dat %>%
  mutate(A2v1 = if_else(A == "A2", 1L, 0L),
         A3v1 = if_else(A == "A3", 1L, 0L))
```

```
## # A tibble: 12 x 4
## Y A A2v1 A3v1
## <dbl> <chr> <int> <int>
## 1 -0.518 A1 0 0
## 2 0.603 A1 0 0
## 3 -0.176 A1 0 0
## 4 1.08 A1 0 0
## 5 -0.0393 A2 1 0
## 6 -1.47 A2 1 0
## 7 -0.0678 A2 1 0
## 8 -0.569 A2 1 0
## 9 1.49 A3 0 1
## 10 1.66 A3 0 1
## 11 -0.326 A3 0 1
## 12 1.65 A3 0 1
```

```
dat_sum <- dat %>%
  mutate(A2v1 = case_when(A == "A1" ~ -1L,  # baseline
                          A == "A2" ~ 1L,   # target
                          TRUE ~ 0L),       # anything else
         A3v1 = case_when(A == "A1" ~ -1L,  # baseline
                          A == "A3" ~ 1L,   # target
                          TRUE ~ 0L))       # anything else
```

```
## # A tibble: 12 x 4
## Y A A2v1 A3v1
## <dbl> <chr> <int> <int>
## 1 -0.518 A1 -1 -1
## 2 0.603 A1 -1 -1
## 3 -0.176 A1 -1 -1
## 4 1.08 A1 -1 -1
## 5 -0.0393 A2 1 0
## 6 -1.47 A2 1 0
## 7 -0.0678 A2 1 0
## 8 -0.569 A2 1 0
## 9 1.49 A3 0 1
## 10 1.66 A3 0 1
## 11 -0.326 A3 0 1
## 12 1.65 A3 0 1
```

```
## baseline A1
dat_dev <- dat %>%
  mutate(A2v1 = if_else(A == "A2", 2/3, -1/3),  # target A2
         A3v1 = if_else(A == "A3", 2/3, -1/3))  # target A3
```

```
## # A tibble: 12 x 4
## Y A A2v1 A3v1
## <dbl> <chr> <dbl> <dbl>
## 1 -0.518 A1 -0.333 -0.333
## 2 0.603 A1 -0.333 -0.333
## 3 -0.176 A1 -0.333 -0.333
## 4 1.08 A1 -0.333 -0.333
## 5 -0.0393 A2 0.667 -0.333
## 6 -1.47 A2 0.667 -0.333
## 7 -0.0678 A2 0.667 -0.333
## 8 -0.569 A2 0.667 -0.333
## 9 1.49 A3 -0.333 0.667
## 10 1.66 A3 -0.333 0.667
## 11 -0.326 A3 -0.333 0.667
## 12 1.65 A3 -0.333 0.667
```
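Once the numerical predictors exist, you regress on them directly instead of on the factor. Here is a self-contained base-R sketch of the same steps (the seed and data are arbitrary); with balanced data, the deviation-coded intercept equals the grand mean, and the slopes are the pairwise differences from the baseline:

```r
## illustrative data (arbitrary seed); base-R version of the tidyverse steps
set.seed(1)
dat <- data.frame(Y = rnorm(12),
                  A = rep(paste0("A", 1:3), each = 4))

## deviation codes, baseline A1
dat$A2v1 <- ifelse(dat$A == "A2", 2/3, -1/3)
dat$A3v1 <- ifelse(dat$A == "A3", 2/3, -1/3)

## use the numerical predictors, not the factor, in the model
mod <- lm(Y ~ A2v1 + A3v1, data = dat)
coef(mod)  # intercept = grand mean; slopes = A2 - A1 and A3 - A1
```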

**The interpretation of all but the highest order effect depends on the coding scheme.**

With treatment coding, you are looking at **simple** effects and **simple** interactions, not **main** effects and **main** interactions.

**The parameter estimates for sum and deviation coding differ only in magnitude; they have identical interpretations.**

Because it is not subject to the scaling seen under sum coding, deviation coding should be used by default for ANOVA-style designs.

**The default coding scheme for factors in R is “treatment” coding.**

So, any time you declare a variable as type `factor` and use it as a predictor in your regression model, R will automatically create treatment-coded variables.

**Take-home message: when analyzing factorial designs in R using regression, to obtain the canonical ANOVA-style interpretations of main effects and interactions use deviation coding and NOT the default treatment coding.**