# Random effects for control predictors

One of the most compelling aspects of mixed-effects models is the ability to include almost any control predictor the researcher desires, where by a control predictor we mean a property of an experimental trial that may affect the response variable but is not of theoretical interest in a given analysis. In principle, including control variables in an analysis can rule out potential confounds and increase statistical power by reducing residual noise. Given the investigations in the present paper, however, a question naturally arises: to guard against anti-conservative inference about a predictor \(X\) of theoretical interest, do we also need by-subject and by-item random effects for all of our control predictors \(C\)? After all, suppose there is no underlying fixed effect of \(C\) but there is a random effect of \(C\): could this create anti-conservative inference in the same way that omitting a random effect of \(X\) from the analysis can? To put this issue in perspective, consider Kuperman, Bertram, & Baayen (2010), who present an LME analysis of fixation durations in Dutch reading with eight predictors; for the interpretation of each main effect, the other seven may be thought of as serving as controls. The prospect of trying to fit eight random effects (plus correlation terms!) both by subjects and by items no doubt makes some readers cringe, and many studies include far more than eight predictors!

Fortunately, we have reassuring news on this front: omitting random
effects for control predictors should not generally lead to
anti-conservativity, so long as random effects are included for
predictors of theoretical interest. To see the logic, note that
either a set of control predictors \(C\) and the theoretically
interesting predictor \(X\) are multicollinear (a generalization of
correlation, meaning that \(X\) can be predicted accurately from \(C\)) or
they are not. If they are not, then even if \(C\) affects the response,
there will be no tendency to generate spurious effects of \(X\). If \(C\)
and \(X\) are multicollinear and there is an underlying fixed effect of
\(C\) on the response, then including a fixed effect of \(C\) in the
analysis is enough (omitting \(C\) would of course make inference about
\(X\) anti-conservative). If \(C\) and \(X\) are multicollinear and there
is a random effect of \(C\) on the response, then the random effect of
\(X\) itself is enough to avoid anti-conservative inference! To
demonstrate this point, we conducted Monte Carlo simulations of
24-subject, 24-item experiments in which a two-level treatment \(X\)
varied within both subjects and items, together with a continuous
control predictor \(C\) correlated with \(X\). In the generative model
there was always a random effect but no fixed effect of \(X\). In every
simulated condition we compared the behavior of LME analyses with and
without random slopes for \(C\).^{1}
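The generative model described above can be sketched as follows. This is a minimal Python sketch using only the standard library, not the authors' simulation code. It takes its parameters from the footnote (24 subjects, 24 items, \(C\) normal with SD 0.5 and means 1 and 2, \(\beta_C\) of 0 or 5, identity random-effect covariance, residual SD 2). Several details are assumptions not stated in the text: the fixed intercept is 0, \(X\) is deviation-coded as -0.5/+0.5, each subject sees every item once with the treatment level assigned by the parity of the subject and item indices, and the by-cluster slopes for \(C\) multiply uncentered \(C\). The function `simulate_dataset` is hypothetical.

```python
import random


def simulate_dataset(n_subj=24, n_item=24, beta_c=0.0, random_c=True,
                     resid_sd=2.0, seed=None):
    """Simulate one dataset from the generative model (hypothetical sketch).

    The fixed effect of X is always 0; beta_c is the fixed effect of C
    (0 or 5 in the simulations described). By-subject and by-item random
    effects are independent standard normals (identity covariance):
    intercept and X slope, plus a C slope when random_c is True.
    """
    rng = random.Random(seed)
    n_re = 3 if random_c else 2  # length of each cluster's random-effect vector
    subj_re = [[rng.gauss(0.0, 1.0) for _ in range(n_re)] for _ in range(n_subj)]
    item_re = [[rng.gauss(0.0, 1.0) for _ in range(n_re)] for _ in range(n_item)]

    rows = []
    for s in range(n_subj):
        for i in range(n_item):
            # Assumed counterbalancing: each subject sees every item once,
            # with the treatment level set by the parity of s + i.
            level = (s + i) % 2
            x = level - 0.5                      # assumed -0.5/+0.5 coding of X
            c = rng.gauss(1.0 + level, 0.5)      # C has mean 1 or 2, SD 0.5
            u, v = subj_re[s], item_re[i]
            # Fixed intercept assumed to be 0; the fixed effect of X is 0.
            y = (u[0] + v[0]) + (u[1] + v[1]) * x + beta_c * c
            if random_c:
                y += (u[2] + v[2]) * c           # random slopes on uncentered C
            y += rng.gauss(0.0, resid_sd)        # trial-level noise
            rows.append({"subj": s, "item": i, "X": x, "C": c, "response": y})
    return rows


rows = simulate_dataset(beta_c=5.0, random_c=True, seed=1)
print(len(rows))  # 576 trials: 24 subjects x 24 items
```

Each simulated dataset could then be fit with both analysis specifications from the footnote (substituting the `X` column for `A`), for example with `lmer` from the R package lme4; repeating this over many seeds and counting the proportion of runs with \(p < 0.05\) for the fixed effect of \(X\) gives an empirical Type I error rate.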

With a true fixed effect but no random effect of \(C\), we find that Type I
error rates at \(\alpha=0.05\) are slightly above nominal, consistent
with our general results; Type I error is also slightly higher for
analyses without control-predictor random slopes (error rate of 0.057)
than for analyses with them (0.054).
However, when there is no true effect of \(C\) whatsoever, Type I error
rates are similarly higher for analyses without control-predictor random
effects (error rate of 0.060) than for analyses with them (0.057)!
When there is a true random effect but no fixed effect of \(C\), we find
that analyses without random effects of \(C\) are actually
*conservative* (Type I error rate of 0.030), whereas analyses with
random effects of \(C\) are slightly above nominal (0.056). The reason
for this conservativity is that the LME model can account for the
cluster-specific variation in the effect of \(C\) only by attributing it
to large random effects of \(X\); but large estimated random effects of
\(X\) reduce confidence in any possible fixed effect of \(X\), leading to
conservative inference.
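Why large estimated random slopes of \(X\) weaken inference about its fixed effect can be seen from an approximate standard-error formula. The sketch below assumes a balanced, fully crossed design in which each subject sees each item once, \(X\) is coded -0.5/+0.5, every subject and every item contributes equally to both conditions, and the covariate \(C\) is ignored. Under these assumptions the fixed-effect estimate for \(X\) reduces to the difference of condition means, the random intercepts cancel, and \(\operatorname{Var}(\hat\beta_X) \approx \tau_S^2/n_S + \tau_I^2/n_I + 4\sigma^2/(n_S n_I)\), where \(\tau_S\) and \(\tau_I\) are the by-subject and by-item random-slope SDs and \(\sigma\) is the residual SD. The function name `approx_se_fixed_x` is hypothetical.

```python
import math


def approx_se_fixed_x(tau_subj, tau_item, resid_sd, n_subj=24, n_item=24):
    """Approximate SE of the fixed effect of a two-level X (coded -0.5/+0.5).

    Assumes a balanced, fully crossed design in which each subject and each
    item contributes equally to both conditions, so the estimate is the
    difference of the two condition means and random intercepts cancel.
    tau_subj, tau_item: SDs of the by-subject and by-item random slopes of X.
    """
    var = (tau_subj ** 2 / n_subj                  # subject-slope variability
           + tau_item ** 2 / n_item                # item-slope variability
           + 4 * resid_sd ** 2 / (n_subj * n_item))  # residual noise
    return math.sqrt(var)


# Doubling the random-slope SDs nearly doubles the standard error.
print(round(approx_se_fixed_x(1.0, 1.0, 2.0), 3))  # 0.333
print(round(approx_se_fixed_x(2.0, 2.0, 2.0), 3))  # 0.601
```

Plugging inflated estimates of \(\tau_S\) and \(\tau_I\) into this formula widens the standard error, which lowers test statistics for \(X\) and so produces the conservative behavior described above.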

Hence random effects for control predictors are not strictly necessary to avoid anti-conservative inference. However, this analysis underscores a different point: failing to include random effects for controls may rob the investigator of the opportunity to considerably sharpen inferences about predictors of theoretical interest, since control-predictor random effects may be needed to soak up important sources of noise in the data.

## References

Kuperman, V., Bertram, R., & Baayen, R. H. (2010). Processing trade-offs in the reading of Dutch derived words. *Journal of Memory and Language*, *62*, 83–97.

## Footnotes

^{1} In these
simulations, \(C\) was normally distributed with a standard deviation of
0.5 and means of 1 and 2 for the two levels of the experimental
treatment, respectively. When there was a fixed effect of \(C\), its
value was \(\beta_C=5\). By-subject and by-item covariance matrices
were \(\mathcal{I}_2\) for cases with no random effect of \(C\) and
\(\mathcal{I}_3\) for cases with a random effect of \(C\), and the
trial-level error standard deviation was 2. Analysis specifications were
`response ~ A + C + (A | subj) + (A | item)` versus
`response ~ A + C + (A + C | subj) + (A + C | item)`, where `A` codes
the treatment \(X\).
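The footnote's parameters determine how strongly \(C\) and \(X\) are correlated in these simulations. Assuming equal numbers of trials at the two treatment levels, and writing \(D \in \{0, 1\}\) for a hypothetical indicator of the second level, we have \(C = 1 + D + E\) with \(E \sim N(0, 0.5^2)\) independent of \(D\), so:

\[
\begin{aligned}
\operatorname{Var}(D) &= \tfrac{1}{2}\cdot\tfrac{1}{2} = \tfrac{1}{4}, \qquad
\operatorname{Cov}(D, C) = \operatorname{Var}(D) = \tfrac{1}{4},\\
\operatorname{Var}(C) &= \operatorname{Var}(D) + \operatorname{Var}(E) = \tfrac{1}{4} + \tfrac{1}{4} = \tfrac{1}{2},\\
\rho(D, C) &= \frac{1/4}{\sqrt{(1/4)(1/2)}} = \frac{1}{\sqrt{2}} \approx 0.71.
\end{aligned}
\]

Because a correlation is unchanged by linear recoding, the same value holds for any two-level coding of \(X\).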