Random effects for control predictors

One of the most compelling aspects of mixed-effects models is the ability to include almost any control predictor desired by the researcher, by which we mean a property of an experimental trial which may affect the response variable but is not of theoretical interest in a given analysis. In principle, including control variables in an analysis can rule out potential confounds and increase statistical power by reducing residual noise. Given the investigations in the present paper, however, the question naturally arises: in order to guard against anti-conservative inference about a predictor \(X\) of theoretical interest, do we need by-subject and by-item random effects for all our control predictors \(C\) as well? After all, suppose there is no underlying fixed effect of \(C\) but there is a random effect of \(C\): could this create anti-conservative inference in the same way that omitting a random effect of \(X\) from the analysis could? To put this issue in perspective via an example, Kuperman, Bertram, & Baayen (2010) present an LME analysis of fixation durations in Dutch reading with eight main effects; for the interpretation of each main effect, the other seven may be thought of as serving as controls. The prospect of trying to fit eight random effects (plus correlation terms!) both by subjects and by items no doubt makes some readers cringe; and many studies include far more than eight predictors!

Fortunately, we have reassuring news on this front: omitting random effects for control predictors should not generally lead to anti-conservativity, so long as random effects are included for predictors of theoretical interest. To see the logic, consider: either a set of control predictors \(C\) and the theoretically interesting predictor \(X\) are multicollinear (a generalization of correlation, meaning that \(X\) can be predicted accurately from \(C\)) or they are not. If they are not, then even if \(C\) affects the response there will be no tendency to generate spurious effects of \(X\). If \(C\) and \(X\) are multicollinear and there is an underlying fixed effect of \(C\) on the response, then including a fixed effect of \(C\) in the analysis is enough (omitting \(C\) would of course make inference about \(X\) anti-conservative). If \(C\) and \(X\) are multicollinear and there is a random effect of \(C\) on the response, then the random effect of \(X\) itself is enough to avoid anti-conservative inference! To demonstrate the validity of this point, we conducted Monte Carlo simulations of 24-subject, 24-item within/within experiments with a two-level treatment \(X\) and a continuous control factor \(C\) correlated with \(X\). In the generative model there was always a random effect but no fixed effect of \(X\). In all cases we compared the behavior of LME analyses with and without random slopes for \(C\) (see footnote 1).
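The generative model for one such simulated data set can be sketched as follows, using the parameter values given in footnote 1. This is a minimal illustration rather than the code used for the reported simulations. Several details are assumptions not stated in the text: the function name and dictionary keys are hypothetical, \(X\) is deviation-coded as -0.5 and +0.5, conditions are counterbalanced by alternating across subject-item pairs, the grand intercept is zero, and \(C\) is drawn independently for every trial.

```python
import random

def simulate_dataset(n_subj=24, n_item=24, beta_c=0.0, random_c=False,
                     resid_sd=2.0, seed=None):
    """Generate one data set with no fixed effect of X (the null is true for X)."""
    rng = random.Random(seed)

    def draw_effects():
        # Identity covariance matrix: independent, unit-variance random effects.
        return {
            "intercept": rng.gauss(0, 1),
            "x": rng.gauss(0, 1),                       # random slope for X, always present
            "c": rng.gauss(0, 1) if random_c else 0.0,  # random slope for C, optional
        }

    subj = [draw_effects() for _ in range(n_subj)]
    item = [draw_effects() for _ in range(n_item)]

    rows = []
    for s in range(n_subj):
        for i in range(n_item):
            level = (s + i) % 2              # assumed counterbalancing scheme
            x = level - 0.5                  # assumed deviation coding of X
            c = rng.gauss(1.0 + level, 0.5)  # C has mean 1 or 2 by level, sd 0.5
            y = (subj[s]["intercept"] + item[i]["intercept"]
                 + (subj[s]["x"] + item[i]["x"]) * x            # no fixed effect of X
                 + (beta_c + subj[s]["c"] + item[i]["c"]) * c
                 + rng.gauss(0, resid_sd))                      # trial-level error
            rows.append({"subj": s, "item": i, "x": x, "c": c, "response": y})
    return rows

# The three generative scenarios discussed in the text:
fixed_only = simulate_dataset(beta_c=5.0, random_c=False, seed=1)
no_effect = simulate_dataset(beta_c=0.0, random_c=False, seed=2)
random_only = simulate_dataset(beta_c=0.0, random_c=True, seed=3)
```

Each resulting data set would then be fitted with the two model specifications listed in footnote 1, and the test of \(X\) recorded for each.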

With a true fixed effect but no random effect of \(C\), we find that Type I error rates at \(\alpha=0.05\) are slightly above nominal, consistent with our general results; Type I error is also slightly higher for analyses without control-predictor random slopes (error rate of 0.057) than for analyses with control-predictor random slopes (0.054). However, the same pattern appears when there is no true effect of \(C\) whatsoever: Type I error is higher for analyses without control-predictor random effects (error rate of 0.060) than for analyses with them (0.057)! When there is a true random effect but no fixed effect of \(C\), we find that analyses without random effects of \(C\) are actually conservative (Type I error rate of 0.030), whereas analyses with random effects of \(C\) are slightly above nominal (0.056). The reason for this conservativity is that an LME model lacking random effects of \(C\) can only account for the cluster-specific variation associated with \(C\) by attributing it to large random effects of \(X\); but large estimated random effects of \(X\) reduce confidence in any possible fixed effect of \(X\), leading to conservative inference.
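Each error rate above is the proportion of simulated data sets, generated with no fixed effect of \(X\), in which the test of \(X\) rejects at level \(\alpha\). A minimal sketch of that calculation follows, together with the binomial standard error that indicates how much such a proportion can vary from one batch of simulations to another. The function name is hypothetical and the p-values in the usage line are made up for illustration; the number of simulation runs is not stated in this section.

```python
import math

def type1_error_rate(p_values, alpha=0.05):
    """Return (rejection rate, binomial standard error) for p-values from null simulations."""
    n = len(p_values)
    rate = sum(p < alpha for p in p_values) / n
    se = math.sqrt(rate * (1 - rate) / n)  # Monte Carlo standard error of the proportion
    return rate, se

# Toy usage with four made-up p-values: two fall below 0.05.
rate, se = type1_error_rate([0.01, 0.2, 0.5, 0.04])  # (0.5, 0.25)
```

Small differences between two estimated rates are best read against this standard error, which shrinks with the square root of the number of simulation runs.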

Hence random effects for control predictors are not strictly necessary to avoid anti-conservative inference. However, this analysis underscores a different point: failing to add random effects for controls may in fact rob the investigator of the opportunity to considerably sharpen inferences about predictors of theoretical interest, since control-predictor random effects may be needed to soak up important sources of noise in one's data.


Kuperman, V., Bertram, R., & Baayen, R. H. (2010). Processing trade-offs in the reading of Dutch derived words. Journal of Memory and Language, 62, 83–97.


1 In these simulations \(C\) was distributed normally with a standard deviation of 0.5 and means of 1 and 2 for the two levels of the experimental treatment respectively. When there was a fixed effect of \(C\) its value was \(\beta_C=5\). By-subjects and by-items covariance matrices were the identity matrices \(\mathcal{I}_2\) for cases with no random effect of \(C\) and \(\mathcal{I}_3\) for cases with a random effect of \(C\); the trial-level error standard deviation was 2. Analysis specifications were response ~ A + C + (A | subj) + (A | item) versus response ~ A + C + (A + C | subj) + (A + C | item), where A in the model formulas denotes the treatment \(X\).

Author: Dale J. Barr, Roger Levy, Christoph Scheepers, and Harry J. Tily

Date: April 23, 2012
