Effect of sample size on performance

Fixed effects and their standard errors for LME models are computed from maximum-likelihood estimates of the random-effects covariance matrix. We believe that maximal-LME fixed-effects inferences remain slightly anti-conservative because maximum-likelihood estimation tends to underestimate random-effects variances, which deflates the fixed-effects standard errors. If this is the case, then Type I error rates in maximal LME analyses should approach the nominal level as the number of subjects and/or items increases and the estimate of the random-effects covariance matrix becomes correspondingly more certain. In a set of informal simulations, we found that this was indeed the case.
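The downward bias of maximum-likelihood variance estimates can be illustrated in a much simpler setting than a mixed model. The following is a minimal sketch, not the simulation code behind this appendix: it assumes independent normal observations and compares the maximum-likelihood variance estimator (dividing by n) with the unbiased one (dividing by n - 1). The function name `mean_variance_estimates`, the sample sizes, and the number of replications are illustrative assumptions.

```python
import random


def mean_variance_estimates(n_units, true_sd=1.0, n_sims=5000, seed=1):
    """Average the ML and the unbiased variance estimates over simulated samples.

    Hypothetical helper for illustration only: a one-sample analogy,
    not an LME fit.
    """
    rng = random.Random(seed)
    ml_total = 0.0
    unbiased_total = 0.0
    for _ in range(n_sims):
        sample = [rng.gauss(0.0, true_sd) for _ in range(n_units)]
        mean = sum(sample) / n_units
        sum_sq = sum((x - mean) ** 2 for x in sample)
        ml_total += sum_sq / n_units              # ML estimator: divide by n
        unbiased_total += sum_sq / (n_units - 1)  # unbiased: divide by n - 1
    return ml_total / n_sims, unbiased_total / n_sims


if __name__ == "__main__":
    # The true variance is 1.0; the ML average should sit below it,
    # with the gap narrowing as the number of units grows.
    for n in (6, 12, 24, 48):
        ml, unbiased = mean_variance_estimates(n)
        print(f"n={n:2d}  ML={ml:.3f}  unbiased={unbiased:.3f}")
```

In this one-sample setting the ML estimate averages (n - 1)/n times the true variance, so the shortfall shrinks as n grows. We offer it only as an analogy for the behavior of variance-component estimates in LME models.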


Type I error as a function of the number of items in a within/within experiment

We see signs of this in the results presented in the main paper, where Type I error rates are consistently higher for 12-item designs than for 24-item designs. The current analysis extends this line of reasoning. In the above figure, Type I error rates are plotted as a function of the number of items. For maximal LME, anti-conservativity disappears quickly as the number of items increases. This behavior contrasts with that of min-\(F'\), which is conservative across the board; F1+F2, which is conservative for few items but becomes slightly anti-conservative as the number of items increases; and random-intercepts LME, which becomes increasingly anti-conservative with more items.
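The qualitative pattern described for maximal LME (anti-conservativity that fades as the number of sampled units grows) can be mimicked in a toy setting. The sketch below is an assumption-laden analogy rather than a reproduction of our simulations: it replaces the mixed model with a one-sample test on null data, uses a standard error built from the ML variance estimate, and compares the statistic against the normal cutoff of 1.96. The function name `type1_rate` and all parameter values are hypothetical.

```python
import math
import random


def type1_rate(n_units, n_sims=10000, z_crit=1.96, seed=1):
    """Proportion of null datasets rejected by a z-test using an ML-based SE.

    Hypothetical helper for illustration only; the true effect is zero,
    so every rejection is a Type I error.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n_units)]
        mean = sum(sample) / n_units
        ml_var = sum((x - mean) ** 2 for x in sample) / n_units  # divide by n
        se = math.sqrt(ml_var / n_units)  # deflated when n_units is small
        if abs(mean / se) > z_crit:
            rejections += 1
    return rejections / n_sims


if __name__ == "__main__":
    # With a nominal level of .05, the rate should start above .05 for
    # few units and move toward .05 as the number of units increases.
    for n in (6, 12, 24, 48):
        print(f"n={n:2d}  Type I error rate={type1_rate(n):.3f}")
```

Here the excess over the nominal level has two sources: the underestimated variance, and the use of a normal reference distribution that ignores the uncertainty in that estimate. Both diminish with more units, which parallels the argument made above for maximal LME.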

Of course, we do not always have the luxury of using a large number of items and subjects in our analyses. This anti-conservativity in the face of a limited number of clusters and corresponding uncertainty in the random effects is in fact exactly the kind of problem that the use of Bayesian inference and Markov-chain Monte Carlo on the fixed effects is intended to address (Baayen, Davidson, & Bates, 2008; Gelman & Hill, 2007). Unfortunately, these techniques are not yet available out of the box for random-slopes models in any LME implementation we are aware of, but we hope that they become readily available in the future and that they eliminate the anti-conservativity observed in the present simulations.[1]


Baayen, R. H., Davidson, D. J., and Bates, D. M. (2008). Mixed-effects modeling with crossed random effects for subjects and items. Journal of Memory and Language, 59, 390-412.

Gelman, A. and Hill, J. (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge: Cambridge University Press.

Plummer, M. (2003). JAGS: A Program for Analysis of Bayesian Graphical Models Using Gibbs Sampling. Proceedings of the 3rd International Workshop on Distributed Statistical Computing (DSC 2003), March 20–22, Vienna, Austria. ISSN 1609-395X.


[1] Random-slope models can be implemented and run from R on any platform using the software package JAGS (Plummer, 2003), and the authors have done so in some of their own analyses, but the process remains sufficiently time-consuming and error-prone that we do not recommend the practice at this point.

Author: Dale J. Barr, Roger Levy, Christoph Scheepers, and Harry J. Tily

Date: March 27, 2012
