7 Answers to Selected Chapter 7 Exercises
Multilevel models, and repeated measures
library(DAAG)
The final two sentences of Exercise 1 are challenging!
Exercises 1 & 2 should be asterisked.
Exercise 1
Repeat the calculations of Subsection 7.4.5, but omitting results from two vines at random. Here is code that will handle the calculation:
n.omit <- 2
take <- rep(TRUE, 48)
take[sample(1:48, 2)] <- FALSE
kiwishade.lmer <- lmer(yield ~ shade + (1|block) + (1|block:plot),
                       data = kiwishade, subset=take)
vcov <- show(VarCorr(kiwishade.lmer))
gps <- vcov[, "Groups"]
print(vcov[gps=="block:plot", "Variance"])
print(vcov[gps=="Residual", "Variance"])
Repeat this calculation five times, for each of n.omit = 2, 4, 6, 8, 10, 12 and 14. Plot (i) the plot component of variance and (ii) the vine component of variance, against number of points omitted. Based on these results, for what value of n.omit does the loss of vines begin to compromise results? Which of the two components of variance estimates is more damaged by the loss of observations? Comment on why this is to be expected.
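A caution on the code given in the exercise: with current versions of lme4, show(VarCorr(...)) no longer returns a matrix with "Groups" and "Variance" columns, so the final four lines fail as given. A sketch of an equivalent extraction uses as.data.frame() on the VarCorr object:
vc <- as.data.frame(VarCorr(kiwishade.lmer))  # columns grp, var1, var2, vcov, sdcor
print(vc[vc$grp == "block:plot", "vcov"])  # plot component of variance
print(vc[vc$grp == "Residual", "vcov"])    # vine (residual) component of variance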
For convenience, we place the central part of the calculation in a function. On slow machines, the code may take a minute or two to run.
library(lme4)
Loading required package: Matrix
trashvine <- function(n.omit=2)
{
  take <- rep(TRUE, 48)                # start with all 48 vines
  take[sample(1:48, n.omit)] <- FALSE  # omit n.omit vines at random
  kiwishade$take <- take
  kiwishade.lmer <- lmer(yield ~ shade + (1 | block) + (1|block:plot),
                         data = kiwishade, subset=take)
  varv <- as.numeric(attr(VarCorr(kiwishade.lmer), "sc")^2)  # within plot (vine) variance
  varp <- as.numeric(VarCorr(kiwishade.lmer)$`block:plot`)   # between plot variance
  c(varp, varv)
}
varp <- numeric(35)
varv <- numeric(35)
n <- numeric(35)
k <- 0
for(n.omit in c(2, 4, 6, 8, 10, 12, 14))
  for(i in 1:5){
    k <- k+1
    vec2 <- trashvine(n.omit=n.omit)
    n[k] <- n.omit
    varp[k] <- vec2[1]
    varv[k] <- vec2[2]
  }
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
The following plots show the within plot and between plot variance estimates, as functions of the number of vines that were omitted at random.
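Code along the following lines will draw such plots (a sketch, assuming the vectors n, varp and varv from the loop above):
par(mfrow=c(1,2))
plot(varp ~ n, xlab="Number of vines omitted",
     ylab="Between plot variance estimate")
plot(varv ~ n, xlab="Number of vines omitted",
     ylab="Within plot (vine) variance estimate")
par(mfrow=c(1,1))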
As the number of vines that are omitted increases, the variance estimates can be expected to show greater variability. The effect should be most evident for the between plot variance. Inaccuracy in the estimate of the between plot variance arises both from inaccuracy in the within plot sums of squares and from loss of information at the between plot level.
At best it is possible only to give an approximate d.f. for the between plot estimate of variance (some plots lose more vines than others), which complicates any evaluation that relies on degrees of freedom considerations.
Exercise 2
Repeat the previous exercise, but now omitting 1, 2, 3, 4 complete plots at random.
trashplot <- function(n.omit=2)
{
  plotlev <- levels(kiwishade$plot)
  use.lev <- sample(plotlev, length(plotlev)-n.omit)  # retain all but n.omit plots
  kiwishade$take <- kiwishade$plot %in% use.lev
  kiwishade.lmer <- lmer(yield ~ shade + (1 | block) + (1|block:plot),
                         data = kiwishade, subset=take)
  varv <- as.numeric(attr(VarCorr(kiwishade.lmer), "sc")^2)  # within plot (vine) variance
  varp <- as.numeric(VarCorr(kiwishade.lmer)$`block:plot`)   # between plot variance
  c(varp, varv)
}
varp <- numeric(20)
varv <- numeric(20)
n <- numeric(20)
k <- 0
for(n.omit in 1:4)
  for(i in 1:5){
    k <- k+1
    vec2 <- trashplot(n.omit=n.omit)
    n[k] <- n.omit
    varp[k] <- vec2[1]
    varv[k] <- vec2[2]
  }
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
boundary (singular) fit: see help('isSingular')
Again, we plot the results. The plots show the within plot and between plot variance estimates, as functions of the number of whole plots (each consisting of four vines) that were omitted at random.
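The same plotting sketch as in Exercise 1 serves here, with the x-axis label changed:
par(mfrow=c(1,2))
plot(varp ~ n, xlab="Number of plots omitted",
     ylab="Between plot variance estimate")
plot(varv ~ n, xlab="Number of plots omitted",
     ylab="Within plot (vine) variance estimate")
par(mfrow=c(1,1))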
Omission of a whole plot loses 3 d.f. out of 36 for estimation of within plot effects, and 1 degree of freedom out of 11 for the estimation of between plot effects, i.e., a slightly greater relative loss. The effect on precision will be most obvious where the d.f. are already smallest, i.e., for the between plot variance. The loss of information on complete plots is inherently more serious, for the estimation of the between plot variance, than the loss of partial information (albeit on a greater number of plots), as will often happen in Exercise 1.
Exercise 3
The data set Gun (MEMSS package) reports on the numbers of rounds fired per minute, by each of nine teams of gunners, each tested twice using each of two methods. In the nine teams, three were made of men with slight build, three with average, and three with heavy build. Is there a detectable difference, in number of rounds fired, between build type or between firing methods? For improving the precision of results, which would be better – to double the number of teams, or to double the number of occasions (from 2 to 4) on which each team tests each method?
It does not make much sense to look for overall differences in Method; this depends on Physique. We therefore nest Method within Physique.
library(MEMSS)
Attaching package: 'MEMSS'
The following objects are masked from 'package:datasets':
CO2, Orange, Theoph
Gun.lmer <- lmer(rounds ~ Physique/Method + (1|Team), data=Gun)
summary(Gun.lmer)
Linear mixed model fit by REML ['lmerMod']
Formula: rounds ~ Physique/Method + (1 | Team)
Data: Gun
REML criterion at convergence: 127
Scaled residuals:
Min 1Q Median 3Q Max
-2.15599 -0.64718 0.09981 0.63382 1.67447
Random effects:
Groups Name Variance Std.Dev.
Team (Intercept) 1.091 1.044
Residual 2.180 1.476
Number of obs: 36, groups: Team, 9
Fixed effects:
Estimate Std. Error t value
(Intercept) 23.5889 0.4922 47.922
Physique.L -0.9664 0.8526 -1.133
Physique.Q 0.1905 0.8526 0.223
PhysiqueSlight:MethodM2 -8.4500 0.8524 -9.913
PhysiqueAverage:MethodM2 -8.1000 0.8524 -9.503
PhysiqueHeavy:MethodM2 -8.9833 0.8524 -10.539
Correlation of Fixed Effects:
(Intr) Phys.L Phys.Q PS:MM2 PA:MM2
Physique.L 0.000
Physique.Q 0.000 0.000
PhysqSl:MM2 -0.289 0.353 -0.204
PhysqAv:MM2 -0.289 0.000 0.408 0.000
PhysqHv:MM2 -0.289 -0.353 -0.204 0.000 0.000
A good way to proceed is to determine the fitted values, and present these in an interaction plot:
Gun.hat <- fitted(Gun.lmer)
interaction.plot(Gun$Physique, Gun$Method, Gun.hat)
Differences between methods, for each of the three physiques, are strongly attested. These can be estimated within teams, allowing 24 degrees of freedom for each of these comparisons.
Clear patterns of change with Physique seem apparent in the plot. There are, however, too few degrees of freedom for this effect to appear statistically significant. Note that the parameters that are given are for the lowest level of Method, i.e., for M1. Making M2 the baseline shows the effect as closer to the conventional 5% significance level.
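A minimal sketch of such a refit (relevel() is base R; Gun2 is a working copy of the data, named here for illustration):
Gun2 <- MEMSS::Gun
Gun2$Method <- relevel(Gun2$Method, ref="M2")  # make M2 the baseline level
summary(lmer(rounds ~ Physique/Method + (1|Team), data=Gun2))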
The component of variance at the between teams level is of the same order of magnitude as the within teams component. Its contribution to the variance of team means (1.044\(^2\)) is much greater than the contribution of the within team component (1.476\(^2\)/4; there are 4 results per team). If comparison between physiques is the concern, it will be much more effective to double the number of teams; compare (1.044\(^2\)+1.476\(^2\)/4)/2 (=0.82), for doubling the number of teams, with 1.044\(^2\)+1.476\(^2\)/8 (=1.36), for doubling the number of occasions.
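A quick check of this arithmetic, using the standard deviations from the summary above:
sig2.team <- 1.044^2        # between team component of variance
sig2.res  <- 1.476^2        # within team (residual) variance
(sig2.team + sig2.res/4)/2  # doubling the number of teams: approx 0.82
sig2.team + sig2.res/8      # doubling the occasions per team: approx 1.36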
Exercise 4
*The data set ergoStool (MEMSS package) has data on the amount of effort needed to get up from a stool, for each of nine individuals who each tried four different types of stool. Analyse the data both using aov() and using lme(), and reconcile the two sets of output. Was there any clear winner among the types of stool, if the aim is to keep effort to a minimum?
For analysis of variance, specify
aov(effort~Type+Error(Subject), data=ergoStool)
Call:
aov(formula = effort ~ Type + Error(Subject), data = ergoStool)
Grand Mean: 10.25
Stratum 1: Subject
Terms:
Residuals
Sum of Squares 66.5
Deg. of Freedom 8
Residual standard error: 2.883141
Stratum 2: Within
Terms:
Type Residuals
Sum of Squares 81.19444 29.05556
Deg. of Freedom 3 24
Residual standard error: 1.100295
Estimated effects may be unbalanced
For testing the Type effect for statistical significance, refer (81.19/3)/(29.06/24) (=22.35) to the \(F_{3,24}\) distribution. The effect is highly significant.
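The corresponding p-value can be checked directly:
pf(22.35, df1=3, df2=24, lower.tail=FALSE)  # far below 0.001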
This is about as far as it is possible to go with analysis of variance calculations. When Error() is specified in the aov model, R has no mechanism for extracting estimates. (There are mildly tortuous ways to extract the information, which will not be further discussed here.)
For use of lmer, specify
summary(lmer(effort~Type + (1|Subject), data=ergoStool))
Linear mixed model fit by REML ['lmerMod']
Formula: effort ~ Type + (1 | Subject)
Data: ergoStool
REML criterion at convergence: 121.1
Scaled residuals:
Min 1Q Median 3Q Max
-1.80200 -0.64317 0.05783 0.70100 1.63142
Random effects:
Groups Name Variance Std.Dev.
Subject (Intercept) 1.775 1.332
Residual 1.211 1.100
Number of obs: 36, groups: Subject, 9
Fixed effects:
Estimate Std. Error t value
(Intercept) 8.5556 0.5760 14.853
TypeT2 3.8889 0.5187 7.498
TypeT3 2.2222 0.5187 4.284
TypeT4 0.6667 0.5187 1.285
Correlation of Fixed Effects:
(Intr) TypeT2 TypeT3
TypeT2 -0.450
TypeT3 -0.450 0.500
TypeT4 -0.450 0.500 0.500
Observe that 1.100295\(^2\) (the residual Std.Dev.) is very nearly equal to 29.06/24 (=1.211) obtained from the analysis of variance calculation.
Also the Stratum 1 mean square of 66.5/8 (=8.3125) from the analysis of variance output, when divided by 4 (the number of types of stool tested by each subject), gives 2.078, which is very nearly equal to 1.3325\(^2\)+1.100295\(^2\)/4 (= 2.078) from the lmer output.
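A quick check of these two reconciliations:
1.100295^2               # residual variance; compare 29.06/24 = 1.211
(66.5/8)/4               # Stratum 1 mean square, per observation: 2.078
1.3325^2 + 1.100295^2/4  # from the lmer components: 2.078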
Exercise 5
*In the data set MathAchieve (MEMSS package), the factors Minority (levels yes and no) and Sex, and the variable SES (socio-economic status) are clearly fixed effects. Discuss how the decision whether to treat School as a fixed or as a random effect might depend on the purpose of the study. Carry out an analysis that treats School as a random effect. Are differences between schools greater than can be explained by within school variation?
School should be treated as a random effect if the intention is to generalize results to other comparable schools. If the intention is to apply them to other pupils or classes within those same schools, it should be taken as a fixed effect.
For the analysis of these data, both SES and MEANSES should be included in the model. Then the coefficient of MEANSES will measure between school effects, while the coefficient of SES will measure within school effects.
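One way to make this decomposition explicit (a sketch, not part of the original answer; cSES is a name introduced here for school-centred SES):
mach <- MEMSS::MathAchieve
mach$cSES <- mach$SES - mach$MEANSES  # deviation of SES from the school mean
## The coefficient of MEANSES then measures purely between school effects,
## while that of cSES measures purely within school effects.
lmer(MathAch ~ Minority*Sex*(MEANSES + cSES) + (1|School), data=mach)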
MathAch.lmer <- lmer(MathAch ~ Minority*Sex*(MEANSES+SES) + (1|School),
                     data=MEMSS::MathAchieve)
MathAch.lmer
Linear mixed model fit by REML ['lmerMod']
Formula: MathAch ~ Minority * Sex * (MEANSES + SES) + (1 | School)
Data: MEMSS::MathAchieve
REML criterion at convergence: 46316.33
Random effects:
Groups Name Std.Dev.
School (Intercept) 1.585
Residual 5.982
Number of obs: 7185, groups: School, 160
Fixed Effects:
(Intercept) MinorityYes
12.7993 -2.6055
SexMale MEANSES
1.2772 2.2365
SES MinorityYes:SexMale
2.5085 -0.4623
MinorityYes:MEANSES MinorityYes:SES
1.4387 -1.1007
SexMale:MEANSES SexMale:SES
0.5740 -0.5166
MinorityYes:SexMale:MEANSES MinorityYes:SexMale:SES
-0.7132 0.1103
To get profile-based confidence intervals for the square roots of the variance component estimates (the standard deviations), specify:
MathAch.prof <- profile(MathAch.lmer, n=10000)
confint(MathAch.prof, parm=1:2, level=0.95) ## Standard deviations
2.5 % 97.5 %
.sig01 1.347259 1.821391
.sigma 5.880940 6.078772
## The following returns the variances:
apply(confint(MathAch.prof, parm=1:2, level=0.95), 2, function(x)x^2) |>
round(2)
2.5 % 97.5 %
.sig01 1.82 3.32
.sigma 34.59 36.95
The 95% confidence interval for the between school component of variance extended, in my calculation, from 1.82 to 3.32. See lme4::pvalues for other possible ways to calculate confidence intervals.
If the argument parm is left unspecified, confidence intervals are returned for all parameter estimates.
The number of results per school varies between 14 and 67. Thus, the between school component contributes 1.585\(^2\) = 2.51 to the variance of school means, while the contribution of the within school component is at most 5.982429\(^2\)/14 = 2.56.
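These figures can be reproduced as follows (the standard deviations are taken from the model output above):
nschool <- table(MEMSS::MathAchieve$School)
range(nschool)        # 14 to 67 results per school
1.585^2               # between school contribution to variance of school means: 2.51
5.982^2/min(nschool)  # largest possible within school contribution: 2.56
5.982^2/max(nschool)  # smallest: 0.53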