> library(car) # Anova
> library(pda) # lsmean
> library(NCStats) # fit.plot
> options(contrasts=c("contr.sum","contr.poly"))
> library(car) # Anova
> library(pda) # lsmean
> library(NCStats) # fit.plot
> options(contrasts=c("contr.sum","contr.poly"))
The working directory is set, the box3_8.txt data file is read (note that this is the same date used in the Box3.8 vignette), and the structure of the data frame is observed with,
> setwd("c://aaaWork//web//fishR//bookex//AIFFD//Box3_10")
> d <- read.table("box3_8.txt", header = TRUE)
> str(d)
'data.frame': 22 obs. of 4 variables:
$ lake : Factor w/ 22 levels "Bass","Black",..: 21 13 16 1 15 20 14 12 19 10 ...
$ region : Factor w/ 2 levels "North","South": 1 1 1 1 1 1 1 1 1 1 ...
$ bag_limit: int 1 1 1 2 2 2 2 5 5 5 ...
$ catch : num 2.21 2.32 2.74 2.23 2.25 1.4 2.36 1.78 1.64 1.97 ...
The structure of the data frame shows that bag_limit is an "integer" type variable. To properly perform the ANOVA this variable must be converted to a factor variable with,
> d$fbag_limit <- factor(d$bag_limit)
The blocked ANOVA can then be fit using lm() (see linear models vignette for an introduction) and assigned to an object with,
> lm1 <- lm(catch ~ region * fbag_limit, data = d)
|
In R, the apparent multiplication of two factor variables in a linear model formula is a short-hand notation to tell R to include each factor as main effects AND the interaction between the two factors. R denotes the interaction term by separating the main factors with a colon. |
|
I prefer to include the blocking variable first in the analysis as the general idea is to remove the variability associated with this variable. It does not, however, make a difference to the results in a complete block design using Type-III SS. |
The type-III SS (see discussion about SS in the preliminaries vignette) are obtained by submitting the lm1 object to Anova() with the type="III" argument as follows,
> Anova(lm1, type = "III")
Anova Table (Type III tests)
Response: catch
Sum Sq Df F value Pr(>F)
(Intercept) 129.935 1 660.3029 1.948e-14
region 2.862 1 14.5428 0.001529
fbag_limit 1.935 2 4.9171 0.021647
region:fbag_limit 0.439 2 1.1142 0.352339
Residuals 3.148 16
The least-squares means (see discussion about least-squares means in the preliminaries vignette) are computed by submitting the lm object to lsmean() from the pda package. As there are two factors in this model, R must be told to compute the least-squares means separately by including the factor variable names separately in the factors= argument as such,
> lsmean(lm1, factors = "region")
region pred se
North North 2.109444 0.1349830
South South 2.844667 0.1376562
> lsmean(lm1, factors = "fbag_limit")
fbag_limit pred se
1 1 2.736667 0.1810987
2 2 2.620000 0.1694023
5 5 2.074500 0.1487878
The fit.plot() function from the NCStats package can be used to visually observe the mean (with default 95% CIs) catch rates of the different bag limit groups in the different regions. You can use the change.order= argument to change which variable is plotted on the x-axis (you may have to make this plot once before deciding which way you prefer it).
> fit.plot(lm1, change.order = TRUE, xlab = "Walleye Bag Limit", ylab = "Walleye Catch Rate", main = "")
As the interaction term is insignificant, means (with 95% CI) plots for the main effects can also be obtained with fit.plot() using the which= argument as shown below,
> par(mfrow=c(1,2)) # make a one row, two column "grid" for the plots > fit.plot(lm1,which="fbag_limit",xlab="Walleye Bag Limit",ylab="Walleye Catch Rate",main="") > fit.plot(lm1,which="region",xlab="Region",ylab="Walleye Catch Rate",main="")
|
Many authors warn against interpretations between the levels of the blocking variable because the blocks were included in the analysis because there was an a priori idea of a difference between the levels and because the levels were generally not randomized. I have presented the block-levels comparisons here because they were presented in Chapter 3. |
The model fit without the interaction term, followed by the type-III SS ANOVA table, and least-squares means is obtained with,
> lm2 <- lm(catch ~ region + fbag_limit, data = d)
> Anova(lm2, type = "III")
Anova Table (Type III tests)
Response: catch
Sum Sq Df F value Pr(>F)
(Intercept) 129.748 1 651.0897 1.386e-15
region 2.769 1 13.8958 0.001541
fbag_limit 1.957 2 4.9096 0.019877
Residuals 3.587 18
> lsmean(lm2, factors = c("fbag_limit"))
fbag_limit pred se
1 1 2.736667 0.1822443
2 2 2.590978 0.1692787
5 5 2.060350 0.1491815
> lsmean(lm2, factors = c("region"))
region pred se
North North 2.105818 0.1352208
South South 2.819512 0.1366475
These results are nearly identical to those shown in Box 3.10 of the AIFFD book.
The same model fits summarized with type-II SS (see discussion about SS in the preliminaries vignette) are obtained with,
> Anova(lm1, type = "II")
Anova Table (Type II tests)
Response: catch
Sum Sq Df F value Pr(>F)
region 2.76913 1 14.0722 0.001743
fbag_limit 1.95674 2 4.9719 0.020926
region:fbag_limit 0.43852 2 1.1142 0.352339
Residuals 3.14849 16
> Anova(lm2, type = "II")
Anova Table (Type II tests)
Response: catch
Sum Sq Df F value Pr(>F)
region 2.7691 1 13.8958 0.001541
fbag_limit 1.9567 2 4.9096 0.019877
Residuals 3.5870 18
The conclusions from these summaries are identical to the conclusions from above.