Required Packages and Setting Options

> library(car)      # Anova
> library(pda)      # lsmean
> library(NCStats)  # fit.plot
> options(contrasts=c("contr.sum","contr.poly"))

Preparing Data

The working directory is set, the box3_13.txt data file is read, and the structure of the data frame is observed with,

> setwd("c://aaaWork//web//fishR//bookex//AIFFD//Box3_13")
> d <- read.table("box3_13.txt", header = TRUE)
> str(d)
'data.frame':   16 obs. of  4 variables:
 $ year    : int  1987 1987 1987 1987 1988 1988 1988 1988 1989 1989 ...
 $ size    : Factor w/ 2 levels "Sub","Yearling": 1 1 2 2 1 1 2 2 1 1 ...
 $ release : Factor w/ 2 levels "Off","On": 2 1 2 1 2 1 2 1 2 1 ...
 $ survival: num  0.058 0.155 0.406 0.319 0.058 ...

As noted in Box 3.13, the survival variable should be transformed because values expressed as proportions (or percentages) rarely meet the normality or equal variances assumptions of ANOVA. The typical transformations for variables expressed as proportions is to use the arcsine (i.e., inverse sine) square-root function. The authors of Box 3.13 used only an arcsine transformation. In this vignette, I will use the arcsine transformation to demonstrate equivalence of R and SAS output and then use the arcsine square root transformation to demonstrate what I see as the "proper" methodology. The survival variable is transformed to the arcsine of survival (and called arcsurv) with,

> d$arcsurv <- asin(d$survival/100)

and to the aresine square root of survival (and called arcsrsurv) with,

> d$arcsrsurv <- asin(sqrt(d$survival/100))

ANOVA Results I

The ANOVA, using the arcsine transformed survival variable, is fit with lm() (see linear models vignette for an introduction) and results are assigned to an object with,

> lm1 <- lm(arcsurv ~ size * release, data = d)
Important In R, the apparent multiplication of two factor variables in a linear model formula is a short-hand notation to tell R to include each factor as main effects AND the interaction between the two factors. R denotes the interaction term by separating the main factors with a colon.

The type-III SS (see discussion about SS in the preliminaries vignette) are obtained by submitting the lm object to Anova() with the type="III" argument as follows,

> Anova(lm1, type = "III")
Anova Table (Type III tests)

Response: arcsurv
                 Sum Sq Df F value  Pr(>F)
(Intercept)  0.00029744  1  6.9597 0.02165
size         0.00023428  1  5.4820 0.03729
release      0.00008329  1  1.9489 0.18800
size:release 0.00008021  1  1.8770 0.19577
Residuals    0.00051284 12

The least-squares means (see discussion about least-squares means in the preliminaries vignette) are computed by submitting the lm object to lsmean() from the pda package. As there are two factors in this model, R must be told to compute the least-squares means separately by including the factor variable names separately in the factors= argument as such,

> lsmean(lm1, factors = "size")
             size         pred          se
Sub           Sub 0.0004850001 0.002311295
Yearling Yearling 0.0081381526 0.002311295
> lsmean(lm1, factors = "release")
    release        pred          se
Off     Off 0.006593146 0.002311295
On       On 0.002030007 0.002311295

In addition, if you ignore the factors= argument then the least-squares means for all possible groups will be computed. For example,

> lsmean(lm1)
                 size release         pred          se
Sub.Off           Sub     Off 0.0005275002 0.003268665
Sub.On            Sub      On 0.0004425000 0.003268665
Yearling.Off Yearling     Off 0.0126587916 0.003268665
Yearling.On  Yearling      On 0.0036175136 0.003268665

ANOVA Results II

The ANOVA, using the arcsine transformed survival variable, is fit with

> lm2 <- lm(arcsrsurv ~ size * release, data = d)
> Anova(lm2, type = "III")
Anova Table (Type III tests)

Response: arcsrsurv
               Sum Sq Df F value    Pr(>F)
(Intercept)  0.039426  1 40.3196 3.666e-05
size         0.014377  1 14.7029  0.002376
release      0.001792  1  1.8329  0.200730
size:release 0.002058  1  2.1046  0.172491
Residuals    0.011734 12

There does not appear to be an interaction between the two factors (p=0.1725); thus, the main effects can be interpreted. There does not appear to be a difference in the mean arcsine square root of survival between the two release sites (p=0.2007). However, the mean arcsine square root of survival does appear to differ signficantly between the two sizes of released fish (p=0.0024). These are the same qualitative results as was obtained with the arcsine transformation used in Box 3.13 of the AIFFD book.

The fit.plot() function from the NCStats package can be used to visualize the simultaneous effects of the two factors on arcsine square root of survival. This so-called interaction plot can be constructed with,

> fit.plot(lm2, xlab = "Release Size", ylab = "Arcsine Square Root of Survival", legend = "topleft", main = "")
Box3_13a-012.png

However, because the interaction was insignificant, you can also look at the main-effect means plots using fit.plot() with the which= argument as follows,

> par(mfrow=c(1,2))      # make a one row, two column "grid" for the plots
> fit.plot(lm2,which="size",xlab="Release Size",ylab="Arcsine Square Root of Survival",legend="topleft",main="",ylim=c(0.01,0.12))
> fit.plot(lm2,which="release",xlab="Release Site",ylab="Arcsine Square Root of Survival",legend="topleft",main="",ylim=c(0.01,0.12))
Box3_13a-013.png

Note that the ylim= argument was used to control the range of y-axis so as to allow for a better comparison of the relative effects of the two main effects.

ANOVA Results III

The same model fits summarized with type-II SS (see discussion about SS in the preliminaries vignette) are obtained with,

> Anova(lm1, type = "II")
Anova Table (Type II tests)

Response: arcsurv
                 Sum Sq Df F value  Pr(>F)
size         0.00023428  1  5.4820 0.03729
release      0.00008329  1  1.9489 0.18800
size:release 0.00008021  1  1.8770 0.19577
Residuals    0.00051284 12
> Anova(lm2, type = "II")
Anova Table (Type II tests)

Response: arcsrsurv
                Sum Sq Df F value   Pr(>F)
size         0.0143773  1 14.7029 0.002376
release      0.0017923  1  1.8329 0.200730
size:release 0.0020580  1  2.1046 0.172491
Residuals    0.0117342 12

The conclusions from these summaries are identical to the conclusions from above.