// library(ascii) // setwd("c://aaaWork//web//fishR//bookex//AIFFD//Box3_13") // Sweave("Box3_13a.Rnw",driver=RweaveAsciidoc) AIFFD Box 3-13 Vignette ======================== :Author: Derek H. Ogle :Email: dogle@northland.edu :Date: 17-June-2009 :Revision: 2 == Required Packages and Setting Options ---- > library(car) # Anova > library(pda) # lsmean > library(NCStats) # fit.plot > options(contrasts=c("contr.sum","contr.poly")) ---- == Preparing Data The working directory is set, the link:box3_13.txt[] data file is read, and the structure of the data frame is observed with, ---- > setwd("c://aaaWork//web//fishR//bookex//AIFFD//Box3_13") > d <- read.table("box3_13.txt", header = TRUE) > str(d) 'data.frame': 16 obs. of 4 variables: $ year : int 1987 1987 1987 1987 1988 1988 1988 1988 1989 1989 ... $ size : Factor w/ 2 levels "Sub","Yearling": 1 1 2 2 1 1 2 2 1 1 ... $ release : Factor w/ 2 levels "Off","On": 2 1 2 1 2 1 2 1 2 1 ... $ survival: num 0.058 0.155 0.406 0.319 0.058 ... ---- As noted in Box 3.13, the __survival__ variable should be transformed because values expressed as proportions (or percentages) rarely meet the normality or equal variances assumptions of ANOVA. The typical transformations for variables expressed as proportions is to use the arcsine (i.e., inverse sine) square-root function. The authors of Box 3.13 used only an arcsine transformation. In this vignette, I will use the arcsine transformation to demonstrate equivalence of R and SAS output and then use the arcsine square root transformation to demonstrate what I see as the "proper" methodology. The __survival__ variable is transformed to the arcsine of survival (and called __arcsurv__) with, ---- > d$arcsurv <- asin(d$survival/100) ---- and to the aresine square root of survival (and called __arcsrsurv__) with, ---- > d$arcsrsurv <- asin(sqrt(d$survival/100)) ---- == ANOVA Results I The ANOVA, using the arcsine transformed survival variable, is fit with +*[red]#lm()#*+ (for "linear model") where the explanatory variable side (right-hand-side) of the linear model formula uses a notation where the two explanatory variables appear to be multiplied together. In R, this apparent multiplication notation is a short-hand method of telling R to include each main effect and the interaction between the two explanatory variables. In the example below, the +*[red]#size*release#*+ portion of the formula is equivalent to saying +*[red]#size + release + size:release#*+. The model using this short-hand notation is fit with, ---- > lm1 <- lm(arcsurv ~ size * release, data = d) ---- The type-III SS are obtained with (see discussion about SS in the link:../preliminaries/preliminaries.html[preliminaries vignette]), ---- > Anova(lm1, type = "III") Anova Table (Type III tests) Response: arcsurv Sum Sq Df F value Pr(>F) (Intercept) 0.00029744 1 6.9597 0.02165 size 0.00023428 1 5.4820 0.03729 release 0.00008329 1 1.9489 0.18800 size:release 0.00008021 1 1.8770 0.19577 Residuals 0.00051284 12 ---- The least-squares means are computed with +*[red]#lsmean()#*+ from the +*pda*+ package (see discussion about least-squares means in the link:../preliminaries/preliminaries.html[preliminaries vignette]). As there are two factors in this model, R must be told to compute the least-squares means separately by including the factor variable names separately in the +*[red]#factors=#*+ argument as such, ---- > lsmean(lm1, factors = "size") size pred se Sub Sub 0.0004850001 0.002311295 Yearling Yearling 0.0081381526 0.002311295 > lsmean(lm1, factors = "release") release pred se Off Off 0.006593146 0.002311295 On On 0.002030007 0.002311295 ---- In addition, if you ignore the +*[red]#factors=#*+ argument then the least-squares means for all possible groups will be computed. For example, ---- > lsmean(lm1) size release pred se Sub.Off Sub Off 0.0005275002 0.003268665 Sub.On Yearling Off 0.0004425000 0.003268665 Yearling.Off Sub On 0.0126587916 0.003268665 Yearling.On Yearling On 0.0036175136 0.003268665 ---- == ANOVA Results II The ANOVA, using the arcsine transformed survival variable, is fit with ---- > lm2 <- lm(arcsrsurv ~ size * release, data = d) > Anova(lm2, type = "III") Anova Table (Type III tests) Response: arcsrsurv Sum Sq Df F value Pr(>F) (Intercept) 0.039426 1 40.3196 3.666e-05 size 0.014377 1 14.7029 0.002376 release 0.001792 1 1.8329 0.200730 size:release 0.002058 1 2.1046 0.172491 Residuals 0.011734 12 ---- There does not appear to be an interaction between the two factors (p=0.1725); thus, the main effects can be interpreted. There does not appear to be a difference in the mean arcsine square root of survival between the two release sites (p=0.2007). However, the mean arcsine square root of survival does appear to differ signficantly between the two sizes of released fish (p=0.0024). These are the same qualitative results as was obtained with the arcsine transformation used in Box 3.13. The +*[red]#fit.plot()#*+ function from the +*NCStats*+ package can be used to visualize the simultaneous effects of the two factors on arcsine square root of survival. This so-called interaction plot can be constructed with, ---- > fit.plot(lm2, xlab = "Release Size", ylab = "Arcsine Square Root of Survival", legend = "topleft", main = "") ---- image::Box3_13a-012.png[] However, because the interaction was insignificant, you can also look at the main-effect means plots using +*[red]#fit.plot()#*+ as follows, ---- > par(mfrow=c(1,2)) # make a one row, two column "grid" for the plots > fit.plot(lm2,which="size",xlab="Release Size",ylab="Arcsine Square Root of Survival",legend="topleft",main="",ylim=c(0.01,0.12)) > fit.plot(lm2,which="release",xlab="Release Site",ylab="Arcsine Square Root of Survival",legend="topleft",main="",ylim=c(0.01,0.12)) ---- image::Box3_13a-013.png[] Note that the +*[red]#ylim=#*+ argument was used to control the range of y-axis so as to allow for a better comparison of the relative effects of the two main effects. == ANOVA Results III The same model fits summarized with type-II SS (see discussion about SS in the link:../preliminaries/preliminaries.html[preliminaries vignette]) are obtained with, ---- > Anova(lm1, type = "II") Anova Table (Type II tests) Response: arcsurv Sum Sq Df F value Pr(>F) size 0.00023428 1 5.4820 0.03729 release 0.00008329 1 1.9489 0.18800 size:release 0.00008021 1 1.8770 0.19577 Residuals 0.00051284 12 > Anova(lm2, type = "II") Anova Table (Type II tests) Response: arcsrsurv Sum Sq Df F value Pr(>F) size 0.0143773 1 14.7029 0.002376 release 0.0017923 1 1.8329 0.200730 size:release 0.0020580 1 2.1046 0.172491 Residuals 0.0117342 12 ---- The conclusions from these summaries are identical to the conclusions from above.