// library(ascii) // setwd("c://aaaWork//web//fishR//bookex//AIFFD//Box3_13") // Asciidoc("Box3_13a.Rnw") AIFFD Box 3-13 Vignette ======================== :Author: Derek H. Ogle :Email: dogle@northland.edu :Date: 10-June-2010 :Revision: 2 == Required Packages and Setting Options ---- > library(car) # Anova > library(pda) # lsmean > library(NCStats) # fit.plot > options(contrasts=c("contr.sum","contr.poly")) ---- == Preparing Data The working directory is set, the link:box3_13.txt[] data file is read, and the structure of the data frame is observed with, ---- > setwd("c://aaaWork//web//fishR//bookex//AIFFD//Box3_13") > d <- read.table("box3_13.txt", header = TRUE) > str(d) 'data.frame': 16 obs. of 4 variables: $ year : int 1987 1987 1987 1987 1988 1988 1988 1988 1989 1989 ... $ size : Factor w/ 2 levels "Sub","Yearling": 1 1 2 2 1 1 2 2 1 1 ... $ release : Factor w/ 2 levels "Off","On": 2 1 2 1 2 1 2 1 2 1 ... $ survival: num 0.058 0.155 0.406 0.319 0.058 ... ---- As noted in Box 3.13, the +*survival*+ variable should be transformed because values expressed as proportions (or percentages) rarely meet the normality or equal variances assumptions of ANOVA. The typical transformations for variables expressed as proportions is to use the arcsine (i.e., inverse sine) square-root function. The authors of Box 3.13 used only an arcsine transformation. In this vignette, I will use the arcsine transformation to demonstrate equivalence of R and SAS output and then use the arcsine square root transformation to demonstrate what I see as the "proper" methodology. The +*survival*+ variable is transformed to the arcsine of survival (and called +*arcsurv*+) with, ---- > d$arcsurv <- asin(d$survival/100) ---- and to the aresine square root of survival (and called +*arcsrsurv*+) with, ---- > d$arcsrsurv <- asin(sqrt(d$survival/100)) ---- == ANOVA Results I The ANOVA, using the arcsine transformed survival variable, is fit with +*[red]#lm()#*+ (see link:../preliminaries/LinearModels.html[linear models vignette] for an introduction) and results are assigned to an object with, ---- > lm1 <- lm(arcsurv ~ size * release, data = d) ---- IMPORTANT: In R, the apparent multiplication of two factor variables in a linear model formula is a short-hand notation to tell R to include each factor as main effects AND the interaction between the two factors. R denotes the interaction term by separating the main factors with a colon. The type-III SS (see discussion about SS in the link:../preliminaries/preliminaries.html[preliminaries vignette]) are obtained by submitting the +*lm*+ object to +*[red]#Anova()#*+ with the +*[red]#type="III"#*+ argument as follows, ---- > Anova(lm1, type = "III") Anova Table (Type III tests) Response: arcsurv Sum Sq Df F value Pr(>F) (Intercept) 0.00029744 1 6.9597 0.02165 size 0.00023428 1 5.4820 0.03729 release 0.00008329 1 1.9489 0.18800 size:release 0.00008021 1 1.8770 0.19577 Residuals 0.00051284 12 ---- The least-squares means (see discussion about least-squares means in the link:../preliminaries/preliminaries.html[preliminaries vignette]) are computed by submitting the +*lm*+ object to +*[red]#lsmean()#*+ from the +*pda*+ package. As there are two factors in this model, R must be told to compute the least-squares means separately by including the factor variable names separately in the +*[red]#factors=#*+ argument as such, ---- > lsmean(lm1, factors = "size") size pred se Sub Sub 0.0004850001 0.002311295 Yearling Yearling 0.0081381526 0.002311295 > lsmean(lm1, factors = "release") release pred se Off Off 0.006593146 0.002311295 On On 0.002030007 0.002311295 ---- In addition, if you ignore the +*[red]#factors=#*+ argument then the least-squares means for all possible groups will be computed. For example, ---- > lsmean(lm1) size release pred se Sub.Off Sub Off 0.0005275002 0.003268665 Sub.On Sub On 0.0004425000 0.003268665 Yearling.Off Yearling Off 0.0126587916 0.003268665 Yearling.On Yearling On 0.0036175136 0.003268665 ---- == ANOVA Results II The ANOVA, using the arcsine transformed survival variable, is fit with ---- > lm2 <- lm(arcsrsurv ~ size * release, data = d) > Anova(lm2, type = "III") Anova Table (Type III tests) Response: arcsrsurv Sum Sq Df F value Pr(>F) (Intercept) 0.039426 1 40.3196 3.666e-05 size 0.014377 1 14.7029 0.002376 release 0.001792 1 1.8329 0.200730 size:release 0.002058 1 2.1046 0.172491 Residuals 0.011734 12 ---- There does not appear to be an interaction between the two factors (p=0.1725); thus, the main effects can be interpreted. There does not appear to be a difference in the mean arcsine square root of survival between the two release sites (p=0.2007). However, the mean arcsine square root of survival does appear to differ signficantly between the two sizes of released fish (p=0.0024). These are the same qualitative results as was obtained with the arcsine transformation used in Box 3.13 of the AIFFD book. The +*[red]#fit.plot()#*+ function from the +*NCStats*+ package can be used to visualize the simultaneous effects of the two factors on arcsine square root of survival. This so-called interaction plot can be constructed with, ---- > fit.plot(lm2, xlab = "Release Size", ylab = "Arcsine Square Root of Survival", legend = "topleft", main = "") ---- image::Box3_13a-012.png[] However, because the interaction was insignificant, you can also look at the main-effect means plots using +*[red]#fit.plot()#*+ with the +*[red]#which=#*+ argument as follows, ---- > par(mfrow=c(1,2)) # make a one row, two column "grid" for the plots > fit.plot(lm2,which="size",xlab="Release Size",ylab="Arcsine Square Root of Survival",legend="topleft",main="",ylim=c(0.01,0.12)) > fit.plot(lm2,which="release",xlab="Release Site",ylab="Arcsine Square Root of Survival",legend="topleft",main="",ylim=c(0.01,0.12)) ---- image::Box3_13a-013.png[] Note that the +*[red]#ylim=#*+ argument was used to control the range of y-axis so as to allow for a better comparison of the relative effects of the two main effects. == ANOVA Results III The same model fits summarized with type-II SS (see discussion about SS in the link:../preliminaries/preliminaries.html[preliminaries vignette]) are obtained with, ---- > Anova(lm1, type = "II") Anova Table (Type II tests) Response: arcsurv Sum Sq Df F value Pr(>F) size 0.00023428 1 5.4820 0.03729 release 0.00008329 1 1.9489 0.18800 size:release 0.00008021 1 1.8770 0.19577 Residuals 0.00051284 12 > Anova(lm2, type = "II") Anova Table (Type II tests) Response: arcsrsurv Sum Sq Df F value Pr(>F) size 0.0143773 1 14.7029 0.002376 release 0.0017923 1 1.8329 0.200730 size:release 0.0020580 1 2.1046 0.172491 Residuals 0.0117342 12 ---- The conclusions from these summaries are identical to the conclusions from above.