> library(NCStats) # Summarize, view
> library(car) # Anova
> options(contrasts=c("contr.sum","contr.poly"))
> library(NCStats) # Summarize, view
> library(car) # Anova
> options(contrasts=c("contr.sum","contr.poly"))
The working directory is set and the data file (i.e., box5_2.txt) is read with,
> setwd("c://aaaWork//web//fishR//bookex//AIFFD//Box5_2")
> d <- read.table("box5_2.txt", header = TRUE)
The structure of the data frame and the first six rows of the data frame are observed with,
> str(d)
'data.frame': 416 obs. of 8 variables:
$ ID : Factor w/ 63 levels "07447","200404111",..: 1 1 1 3 3 3 3 4 4 4 ...
$ sex : Factor w/ 2 levels "F","M": 2 2 2 1 1 1 1 1 1 1 ...
$ Lc : int 336 336 336 395 395 395 395 386 386 386 ...
$ date: int 2004 2004 2004 2004 2004 2004 2004 2004 2004 2004 ...
$ age : int 3 3 3 4 4 4 4 4 4 4 ...
$ Sc : num 16.3 16.3 16.3 18.4 18.4 18.4 18.4 18.6 18.6 18.6 ...
$ inc : int 1 2 3 1 2 3 4 1 2 3 ...
$ Si : num 5 12.9 16.3 4.8 9.9 16.5 18.4 4.9 8.5 13.6 ...
> head(d)
ID sex Lc date age Sc inc Si
1 07447 M 336 2004 3 16.3 1 5.0
2 07447 M 336 2004 3 16.3 2 12.9
3 07447 M 336 2004 3 16.3 3 16.3
4 35334 F 395 2004 4 18.4 1 4.8
5 35334 F 395 2004 4 18.4 2 9.9
6 35334 F 395 2004 4 18.4 3 16.5
It is very important to note that this data file is already in what is called one-radii-per-line format. This is noted by the fact that the first three lines of the data file all pertain to fish number 07447. Note that all information in those three rows is the same except in the column labelled Si which is the scale radius at age i. It is common for the data to be collected in what is called one-fish-per-line format where all information for one fish is found in one row of the data file. The data must be in one-radii-per-line format to efficiently back-calculate length-at-age. Methods of converting from one format to the other format are described here.
|
The data must be in one-radii-per-line format for back-calculation methods. |
Finally, for some back-calculation methods (e.g., Dahl-Lea) it is not important that the measurements made on the fish and those made on the scales are in the same units. However, a common set of units is required for other methods (e.g., Fraser-Lee). The scale and fish measurements should be converted to common units to guard against forgetting to do this later. The authors note that the fish measurements in this example are mm whereas the scale measurements are in cm at a magnificant of 24X. Thus, the scale measurements are converted to actual mm (and stored in a new veriable) with,
> d$Si2 <- d$Si*10/24 # convert radial measurements
> d$Sc2 <- d$Sc*10/24 # convert total scale radius
> head(d)
ID sex Lc date age Sc inc Si Si2 Sc2
1 07447 M 336 2004 3 16.3 1 5.0 2.083333 6.791667
2 07447 M 336 2004 3 16.3 2 12.9 5.375000 6.791667
3 07447 M 336 2004 3 16.3 3 16.3 6.791667 6.791667
4 35334 F 395 2004 4 18.4 1 4.8 2.000000 7.666667
5 35334 F 395 2004 4 18.4 2 9.9 4.125000 7.666667
6 35334 F 395 2004 4 18.4 3 16.5 6.875000 7.666667
|
In general, the scale and length measurements should be in the same units (i.e., the same scale). |
The Dahl-Lea method of back-calculating fish length at previous age can be accomplished very simply with (note the use of the modified scale measurement variables),
> d$Li1 <- d$Lc * d$Si2/d$Sc2
> head(d)
ID sex Lc date age Sc inc Si Si2 Sc2 Li1
1 07447 M 336 2004 3 16.3 1 5.0 2.083333 6.791667 103.0675
2 07447 M 336 2004 3 16.3 2 12.9 5.375000 6.791667 265.9141
3 07447 M 336 2004 3 16.3 3 16.3 6.791667 6.791667 336.0000
4 35334 F 395 2004 4 18.4 1 4.8 2.000000 7.666667 103.0435
5 35334 F 395 2004 4 18.4 2 9.9 4.125000 7.666667 212.5272
6 35334 F 395 2004 4 18.4 3 16.5 6.875000 7.666667 354.2120
Summary statistics (similar to the Table shown on page 206 of Box 5.2) can be computed with,
> Summarize(Li1 ~ inc, data = d)
n Mean St. Dev. Min. 1st Qu. Median 3rd Qu. Max.
1 65 93.21687 19.27293 63.61 77.86 87.16 106.5 152.9
2 65 217.27094 40.52307 141.10 188.90 215.20 243.0 339.8
3 65 328.93495 45.24603 230.80 296.30 335.00 360.4 412.9
4 52 373.76110 44.18784 258.70 339.20 387.00 410.1 437.0
5 35 388.88135 43.20384 295.70 357.90 391.70 430.0 446.0
6 27 404.21979 38.13072 323.40 379.50 406.00 440.9 459.0
7 21 413.52603 30.17742 350.80 400.10 421.80 431.3 463.0
8 18 430.09080 27.60281 372.80 414.00 440.30 448.9 473.0
9 18 452.11693 28.21794 397.10 437.10 457.20 467.0 495.9
10 14 464.07799 28.00553 408.10 455.80 468.40 473.9 508.7
11 11 478.12451 30.49704 423.60 461.90 483.60 486.3 526.0
12 9 488.78812 26.28516 441.20 471.50 496.00 500.2 529.0
13 6 498.42473 24.85923 467.70 481.10 496.40 517.2 530.0
14 4 501.28996 20.35016 489.70 489.80 491.90 503.4 531.7
15 3 516.92132 29.52618 499.00 499.90 500.80 525.9 551.0
16 2 540.95833 38.12484 514.00 527.50 541.00 554.4 567.9
17 1 580.00000 NA 580.00 580.00 580.00 580.0 580.0
The regression of length at age i on scale radius at age i as described in Box 5.2 is conducted with,
> lm1 <- lm(Li1 ~ Si2, data = d)
> anova(lm1)
Analysis of Variance Table
Response: Li1
Df Sum Sq Mean Sq F value Pr(>F)
Si2 1 6486641 6486641 3672.4 < 2.2e-16
Residuals 414 731254 1766
> summary(lm1)
Call:
lm(formula = Li1 ~ Si2, data = d)
Residuals:
Min 1Q Median 3Q Max
-108.929 -23.408 -2.516 24.922 127.170
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 35.4914 5.1755 6.858 2.56e-11
Si2 50.8834 0.8397 60.600 < 2e-16
Residual standard error: 42.03 on 414 degrees of freedom
Multiple R-squared: 0.8987, Adjusted R-squared: 0.8984
F-statistic: 3672 on 1 and 414 DF, p-value: < 2.2e-16
The intercept from this model (35.49) is an estimate the correction factor required for the Fraser-Lee method. The back-calculed length-at-age using the Fraser-Lee method is then constructed with,
> a <- coef(lm1)[1] # get the correction factor (first coefficient in lm1)
> d$Li2 <- a + (d$Lc-a)*(d$Si2/d$Sc2)
> head(d)
ID sex Lc date age Sc inc Si Si2 Sc2 Li1 Li2
1 07447 M 336 2004 3 16.3 1 5.0 2.083333 6.791667 103.0675 127.6719
2 07447 M 336 2004 3 16.3 2 12.9 5.375000 6.791667 265.9141 273.3172
3 07447 M 336 2004 3 16.3 3 16.3 6.791667 6.791667 336.0000 336.0000
4 35334 F 395 2004 4 18.4 1 4.8 2.000000 7.666667 103.0435 129.2762
5 35334 F 395 2004 4 18.4 2 9.9 4.125000 7.666667 212.5272 228.9226
6 35334 F 395 2004 4 18.4 3 16.5 6.875000 7.666667 354.2120 357.8768
Summary statistics (similar to the Table shown on page 209 of Box 5.2) can be computed with,
> Summarize(Li2 ~ inc, data = d)
n Mean St. Dev. Min. 1st Qu. Median 3rd Qu. Max.
1 65 120.8734 17.29790 94.4 107.0 115.8 132.0 175.4
2 65 234.5123 36.27838 165.7 209.5 234.0 259.3 346.5
3 65 336.7997 39.87565 250.1 309.9 340.5 363.8 414.6
4 52 379.4626 38.91109 275.8 350.6 391.0 410.2 437.0
5 35 394.8710 38.44757 310.2 366.4 395.7 430.0 446.0
6 27 409.7779 34.16028 335.9 388.3 413.9 442.6 459.0
7 21 419.0092 27.23395 362.0 406.9 425.2 433.7 463.0
8 18 434.8316 25.29983 382.6 422.4 444.6 450.6 476.8
9 18 455.2892 25.95792 405.2 442.8 459.2 467.0 498.2
10 14 466.9554 26.23349 415.4 459.4 469.4 476.5 510.0
11 11 480.5609 29.01173 429.8 464.2 485.5 489.7 526.0
12 9 490.6470 25.16001 446.2 473.2 496.0 505.1 529.0
13 6 499.9276 24.45414 470.9 482.3 496.8 520.0 530.0
14 4 502.6126 21.39454 490.4 491.2 492.7 504.2 534.6
15 3 517.8175 30.30332 499.0 500.3 501.7 527.2 552.8
16 2 541.3280 38.64768 514.0 527.7 541.3 555.0 568.7
17 1 580.0000 NA 580.0 580.0 580.0 580.0 580.0
In Box 5.3 the author’s use a second-degree polynomial model to determine if "growth" (i.e., back-calculated length-at-age) differed between males and females.
> d$incSqrd <- d$inc^2
> lm2 <- lm(Li2 ~ sex + inc + incSqrd + inc*sex + incSqrd*sex,data=d)
> anova(lm2) # type-I SS
Analysis of Variance Table
Response: Li2
Df Sum Sq Mean Sq F value Pr(>F)
sex 1 116872 116872 62.7762 2.205e-14
inc 1 4287009 4287009 2302.7123 < 2.2e-16
incSqrd 1 937827 937827 503.7418 < 2.2e-16
sex:inc 1 9743 9743 5.2334 0.02267
sex:incSqrd 1 134986 134986 72.5061 3.189e-16
Residuals 410 763306 1862
> Anova(lm2,type="III") # type-III SS
Anova Table (Type III tests)
Response: Li2
Sum Sq Df F value Pr(>F)
(Intercept) 244395 1 131.274 < 2.2e-16
sex 49899 1 26.803 3.537e-07
inc 1974817 1 1060.748 < 2.2e-16
incSqrd 688101 1 369.605 < 2.2e-16
sex:inc 143893 1 77.290 < 2.2e-16
sex:incSqrd 134986 1 72.506 3.189e-16
Residuals 763306 410
It is interesting to look at a plot of the results — even though it is a bit cumbersome to construct this plot. Here it is,
> plot(Li2~inc,data=d,col="white",xlab="Age",ylab="Back-Calculated Length") # schematic
> points(Li2~I(inc-0.1),col="red",pch=16,data=Subset(d,sex=="M")) # add points for Males
> points(Li2~I(inc+0.1),col="black",pch=16,data=Subset(d,sex=="F")) # ... and females
> incs <- seq(1,17,0.1) # create a sequence of increment numbers
> predM <- predict(lm2,data.frame(inc=incs,incSqrd=incs^2,sex=rep("M",length(incs)))) # predicted Age for males
> predF <- predict(lm2,data.frame(inc=incs,incSqrd=incs^2,sex=rep("F",length(incs)))) # ... and females
> lines(predM~incs,col="red",lwd=2) # plot for males
> lines(predF~incs,col="black",lwd=2) # ... and females
From this it is immediately apparent that the choice of a quadratic model was a poor choice. I would suggest using a von Bertalanffy model instead (see Box 5.4 Vignette or, for comparing two groups, my Von Bertalanffy vignette).