> library(NCStats) # Summarize, view, Subset
> library(FSA) # lencat, age.key
The working directory is set, the age sample data file (i.e., box5_1_aged.txt) and length sample data file (i.e., box5_1_length.txt) are read, and the structure of each data frame is observed with,
> setwd("c://aaaWork//web//fishR//bookex//AIFFD//Box5_1")
> d.age <- read.table("box5_1_aged.txt", header = TRUE)
> str(d.age)
'data.frame': 61 obs. of 3 variables:
$ sex: Factor w/ 2 levels "F","M": 2 2 2 1 1 1 2 2 1 1 ...
$ tl : int 100 111 114 99 104 120 250 255 250 252 ...
$ age: int 1 1 1 1 1 1 2 2 2 2 ...
> d.len <- read.table("box5_1_length.txt", header = TRUE)
> str(d.len)
'data.frame': 416 obs. of 1 variable:
$ tl: int 336 336 336 395 395 395 395 386 386 386 ...
The first step in constructing the age-length key is to create a variable that identifies the length interval of each fish in the age sample. The lencat() function from the FSA package constructs this variable (named LCat by default) and appends it to the age-sample data frame. In this context, lencat() requires four arguments,
d: the data frame containing the age-sample,
cl: a number or string indicating which column of the age sample data frame contains the measured length data,
startcat: a value identifying the starting length measurement category, and
w: a value identifying the width of the length measurement categories.
The lencat() function returns a data frame that consists of the original data frame plus a variable containing the length interval categories for each fish. The default name of the new variable (LCat) can be changed with the vname= argument. The lencat() function result must be assigned to an object, preferably named differently from the original age sample.
It is important when using an age-length key to make sure that lengths in the age-sample span the same range as the lengths in the length- (i.e., unaged) sample. Unfortunately, this is not the case with the spotted sucker data set provided with Box 5.1. Nevertheless, one should find the minimum length in the age-sample with,
> Summarize(d.age$tl, numdigs = 1)
n Mean St. Dev. Min. 1st Qu. Median 3rd Qu. Max.
61.0 378.1 106.4 99.0 354.0 418.0 444.0 490.0
The length intervals can then begin at a convenient value just below the minimum length in the age sample, using 20-mm wide intervals as the authors of Box 5.1 did. In this example, either 80 mm or 90 mm would work as the starting value. I will start at 80 mm to most closely match the work done in Box 5.1 (note that the authors of Box 5.1 started at 90 mm but used only a 10-mm width for this first interval). The interval for each fish is then found and appended to a new data frame (so that the original age sample is left unchanged) with,
> d.age1 <- lencat(d.age, "tl", startcat = 80, w = 20)
> view(d.age1)
   sex  tl age LCat
5    F 104   1  100
28   F 413   4  400
40   F 438   5  420
45   F 450   6  440
51   F 463   7  460
59   F 486   9  480
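The six rows shown by view() are consistent with a simple rule: each fish is placed in the interval whose lower bound is the largest value of startcat plus a whole number of widths that does not exceed its length. The base-R sketch below assumes that rule (it is not taken from the lencat() source code) and uses a hypothetical helper, len_cat_sketch(), to reproduce the LCat values above. It also picks the starting value as the nearest multiple of the 20-mm width at or below the minimum length of 99 mm.

```r
# Hypothetical helper (not part of FSA): assign each length to the lower bound
# of its w-mm interval, counting intervals up from startcat
len_cat_sketch <- function(len, startcat, w) {
  startcat + w * floor((len - startcat) / w)
}

# Start at the nearest multiple of the 20-mm width at or below the
# minimum age-sample length (99 mm)
startcat <- 20 * floor(99 / 20)   # 80

# Lengths of the six fish displayed by view(d.age1)
tl <- c(104, 413, 438, 450, 463, 486)
lcat <- len_cat_sketch(tl, startcat = startcat, w = 20)
print(lcat)
```

Starting at a multiple of the width keeps every interval boundary on a round number, which makes the key easier to read and to match against the length sample later.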
Once the length category variable has been added to the age-sample data frame, table() can be used to construct the summary contingency table of the numbers of fish in each combination of length category and age. The row variable (length category) is the first argument and the column variable (age) is the second argument to this function. The result of table() should be assigned to an object and then submitted as the first argument to prop.table(), with margin=1 as the second argument (R’s way of saying "rows"), to construct a row-proportions table. This row-proportions table is the actual age-length key (as proportions rather than the percentages shown in Box 5.1) determined from the age sample, and it is ready to be applied to the length sample. The contingency table and the age-length key are constructed, and the key is displayed, with (note: the results were rounded for display purposes only),
> d.raw <- table(d.age1$LCat, d.age1$age)
> d.key <- prop.table(d.raw, margin = 1)
> round(d.key, 3)
1 2 3 4 5 6 7 8 9 10
80 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
100 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
120 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
140
160
180
200
220
240 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
260
280
300 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
320 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
340 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
360 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
380 0.000 0.000 0.333 0.667 0.000 0.000 0.000 0.000 0.000 0.000
400 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000
420 0.000 0.000 0.000 0.333 0.667 0.000 0.000 0.000 0.000 0.000
440 0.000 0.000 0.000 0.000 0.286 0.357 0.000 0.214 0.143 0.000
460 0.000 0.000 0.000 0.000 0.000 0.000 0.500 0.000 0.500 0.000
480 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.333 0.667
500
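Finally, it is important to replace all of the "blank" cells with zeroes. These cells belong to length intervals that contain no fish in the age sample, so each proportion is zero divided by a zero row total, which R stores as a missing value ("NA" in R parlance; strictly, 0/0 gives NaN, which is.na() also detects). This is most easily accomplished with,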
> d.key[which(is.na(d.key))] <- 0
where the code inside the square brackets finds each position in the d.key matrix that holds a missing value, and the assignment replaces the values at those positions with zeroes. The age-length key now looks like,
> round(d.key, 3)
1 2 3 4 5 6 7 8 9 10
80 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
100 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
120 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
140 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
160 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
180 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
200 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
220 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
240 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
260 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
280 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
300 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
320 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
340 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
360 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
380 0.000 0.000 0.333 0.667 0.000 0.000 0.000 0.000 0.000 0.000
400 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000
420 0.000 0.000 0.000 0.333 0.667 0.000 0.000 0.000 0.000 0.000
440 0.000 0.000 0.000 0.000 0.286 0.357 0.000 0.214 0.143 0.000
460 0.000 0.000 0.000 0.000 0.000 0.000 0.500 0.000 0.500 0.000
480 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.333 0.667
500 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
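The origin of the empty rows, and the effect of the zero replacement, can be seen on a small scale with base R alone. The sketch below uses only the first ten fish of the age sample (the values printed by str(d.age)) and assumes the same 20-mm intervals starting at 80 mm. Listing every interval as a factor level is one way to keep empty intervals in the table; whether lencat() does exactly this internally is an assumption.

```r
# First ten fish of the age sample, as listed by str(d.age)
tl  <- c(100, 111, 114, 99, 104, 120, 250, 255, 250, 252)
age <- c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2)

# 20-mm intervals starting at 80 mm; listing every interval as a factor level
# keeps the empty intervals (140-220 and 260 mm here) in the table
lcat <- factor(80 + 20 * floor((tl - 80) / 20), levels = seq(80, 260, by = 20))

raw <- table(lcat, age)             # counts: length interval (rows) x age (columns)
key <- prop.table(raw, margin = 1)  # row proportions; an empty row is 0/0 = NaN
print(round(key, 3))

key[is.na(key)] <- 0                # is.na() is TRUE for NaN as well as NA
print(round(key, 3))
```

The first printout shows NaN in the empty rows; after the replacement those rows are all zeroes, while every non-empty row still sums to 1.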
The first step in applying the age-length key is to construct the length frequency in the same 20-mm wide length intervals used to construct the age-length key. If the age- and length-samples span the same lengths then one can apply the same lencat() function as before but to the length sample. As noted before, the length-sample for the spotted suckers contains lengths of fish that were not present in the age-sample and, thus, are not present in the age-length key. This can be seen with a quick summary (look at the maximum value) of the total lengths in the length-sample,
> Summarize(d.len$tl, numdigs = 2)
n Mean St. Dev. Min. 1st Qu. Median 3rd Qu. Max.
416.00 460.99 54.26 318.00 430.50 463.00 497.00 580.00
Because the two samples span different ranges of lengths, the age-length key contains no information about the ages of fish longer than any fish in the age sample. Thus, for the purposes of this example, the length sample will be reduced to only those fish with total lengths less than 500 mm, to correspond with the lengths in the age-length key. The following use of Subset() creates a new data frame (called d.len1) that contains only those fish from the original data frame with a total length less than 500 mm,
> d.len1 <- Subset(d.len, tl < 500)
The length intervals variable can then be appended to this new data frame and the frequency of fish in each of these intervals is found with table(),
> d.len2 <- lencat(d.len1, "tl", startcat = 80, w = 20)
> len.freq <- table(d.len2$LCat)
> len.freq
 80 100 120 140 160 180 200 220 240 260 280 300 320 340 360 380 400 420 440 460 480 500
  0   0   0   0   0   0   0   0   0   0   0   3   6  12  12  30  28  48  51  61  83   0
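The later matrix multiplication only works if the length-frequency table has exactly the same intervals, in the same order, as the rows of the age-length key, including intervals with zero fish. The base-R sketch below shows one way to guarantee that, assuming the key's 22 intervals run from 80 to 500 mm by 20 mm. It uses the first ten fish of the length sample (the values printed by str(d.len)) and base R's subset() as a stand-in for NCStats' Subset().

```r
# First ten fish of the length sample, as listed by str(d.len)
d_len <- data.frame(tl = c(336, 336, 336, 395, 395, 395, 395, 386, 386, 386))

# Base-R stand-in for the Subset() step: keep only fish shorter than 500 mm
d_len1 <- subset(d_len, tl < 500)

# Use the same 22 interval levels (80, 100, ..., 500) as the rows of the
# age-length key so that every interval appears, even with a zero count
key_levels <- seq(80, 500, by = 20)
lcat <- factor(80 + 20 * floor((d_len1$tl - 80) / 20), levels = key_levels)
len_freq <- table(lcat)
print(len_freq)
```

Here the 336-mm fish fall in the 320-mm interval and the 386- and 395-mm fish in the 380-mm interval, while the remaining 20 intervals are kept with zero counts.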
As explained in the text, the idea now is to apportion the number of fish in each length interval of the length sample into age categories based on the proportions in the age-length key. For example, all three fish in the 300-mm interval of the length sample should be assigned age-3, whereas 33.3% of the 30 fish in the 380-mm interval should be assigned age-3 and 66.7% should be assigned age-4.
As Box 5.1 shows, these calculations become tedious when each one must be done individually. Fortunately, if the length intervals of the length sample are constructed to exactly match the rows of the age-length key, these calculations can be greatly simplified with matrix multiplication. Multiplying two matrices requires that the number of columns of the first matrix equal the number of rows of the second matrix. The resulting matrix has as many rows as the first matrix and as many columns as the second matrix. The length-frequency "vector" can be thought of as a "matrix" with one row and 22 columns, as shown below (compare with the vector above),
> length(len.freq)
[1] 22
The age-length key has 22 rows and 10 columns as shown below (and look at the age-length key matrix above),
> dim(d.key)
[1] 22 10
The dimensions of these matrices imply that the length-frequency vector can be multiplied by the age-length key matrix. However, what does this accomplish? In matrix multiplication, each cell of the resulting matrix is the sum of the products of the elements in the corresponding row of the first matrix and the matching elements in the corresponding column of the second matrix. For example, the value in the first row and third column of the resulting matrix is the sum of the products of the elements in the first row of the first matrix and the elements in the third column of the second matrix.
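A small piece of this calculation can be checked directly. The sketch below takes the 360-, 380-, and 400-mm intervals from len.freq and the matching rows of the age-length key, keeping only the age-3 and age-4 columns (the only non-zero ones in those rows). It assumes that the displayed 0.333 and 0.667 are the rounded values of 1/3 and 2/3.

```r
# Length frequencies for the 360-, 380-, and 400-mm intervals (from len.freq)
freq <- matrix(c(12, 30, 28), nrow = 1,
               dimnames = list(NULL, c("360", "380", "400")))

# Matching rows of the age-length key, ages 3 and 4 only (assumed exact fractions)
key <- matrix(c(1,     0,
                1 / 3, 2 / 3,
                0,     1), nrow = 3, byrow = TRUE,
              dimnames = list(c("360", "380", "400"), c("3", "4")))

# (1 x 3) %*% (3 x 2) gives a 1 x 2 matrix of estimated fish per age
age_counts <- freq %*% key
print(age_counts)

# Each cell is a sum of products: row 1 of freq times column "4" of key
print(sum(freq[1, ] * key[, "4"]))

# Non-conformable dimensions, (1 x 3) %*% (2 x 3), produce an error
msg <- tryCatch(freq %*% t(key), error = function(e) conditionMessage(e))
print(msg)
```

For these three intervals the product gives 22 age-3 fish (12 + 10) and 48 age-4 fish (20 + 28), and the explicit sum of products for age-4 matches the matrix result.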
In this situation, the resulting matrix consists of one row with as many columns as there are ages in the age-length key, where each value is the sum of the length-frequency values (i.e., the row of the first matrix) multiplied by the corresponding values in one column of the age-length key. Thus, for example, the age-3 column of the resulting matrix is found with,
0*0+0*0+0*0+0*0+0*0+0*0+0*0+0*0+0*0+0*0+0*0+3*1+6*1+12*1+12*1+30*(1/3)+28*0+48*0+51*0+61*0+83*0+0*0 = 43
If you study this closely, you will see that it says "all three 300-mm fish are age-3, all six 320-mm fish are age-3, all twelve 340-mm fish are age-3, all twelve 360-mm fish are age-3, 33.3% of the 30 380-mm fish are age-3, none of the 28 400-mm fish are age-3, …", which results in "an estimated 43 age-3 fish." This IS the calculation needed to produce the final age frequency for the fish in the length sample.
Matrix multiplication is performed in R with the %*% operator, as shown in the code below. The first matrix is placed before the operator and the second matrix after it. Note that R returns an error if the dimensions do not conform as discussed above. The age frequency of the spotted suckers in the length sample that were less than 500 mm long can be estimated with,
> age.freq <- len.freq %*% d.key
> age.freq
1 2 3 4 5 6 7 8 9 10
[1,] 0 0 43 64 46.57143 18.21429 30.5 10.92857 65.45238 55.33333
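The full result can be reproduced from the numbers printed in this section. The sketch below keeps only the 300- to 480-mm intervals (every other interval has zero fish in the length sample, so it contributes nothing) and writes each row of the age-length key as the exact fraction assumed to lie behind the rounded display, e.g., 0.286, 0.357, 0.214, and 0.143 as 4/14, 5/14, 3/14, and 2/14. These fractions are inferred, not read from the data files, but they reproduce the printed age.freq.

```r
# Fish per 20-mm interval from len.freq (300-480 mm; all other intervals are 0)
len_freq <- c(3, 6, 12, 12, 30, 28, 48, 51, 61, 83)
names(len_freq) <- seq(300, 480, by = 20)

# Age-length key rows for the same intervals (ages 1-10 in columns), written
# as exact fractions assumed to round to the proportions displayed above
key <- matrix(0, nrow = 10, ncol = 10, dimnames = list(names(len_freq), 1:10))
key["300", 3] <- 1
key["320", 3] <- 1
key["340", 3] <- 1
key["360", 3] <- 1
key["380", c(3, 4)] <- c(1, 2) / 3
key["400", 4] <- 1
key["420", c(4, 5)] <- c(1, 2) / 3
key["440", c(5, 6, 8, 9)] <- c(4, 5, 3, 2) / 14
key["460", c(7, 9)] <- c(1, 1) / 2
key["480", c(9, 10)] <- c(1, 2) / 3

# The named vector is treated as a 1 x 10 row matrix
age_freq <- len_freq %*% key
print(round(age_freq, 5))

# The age-3 cell is the long sum of products written out in the text
print(sum(len_freq * key[, "3"]))
```

The estimated age frequencies sum to 334, the number of fish shorter than 500 mm in the length sample, which is a useful check that no fish were lost or double-counted.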
As noted in the beginning, it is my opinion that the Isermann and Knight (2005) method is a better method for handling age-length keys. In this section, I will demonstrate how to apply this method to the spotted sucker data. Note that I describe the Isermann and Knight (2005) method in more detail here. This section assumes that the age-length key has already been constructed (as shown above) and the modified length sample (i.e., only those fish less than 500 mm) is being used.
The Isermann and Knight (2005) method is implemented with age.key() from the FSA package. This function requires the following arguments,
key: A numeric matrix containing the age-length key (as constructed with prop.table() ).
dl: A data frame containing the length-sample of fish.
cl: A number or character string indicating which column of dl contains the length measurements.
ca: A number or character string indicating which column of dl should receive the age assignments. If the column does not exist in the current data frame then one will be appended with the name given in ca.
The age.key() function will determine the length categories to construct based on the age-length key sent in the key= argument. The results of age.key() should be assigned to an object, preferably with a name different from the original length sample. Random ages were assigned to the un-aged fish in the length sample with,
> d.len3 <- age.key(d.key, d.len1, cl = "tl", ca = "age")
> view(d.len3)
tl age
23 429 4
82 499 9
258 494 10
341 462 7
360 474 9
362 474 7
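The general idea of turning one row of the key into individual ages can be sketched in base R. In this simplified version, the whole number of fish implied by each proportion is assigned first, and any leftover fish are given ages drawn at random with probabilities taken from the key. The helper assign_ages_sketch() is hypothetical, and the handling of leftover fish is an assumption; it is not necessarily how age.key() or Isermann and Knight (2005) treat fractional fish.

```r
# Hypothetical helper (not FSA's age.key()): assign ages to the n fish in one
# length interval, given that interval's row of the age-length key
assign_ages_sketch <- function(n, props) {
  ages <- as.integer(names(props))
  # Whole numbers of fish implied by the key; the small tolerance guards
  # against products such as 30 * (2/3) landing just below an integer
  whole <- floor(n * props + 1e-8)
  n_left <- n - sum(whole)
  # Assumption: leftover fish get ages drawn at random, weighted by the key
  extra <- if (n_left > 0) {
    ages[sample.int(length(ages), n_left, replace = TRUE, prob = props)]
  } else {
    integer(0)
  }
  assigned <- c(rep(ages, times = whole), extra)
  # Shuffle so that ages are not tied to the order of fish in the interval
  assigned[sample.int(length(assigned))]
}

set.seed(1)
# 440-mm interval: 51 fish; proportions assumed to be 4/14, 5/14, 3/14, 2/14
props_440 <- c("5" = 4, "6" = 5, "8" = 3, "9" = 2) / 14
ages_440 <- assign_ages_sketch(51, props_440)
print(table(ages_440))
```

For the 440-mm interval, 14, 18, 10, and 7 fish are assigned ages 5, 6, 8, and 9 outright, and only the remaining 2 fish are assigned at random; an interval such as 380 mm (30 fish, 1/3 and 2/3) involves no random step at all.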
The original (not modified) age-sample data frame and the modified length-sample data frame (i.e., now containing the ages assigned via the age-length key) can then be bound together to construct a data frame that consists of lengths and ages for all fish in the study. Only the tl and age columns of the age sample are used so that the columns of the two data frames match. These two data frames are combined with,
> d.comb <- rbind(d.len3, d.age[, c("tl", "age")])
> view(d.comb)
tl age
11 386 3
38 470 9
39 470 9
284 467 9
356 474 7
416 405 4
The ages in the d.comb data frame can then be used to, for example, compute an overall age frequency,
> table(d.comb$age)

 1  2  3  4  5  6  7  8  9 10
 6  4 54 77 56 24 32 14 70 58
or to calculate summary statistics of size-at-age for ALL individuals in the study,
> Summarize(tl ~ age, data = d.comb, numdigs = 2)
n Mean St. Dev. Min. 1st Qu. Median 3rd Qu. Max.
1 6 108.00 8.37 99 101.0 107.5 113.2 120
2 4 251.75 2.36 250 250.0 251.0 252.8 255
3 54 360.43 22.76 318 344.0 359.5 379.0 399
4 77 411.19 15.92 382 399.0 413.0 418.0 438
5 56 438.02 9.71 420 431.0 437.0 444.0 459
6 24 448.58 4.26 441 445.5 449.5 451.0 459
7 32 466.12 4.57 462 463.0 463.0 470.0 474
8 14 448.36 4.68 441 446.0 448.0 451.0 457
9 70 475.16 15.16 449 467.0 470.0 490.0 499
10 58 492.59 5.98 480 490.0 494.0 497.0 499
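If NCStats is not installed, similar size-at-age summaries can be sketched with base R's aggregate(). The example below uses the first ten fish of the age sample (the values printed by str(d.age)). These appear to be all six age-1 and all four age-2 fish, so the result can be compared with the first two rows of the table above. The choice of statistics is an assumption and covers only part of what Summarize() reports.

```r
# First ten fish of the age sample, as listed by str(d.age)
d <- data.frame(tl  = c(100, 111, 114, 99, 104, 120, 250, 255, 250, 252),
                age = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2))

# Base-R stand-in for Summarize(tl ~ age, ...): n, mean, SD, min, max by age
size_at_age <- aggregate(tl ~ age, data = d, FUN = function(x)
  c(n = length(x), mean = mean(x), sd = sd(x), min = min(x), max = max(x)))
print(size_at_age)
```

The means (108.00 and 251.75) and standard deviations (8.37 and 2.36, after rounding) agree with the age-1 and age-2 rows produced by Summarize().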