Required Packages

> library(NCStats) # Summarize, view, Subset
> library(FSA)     # lencat, age.key

Preparing Data

The working directory is set, the age-sample data file (i.e., box5_1_aged.txt) and the length-sample data file (i.e., box5_1_length.txt) are read, and the structure of each data frame is examined with,

> setwd("c://aaaWork//web//fishR//bookex//AIFFD//Box5_1")
> d.age <- read.table("box5_1_aged.txt", header = TRUE)
> str(d.age)
'data.frame':   61 obs. of  3 variables:
 $ sex: Factor w/ 2 levels "F","M": 2 2 2 1 1 1 2 2 1 1 ...
 $ tl : int  100 111 114 99 104 120 250 255 250 252 ...
 $ age: int  1 1 1 1 1 1 2 2 2 2 ...
> d.len <- read.table("box5_1_length.txt", header = TRUE)
> str(d.len)
'data.frame':   416 obs. of  1 variable:
 $ tl: int  336 336 336 395 395 395 395 386 386 386 ...

Constructing an Age-Length Key

The first step in constructing the age-length key is to create a variable that identifies the length interval category for each fish in the age sample. This variable (with the default name LCat) is constructed and appended to the age-sample data frame with lencat() from the FSA package. In this context, lencat() requires four arguments: the data frame containing the age sample, the name of the length variable in that data frame (in quotes), the lower limit of the first length interval (startcat=), and the width of the length intervals (w=).

The lencat() function returns a data frame that consists of the original data frame plus a variable containing the length interval category for each fish. The default name of the new variable (LCat) can be changed with the vname= argument. The result of lencat() must be assigned to an object, preferably with a name different from the original age sample.
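The labeling rule behind these categories can be sketched in base R. The helper len_cat() below is hypothetical (it is not FSA code); it simply assumes, consistent with the LCat values printed below, that each fish is labeled with the lower limit of its w-mm interval and that every interval from startcat upward is kept as a factor level.

```r
# Hypothetical helper (not part of FSA): label each length with the lower
# limit of its w-mm interval, counting intervals from startcat.
len_cat <- function(len, startcat, w) {
  lower <- startcat + floor((len - startcat) / w) * w
  # Keep every interval from startcat to the largest observed one as a level
  # so that empty intervals are not silently dropped later by table().
  # Lengths below startcat become NA.
  levs <- seq(startcat, max(lower, startcat, na.rm = TRUE), by = w)
  factor(lower, levels = levs)
}

# A few total lengths that appear in the age-sample output in this section
tl <- c(100, 111, 99, 250, 418, 420, 443, 490)
lc <- len_cat(tl, startcat = 80, w = 20)
as.character(lc)   # "100" "100" "80" "240" "400" "420" "440" "480"
nlevels(lc)        # 21 intervals: 80, 100, ..., 480
```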

When using an age-length key, it is important that the lengths in the age-sample span the same range as the lengths in the length- (i.e., unaged) sample. Unfortunately, this is not the case with the spotted sucker data set provided with Box 5.1. Nevertheless, the first step is to find the minimum length in the age-sample with,

> Summarize(d.age$tl, numdigs = 1)
       n     Mean St. Dev.     Min.  1st Qu.   Median  3rd Qu.     Max.
    61.0    378.1    106.4     99.0    354.0    418.0    444.0    490.0

The first length interval should then start at a round number just below the minimum length in the age-sample (the authors of Box 5.1 chose to use 20-mm wide intervals). Because the smallest aged fish is 99 mm, the first interval could start at either 80 or 90 mm in this example. I chose 80 mm to most closely match the work done in Box 5.1 (note that the authors of Box 5.1 started at 90 mm but used only a 10-mm width for this first interval). The length interval for each fish is then found and appended to a new (renamed) data frame with,

> d.age1 <- lencat(d.age, "tl", startcat = 80, w = 20)
> view(d.age1)
   sex  tl age LCat
1    M 100   1  100
30   F 418   4  400
31   M 418   4  400
32   F 420   4  420
41   M 443   5  440
51   F 463   7  460

Once the length category variable has been added to the age-sample data frame, table() can be used to construct a contingency table of the number of fish in each combination of length category and age. The row variable (length category) is the first argument and the column variable (age) is the second argument to table(). The result of table() should be assigned to an object, which is then given to prop.table() as its first argument along with margin=1 (R's way of saying "rows") to construct a table of row proportions. This row-proportions table is the actual age-length key (as proportions rather than the percentages shown in Box 5.1) determined from the age sample, and it is ready to be applied to the length sample. The contingency table and the row-proportions table (i.e., the age-length key) are constructed with (note: the results were rounded for display purposes only),

> d.raw <- table(d.age1$LCat, d.age1$age)
> d.key <- prop.table(d.raw, margin = 1)
> round(d.key, 3)
          1     2     3     4     5     6     7     8     9    10
  80  1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
  100 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
  120 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
  140
  160
  180
  200
  220
  240 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
  260
  280
  300 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
  320 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
  340 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
  360 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
  380 0.000 0.000 0.333 0.667 0.000 0.000 0.000 0.000 0.000 0.000
  400 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000
  420 0.000 0.000 0.000 0.333 0.667 0.000 0.000 0.000 0.000 0.000
  440 0.000 0.000 0.000 0.000 0.286 0.357 0.000 0.214 0.143 0.000
  460 0.000 0.000 0.000 0.000 0.000 0.000 0.500 0.000 0.500 0.000
  480 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.333 0.667
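The effect of margin=1 is easiest to see on a tiny example. The five (length category, age) pairs below are hypothetical toy data, not rows of the spotted sucker sample; each row of the resulting key is the count in each cell divided by that row's total.

```r
# Hypothetical toy data: three fish in the 380-mm interval, two in the 400-mm interval
lcat <- factor(c(380, 380, 380, 400, 400))
age  <- c(3, 4, 4, 4, 4)

raw <- table(lcat, age)             # counts: rows = length categories, columns = ages
key <- prop.table(raw, margin = 1)  # divide each count by its row total
key                                 # 380: 0.333 age-3, 0.667 age-4; 400: 1.000 age-4
rowSums(key)                        # every non-empty row sums to 1
```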

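Finally, it is important to replace all of the "blank" cells with zeroes. These cells belong to length categories that contain no aged fish, so prop.table() had to compute 0/0 for them, which R stores as NaN ("not a number"); is.na() returns TRUE for both NA and NaN values. The replacement is most easily accomplished with,

> d.key[which(is.na(d.key))] <- 0

where the code inside the square brackets finds the position of each missing (NA or NaN) value in the d.key matrix and the assignment replaces the values at those positions with zeroes. The new age-length key now looks like,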

> round(d.key, 3)
          1     2     3     4     5     6     7     8     9    10
  80  1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
  100 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
  120 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
  140 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
  160 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
  180 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
  200 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
  220 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
  240 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
  260 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
  280 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
  300 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
  320 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
  340 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
  360 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
  380 0.000 0.000 0.333 0.667 0.000 0.000 0.000 0.000 0.000 0.000
  400 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000
  420 0.000 0.000 0.000 0.333 0.667 0.000 0.000 0.000 0.000 0.000
  440 0.000 0.000 0.000 0.000 0.286 0.357 0.000 0.214 0.143 0.000
  460 0.000 0.000 0.000 0.000 0.000 0.000 0.500 0.000 0.500 0.000
  480 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.333 0.667
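Why the empty rows need this treatment can be checked on a small hypothetical count matrix in which the 140-mm row has no aged fish. Logical indexing with is.na() alone gives the same result as wrapping it in which().

```r
# Hypothetical counts: no aged fish in the 140-mm interval
raw <- matrix(c(0, 0,
                1, 2), nrow = 2, byrow = TRUE,
              dimnames = list(c("140", "380"), c("3", "4")))
key <- prop.table(raw, margin = 1)  # row "140" becomes 0/0, i.e., NaN
empty <- is.na(key)                 # is.na() flags NaN as well as NA
empty
key[empty] <- 0                     # same effect as key[which(is.na(key))] <- 0
key
```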

Applying the Age-Length Key I

The first step in applying the age-length key is to construct the length frequency in the same 20-mm wide length intervals used to construct the age-length key. If the age- and length-samples span the same lengths, then the same lencat() function can be applied, as before, to the length sample. As noted before, however, the length-sample for the spotted suckers contains fish with lengths that are not represented in the age-sample and, thus, not in the age-length key. This can be seen with a quick summary (look at the maximum value) of the total lengths in the length-sample,

> Summarize(d.len$tl, numdigs = 2)
       n     Mean St. Dev.     Min.  1st Qu.   Median  3rd Qu.     Max.
  416.00   460.99    54.26   318.00   430.50   463.00   497.00   580.00

Because the length sample contains fish longer than any fish in the age sample (a maximum of 580 mm versus 490 mm), the age-length key contains no information about the ages of these larger fish. Thus, for the purposes of this example, the length sample will be reduced to only those fish with total lengths less than 500 mm, the upper limit of the last (480-mm) interval in the age-length key. The following use of Subset() creates a new data frame (called d.len1) containing only the fish from the original data frame with a total length less than 500 mm,

> d.len1 <- Subset(d.len, tl < 500)

The length interval variable can then be appended to this new data frame, and the frequency of fish in each interval is found with table(),

> d.len2 <- lencat(d.len1, "tl", startcat = 80, w = 20)
> len.freq <- table(d.len2$LCat)
> len.freq
 80 100 120 140 160 180 200 220 240 260 280 300 320 340 360 380 400 420 440 460 480
  0   0   0   0   0   0   0   0   0   0   0   3   6  12  12  30  28  48  51  61  83
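The multiplication step later in this section needs len.freq to contain exactly the same 21 intervals as the rows of the age-length key, including the empty ones. The zeros printed above for 80 to 280 mm suggest that the category variable is stored as a factor with all intervals as levels; the hypothetical lengths below show why that matters, because table() drops unobserved values of a numeric variable but keeps every level of a factor.

```r
# Hypothetical length categories for three fish
lcat_num <- c(300, 300, 340)
table(lcat_num)                     # only observed values: 300 and 340

# The same values as a factor with every 20-mm interval from 300 to 360
lcat_fac <- factor(lcat_num, levels = seq(300, 360, by = 20))
table(lcat_fac)                     # 300, 320, 340, 360, with zeros for 320 and 360
```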

As explained in the text, the idea now is to apportion the number of fish in each length interval of the length sample among the ages according to the proportions in the age-length key. For example, all 3 fish in the 300-mm interval of the length sample should be assigned age-3, whereas 33.3% of the 30 fish in the 380-mm interval should be assigned age-3 and 66.7% should be assigned age-4.
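For a single interval the apportionment is just the interval's frequency times its row of the key. The sketch below uses the 380-mm values from this section, taking the printed 0.333 and 0.667 to be exactly one-third and two-thirds.

```r
n_380   <- 30                        # fish in the 380-mm interval of the length sample
key_380 <- c("3" = 1/3, "4" = 2/3)   # 380-mm row of the age-length key (ages 3 and 4)
n_380 * key_380                      # estimated numbers of age-3 and age-4 fish: 10, 20
```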

As Box 5.1 shows, these calculations become tedious if each one is done individually. Fortunately, if the length intervals are constructed carefully (i.e., the intervals in the length frequency exactly match the rows of the age-length key), these calculations can be greatly simplified with matrix multiplication. Two matrices can be multiplied only if the number of columns of the first matrix equals the number of rows of the second matrix; the resulting matrix has as many rows as the first matrix and as many columns as the second matrix. The length-frequency vector can be thought of as a matrix with one row and 21 columns (one for each length interval in the vector printed above),

> length(len.freq)
[1] 21

The age-length key has 21 rows (length intervals) and 10 columns (ages), as shown below and in the age-length key printed above,

> dim(d.key)
[1] 21 10

These dimensions (1×21 and 21×10) mean that the length-frequency vector can be multiplied by the age-length key matrix. But what does this accomplish? In matrix multiplication, each cell of the resulting matrix is the sum of the products of the elements in the corresponding row of the first matrix and the corresponding column of the second matrix. For example, the cell in the first row and third column of the resulting matrix is the sum of the products of the elements in the first row of the first matrix and the third column of the second matrix.
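The row-times-column rule can be verified on two small arbitrary matrices (these have nothing to do with the spotted sucker data).

```r
A  <- matrix(1:6, nrow = 2)   # 2 x 3: rows (1, 3, 5) and (2, 4, 6)
B  <- matrix(1:6, nrow = 3)   # 3 x 2: columns (1, 2, 3) and (4, 5, 6)
AB <- A %*% B                 # 2 x 2 result (rows of A, columns of B)

# Cell [1, 2] is the sum of the products of row 1 of A and column 2 of B
AB[1, 2]                      # 1*4 + 3*5 + 5*6 = 49
sum(A[1, ] * B[, 2])          # same value computed element by element
```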

In this situation, the resulting matrix consists of one row with as many columns as there are ages in the age-length key, where each value is the sum of the products of the length frequencies (i.e., the single row of the first matrix) and the corresponding column of the age-length key. Thus, for example, the age-3 value of the resulting matrix would be found with,

0*0+0*0+0*0+0*0+0*0+0*0+0*0+0*0+0*0+0*0+0*0+3*1+6*1+12*1+12*1+30*(1/3)+28*0+48*0+51*0+61*0+83*0 = 43

If you study this closely you will see that it says "all three 300-mm fish are age-3, all six 320-mm fish are age-3, all twelve 340-mm fish are age-3, all twelve 360-mm fish are age-3, one-third of the 30 380-mm fish are age-3, none of the 28 400-mm fish are age-3, …", which results in "an estimated 43 age-3 fish." This IS exactly the calculation needed to produce the final age frequency for the fish in the length sample.
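The same age-3 sum can be reproduced with the length frequencies printed above and the age-3 column of the key (again treating the printed 0.333 as exactly one-third).

```r
# Length frequencies for the 21 intervals 80, 100, ..., 480 (from len.freq above)
len_freq <- c(rep(0, 11), 3, 6, 12, 12, 30, 28, 48, 51, 61, 83)
# Age-3 column of the age-length key for the same 21 intervals
key_age3 <- c(rep(0, 11), 1, 1, 1, 1, 1/3, 0, 0, 0, 0, 0)

sum(len_freq * key_age3)   # 3 + 6 + 12 + 12 + 10 = 43 estimated age-3 fish
```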

Matrix multiplication is performed in R with the %*% operator, which is placed between the first (left) and second (right) matrices. R returns a "non-conformable arguments" error if the dimensions do not match as described above; a plain vector, such as len.freq, is treated as a one-row matrix when that makes the multiplication possible. The age frequency for the spotted suckers in the length sample that were less than 500 mm can be estimated with,

> age.freq <- len.freq %*% d.key
> age.freq
       1 2  3  4        5        6    7        8        9       10
  [1,] 0 0 43 64 46.57143 18.21429 30.5 10.92857 65.45238 55.33333
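As a check on this result, the sketch below rebuilds the key from the rounded values printed above, assuming the proportions are exactly the thirds, halves, and fourteenths that those rounded values suggest, and multiplies it by the printed length frequencies. It also shows the design choice behind the matrix approach: the product equals apportioning every interval separately (as in Box 5.1) and then summing by age.

```r
ages <- as.character(1:10)
lens <- as.character(seq(80, 480, by = 20))
key  <- matrix(0, nrow = 21, ncol = 10, dimnames = list(lens, ages))
# Non-zero cells of the printed key, assumed to be exact fractions
key[c("80", "100", "120"), "1"]           <- 1
key["240", "2"]                           <- 1
key[c("300", "320", "340", "360"), "3"]   <- 1
key["380", c("3", "4")]                   <- c(1, 2) / 3
key["400", "4"]                           <- 1
key["420", c("4", "5")]                   <- c(1, 2) / 3
key["440", c("5", "6", "8", "9")]         <- c(4, 5, 3, 2) / 14
key["460", c("7", "9")]                   <- c(1, 1) / 2
key["480", c("9", "10")]                  <- c(1, 2) / 3

len_freq <- c(rep(0, 11), 3, 6, 12, 12, 30, 28, 48, 51, 61, 83)
names(len_freq) <- lens

age_freq <- len_freq %*% key        # 1 x 10 matrix of estimated age frequencies
round(age_freq, 5)                  # 0 0 43 64 46.57143 18.21429 30.5 10.92857 65.45238 55.33333

# Apportion each interval separately (len_freq recycles down each column), then sum by age
by_hand <- colSums(len_freq * key)
all.equal(as.vector(age_freq), unname(by_hand))
sum(age_freq)                       # 334, all fish in the reduced length sample
```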

Applying the Age-Length Key II

As noted at the beginning, it is my opinion that the Isermann and Knight (2005) method is a better way to apply an age-length key. In this section, I will demonstrate how to apply this method to the spotted sucker data. Note that I describe the Isermann and Knight (2005) method in more detail elsewhere. This section assumes that the age-length key has already been constructed (as shown above) and that the modified length sample (i.e., only those fish less than 500 mm) is being used.

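The Isermann and Knight (2005) method is implemented with age.key() from the FSA package. In this example, age.key() is given four arguments: the age-length key (the key= argument), the length-sample data frame, the name of the length variable in that data frame (cl=), and the name of the age variable to be created (ca=).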

The age.key() function will determine the length categories to construct based on the age-length key sent in the key= argument. The results of age.key() should be assigned to an object, preferably with a name different from the original length sample. Random ages were assigned to the un-aged fish in the length sample with,

> d.len3 <- age.key(d.key, d.len1, cl = "tl", ca = "age")
> view(d.len3)
     tl age
13  416   4
171 395   4
195 496  10
209 450   6
294 413   4
299 418   4
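The idea of assigning individual ages can be sketched for one length interval. The helper assign_ages_one_interval() below is hypothetical and is not the age.key() source; it assumes a simplified semi-random scheme in which the whole-number part of each expected count (n times the key proportion) is assigned deterministically and the few remaining fish are assigned at random with probabilities proportional to the fractional parts. The published method may differ in detail. Because of the random step, results change between runs unless set.seed() is called first.

```r
# Hypothetical helper: assign ages to the n unaged fish in ONE length interval.
# p is that interval's row of the age-length key, named by age.
assign_ages_one_interval <- function(n, p) {
  # Whole-number part of the expected count at each age
  # (the small tolerance guards against values such as 9.9999999999)
  counts   <- floor(n * p + 1e-9)
  leftover <- n - sum(counts)
  if (leftover > 0) {
    # Fractional parts of the expected counts are the selection weights;
    # each age can receive at most one of the leftover fish
    frac  <- pmax(n * p - counts, 0)
    extra <- sample(names(p), size = leftover, prob = frac)
    counts[extra] <- counts[extra] + 1
  }
  # Shuffle so that which fish in the interval gets which age is random
  sample(rep(names(p), counts))
}

set.seed(1)
a <- assign_ages_one_interval(30, c("3" = 1/3, "4" = 2/3))
table(a)   # always 10 age-3 and 20 age-4 fish
b <- assign_ages_one_interval(61, c("7" = 0.5, "9" = 0.5))
table(b)   # 30 of one age and 31 of the other, chosen at random
```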

The original (not modified) age-sample data frame and the modified length-sample data frame (i.e., now containing the ages assigned via the age-length key) can then be bound together to construct a data frame that consists of lengths and ages for all fish in the study. Only the tl and age columns of the age sample are used so that both data frames have the same columns, as rbind() requires. These two data frames are combined with,

> d.comb <- rbind(d.len3, d.age[, c("tl", "age")])
> view(d.comb)
     tl age
78  499  10
167 354   3
251 494  10
307 438   5
349 449   6
403 486  10

The assigned ages in the d.comb data frame can then be used to, for example, compute an overall age frequency,

> table(d.comb$age)
 1  2  3  4  5  6  7  8  9 10
 6  4 54 77 57 23 33 14 70 57

or calculate summary statistics of size-at-age for ALL individuals in the study,

> Summarize(tl ~ age, data = d.comb, numdigs = 2)
    n   Mean St. Dev. Min. 1st Qu. Median 3rd Qu. Max.
1   6 108.00     8.37   99   101.0  107.5   113.2  120
2   4 251.75     2.36  250   250.0  251.0   252.8  255
3  54 360.37    22.75  318   344.0  359.5   379.0  399
4  77 411.58    16.23  382   399.0  413.0   418.0  438
5  57 437.14     9.11  420   431.0  437.0   444.0  459
6  23 449.96     5.03  441   449.0  450.0   452.0  459
7  33 466.85     4.18  462   463.0  467.0   470.0  474
8  14 450.14     5.70  441   446.8  449.5   454.0  459
9  70 475.56    16.65  443   463.0  470.0   494.0  499
10 57 491.74     6.39  480   486.0  494.0   497.0  499
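If the NCStats package is not available, similar summaries can be sketched with base R functions. The six rows below are simply those printed by view(d.comb) above, so the results illustrate the calls and will not match the full table.

```r
# Six (tl, age) rows taken from the view(d.comb) output above
comb <- data.frame(tl  = c(499, 354, 494, 438, 449, 486),
                   age = c(10, 3, 10, 5, 6, 10))

table(comb$age)                                     # age frequency
m <- aggregate(tl ~ age, data = comb, FUN = mean)   # mean total length at age
m
tapply(comb$tl, comb$age, sd)                       # SD at age (NA when only one fish)
```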