// library(ascii)
// setwd("c://aaaWork//web//fishR//bookex//AIFFD//Box5_1")
// Asciidoc("Box5_1a.Rnw")

AIFFD Box 5.1 Vignette
=======================
:Author: Derek H. Ogle
:Email: dogle@northland.edu
:Date: 10-June-2010
:Revision: 3

.Author Comment
****
It is my opinion that the age-length key method shown in Box 5.1 of the book is cumbersome. It relies on a large number of _if..then..else_ statements, and the user must re-enter data (i.e., the percentage at age for each length interval when expanding the age-length key). In addition, the final result (table on p. 201) produces fractional fish in certain age-length categories. I recommend that the reader consider the Isermann and Knight (2005) method, which I have described link:../../../gnrlex/AgeLengthKey/AgeLengthKey.pdf[here] and for which Isermann and Knight provided a SAS program. Nevertheless, I will attempt to recreate the process shown in Box 5.1 below.
****

== Required Packages

----
> library(NCStats)  # Summarize, view, Subset
> library(FSA)      # lencat, age.key
----

== Preparing Data

The working directory is set, the age sample data file (i.e., link:box5_1_aged.txt[]) and the length sample data file (i.e., link:box5_1_length.txt[]) are read, and the structure of each data frame is examined with,

----
> setwd("c://aaaWork//web//fishR//bookex//AIFFD//Box5_1")
> d.age <- read.table("box5_1_aged.txt", header = TRUE)
> str(d.age)
'data.frame': 61 obs. of  3 variables:
 $ sex: Factor w/ 2 levels "F","M": 2 2 2 1 1 1 2 2 1 1 ...
 $ tl : int  100 111 114 99 104 120 250 255 250 252 ...
 $ age: int  1 1 1 1 1 1 2 2 2 2 ...
> d.len <- read.table("box5_1_length.txt", header = TRUE)
> str(d.len)
'data.frame': 416 obs. of  1 variable:
 $ tl: int  336 336 336 395 395 395 395 386 386 386 ...
----

== Constructing an Age-Length Key

The first step in constructing the age-length key is to create a variable that identifies the length interval category for each fish in the age sample.
This variable is constructed, with the default name +*LCat*+, and appended to the data frame containing the age-sample with +*[red]#lencat()#*+ from the +*FSA*+ package. In this context, +*[red]#lencat()#*+ requires four arguments,

- +*[red]#d#*+: the data frame containing the age-sample,
- +*[red]#cl#*+: a number or string indicating which column of the age-sample data frame contains the measured lengths,
- +*[red]#startcat#*+: a value identifying the lower bound of the first length category, and
- +*[red]#w#*+: a value identifying the width of the length categories.

The +*[red]#lencat()#*+ function returns a data frame that consists of the original data frame plus a variable containing the length interval category for each fish. The default name of the new variable (+*LCat*+) can be changed with the +*[red]#vname=#*+ argument. The result of +*[red]#lencat()#*+ must be assigned to an object, preferably one named differently from the original age sample.

When using an age-length key, it is important that the lengths in the age-sample span the same range as the lengths in the length (i.e., unaged) sample. Unfortunately, this is not the case with the spotted sucker data set provided with Box 5.1. Regardless, the first length category is chosen by finding the minimum length in the age-sample with,

----
> Summarize(d.age$tl, numdigs = 1)
    n  Mean St. Dev. Min. 1st Qu. Median 3rd Qu.  Max.
 61.0 378.1    106.4 99.0   354.0  418.0   444.0 490.0
----

The first length interval can then start at a convenient round number just below the minimum length in the age-sample (the authors of Box 5.1 chose 20-mm wide intervals). In this example, one could start at either 80 or 90 mm. I chose to start at 80 mm so that the intervals most closely match those used in Box 5.1 (the authors of Box 5.1 started at 90 mm but made this first interval only 10 mm wide).
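For readers curious about what +*[red]#lencat()#*+ computes, the following base-R sketch assigns each length to the lower bound of its 20-mm interval. The helper name +*lencat_sketch()*+ is hypothetical, and this is not the FSA implementation. The sketch assumes that all lengths are at or above +*startcat*+ and keeps every interval, including empty ones, as a factor level so that empty intervals still appear in +*[red]#table()#*+ output. The lengths are the first ten values shown by +*str(d.age)*+ above.

```r
# Hypothetical helper (not FSA's lencat()): lower bound of the w-wide
# length interval that contains each length, returned as a factor
lencat_sketch <- function(len, startcat, w) {
  # Intervals are closed on the left: [startcat + k*w, startcat + (k+1)*w)
  lcat <- startcat + w * floor((len - startcat) / w)
  # Every interval from startcat to the largest observed one becomes a
  # level, so empty intervals are kept (and counted as 0 by table())
  levs <- seq(startcat, max(lcat), by = w)
  factor(lcat, levels = levs)
}

# First ten total lengths (mm) of the age sample, as listed by str(d.age)
tl <- c(100, 111, 114, 99, 104, 120, 250, 255, 250, 252)
lcat <- lencat_sketch(tl, startcat = 80, w = 20)
table(lcat)
```

Because the intervals are closed on the left, a 100-mm fish falls in the 100-mm interval and a 99-mm fish falls in the 80-mm interval.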
The interval for each fish is then found and appended to the data frame (saved under a new name) with,

----
> d.age1 <- lencat(d.age, "tl", startcat = 80, w = 20)
> view(d.age1)
   sex  tl age LCat
1    M 100   1  100
30   F 418   4  400
31   M 418   4  400
32   F 420   4  420
41   M 443   5  440
51   F 463   7  460
----

Once the length category variable has been added to the age-sample data frame, +*[red]#table()#*+ can be used to construct the contingency table of the number of fish in each combination of length and age category. The row variable (length category) is the first argument and the column variable (age) is the second argument to this function. The result of +*[red]#table()#*+ should be assigned to an object and then submitted as the first argument to +*[red]#prop.table()#*+, along with +*[red]#margin=1#*+ as the second argument (this is R's way of saying "row"), to construct a row-proportions table. The resulting row-proportions table is the actual age-length key (as proportions rather than the percentages shown in Box 5.1) determined from the age sample, and it is ready to be applied to the length sample.
The summary contingency table and the row-proportions table, i.e., the age-length key, are constructed with (note: the results were rounded for display purposes only),

----
> d.raw <- table(d.age1$LCat, d.age1$age)
> d.key <- prop.table(d.raw, margin = 1)
> round(d.key, 3)
        1     2     3     4     5     6     7     8     9    10
 80 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
100 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
120 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
140
160
180
200
220
240 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
260
280
300 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
320 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
340 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
360 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
380 0.000 0.000 0.333 0.667 0.000 0.000 0.000 0.000 0.000 0.000
400 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000
420 0.000 0.000 0.000 0.333 0.667 0.000 0.000 0.000 0.000 0.000
440 0.000 0.000 0.000 0.000 0.286 0.357 0.000 0.214 0.143 0.000
460 0.000 0.000 0.000 0.000 0.000 0.000 0.500 0.000 0.500 0.000
480 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.333 0.667
----

Finally, it is important to replace all of the "blank" cells with zeroes. These cells belong to length intervals that contain no aged fish. Because the total for such a row is zero, +*[red]#prop.table()#*+ computes 0/0 for each of its cells, which R stores as "NaN" ("not a number"). The +*[red]#is.na()#*+ function returns TRUE for "NaN" as well as for "NA", so the replacement is most easily accomplished with,

----
> d.key[which(is.na(d.key))] <- 0
----

where the code inside the square brackets finds each position in the +*d.key*+ matrix that holds a missing value, and the assignment replaces these positions with zeroes.
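To see where the blank cells come from, the following sketch builds a key in the same way from a small hypothetical age sample (not the Box 5.1 data). The length-category factor includes one interval (120 mm) with no aged fish. Dividing that row's zero counts by its zero total yields NaN, which +*[red]#is.na()#*+ also flags, so the replacement shown above works.

```r
# Hypothetical toy age sample (NOT the Box 5.1 data): lengths already
# binned into 20-mm categories; the 120-mm category has no aged fish
lcat <- factor(c(80, 80, 100, 100, 100, 140),
               levels = c(80, 100, 120, 140))
age  <- c(1, 1, 1, 2, 2, 3)

raw <- table(lcat, age)              # counts: rows = length, cols = age
key <- prop.table(raw, margin = 1)   # row proportions = age-length key
key["120", ]                         # 0/0 gives NaN, not NA
is.na(key["120", ])                  # is.na() is TRUE for NaN as well

key[is.na(key)] <- 0                 # same replacement as in the text
rowSums(key)                         # 1 for sampled rows, 0 for the empty row
```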
The new age-length key now looks like,

----
> round(d.key, 3)
        1     2     3     4     5     6     7     8     9    10
 80 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
100 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
120 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
140 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
160 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
180 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
200 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
220 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
240 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
260 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
280 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
300 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
320 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
340 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
360 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
380 0.000 0.000 0.333 0.667 0.000 0.000 0.000 0.000 0.000 0.000
400 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000
420 0.000 0.000 0.000 0.333 0.667 0.000 0.000 0.000 0.000 0.000
440 0.000 0.000 0.000 0.000 0.286 0.357 0.000 0.214 0.143 0.000
460 0.000 0.000 0.000 0.000 0.000 0.000 0.500 0.000 0.500 0.000
480 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.333 0.667
----

.Author Comment
****
The age-length key just computed does not match the one shown in Box 5.1 for several reasons. Most importantly, the age-length key shown in Box 5.1 is a __**column**__- rather than a __**row**__-proportions table. The row-proportions table shown here is correct, because the key must give the proportion of fish at each age within each length interval, so each row, not each column, must sum to one. In addition, as noted above, the authors used percentages rather than proportions (which is inefficient because later in their program they divide each percentage by 100 to convert it to a proportion), and their first interval starts at 90 mm and is only 10 mm wide.
In addition, I prefer to show the intermediate length intervals that do not contain any fish in the age-sample. This may be cumbersome with a wide range of lengths, but it helps to troubleshoot problems if the length-sample contains fish of these lengths (i.e., the key provides no rule for assigning ages to such fish).
****

== Applying the Age-Length Key I

The first step in applying the age-length key is to construct the length frequency using the same 20-mm wide length intervals that were used to construct the age-length key. If the age- and length-samples span the same lengths, then +*[red]#lencat()#*+ can be applied to the length sample exactly as before. As noted previously, however, the length-sample for the spotted suckers contains fish that are longer than any fish in the age-sample and, thus, fall outside the intervals of the age-length key. This can be seen with a quick summary (look at the maximum value) of the total lengths in the length-sample,

----
> Summarize(d.len$tl, numdigs = 2)
     n   Mean St. Dev.   Min. 1st Qu. Median 3rd Qu.   Max.
416.00 460.99    54.26 318.00  430.50 463.00  497.00 580.00
----

Because the longest fish in the age-sample was 490 mm, the age-length key ends with the 480-mm interval (i.e., 480 to 499 mm) and provides no information for assigning ages to fish that are 500 mm or longer. Thus, for the purposes of this example, the length sample will be reduced to only those fish with total lengths less than 500 mm to correspond with the lengths in the age-length key.
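Before subsetting, it can help to check which fish in a length sample fall outside the key. The following base-R sketch does this in two steps. First, it flags lengths outside the range of the key's rows (here, 80 through 499 mm). Second, it flags lengths that fall in an interior interval with no aged fish. The interval bounds and the empty rows are taken from the key above. The lengths are illustrative only: 318 and 580 are the minimum and maximum of the length sample, and 512 is a made-up value.

```r
# Lower bounds of the 20-mm rows in the spotted sucker age-length key
key_lcats <- seq(80, 480, by = 20)
w <- 20

# Illustrative lengths (mm); 512 is a made-up value
len <- c(318, 463, 497, 512, 580)

# Step 1: a fish is covered if it lies in [first lower bound, last lower bound + w)
covered <- len >= min(key_lcats) & len < max(key_lcats) + w
len[!covered]            # lengths with no matching row in the key
len_kept <- len[covered]

# Step 2: interior rows of the key that contain no aged fish (from the key above)
empty_rows <- c(140, 160, 180, 200, 220, 260, 280)
lcat <- min(key_lcats) + w * floor((len_kept - min(key_lcats)) / w)
lcat %in% empty_rows     # TRUE would flag a fish with no age rule
```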
The following use of +*[red]#Subset()#*+ creates a new data frame (called +*d.len1*+) that contains only those fish from the original data frame with a total length less than 500 mm,

----
> d.len1 <- Subset(d.len, tl < 500)
----

The length interval variable can then be appended to this new data frame, and the frequency of fish in each interval can be found with +*[red]#table()#*+,

----
> d.len2 <- lencat(d.len1, "tl", startcat = 80, w = 20)
> len.freq <- table(d.len2$LCat)
> len.freq

 80 100 120 140 160 180 200 220 240 260 280 300 320 340 360 380 400 420 440 460 480
  0   0   0   0   0   0   0   0   0   0   0   3   6  12  12  30  28  48  51  61  83
----

As explained in the text, the idea now is to apportion the fish in each length interval of the length-sample among the age categories according to the proportions in the age-length key. For example, all three fish in the 300-mm interval of the length sample should be assigned age-3. Of the 30 fish in the 380-mm interval, 33.3% should be assigned age-3 and 66.7% should be assigned age-4. As Box 5.1 shows, these calculations become tedious if each one must be done individually. Fortunately, these calculations can be greatly simplified with matrix multiplication. This requires that the length intervals of the length frequency exactly match the rows of the age-length key, with the same starting value, width, and number of intervals.

The multiplication of two matrices requires that the number of columns of the first matrix equal the number of rows of the second matrix. The resulting matrix will have as many rows as the first matrix and as many columns as the second matrix. The length frequency "vector" can be thought of as a "matrix" with one row and 21 columns, as shown below (and in the vector above),

----
> length(len.freq)
[1] 21
----

The age-length key has 21 rows and 10 columns, as shown below (and in the age-length key matrix above),

----
> dim(d.key)
[1] 21 10
----

The dimensions of these matrices imply that the length-frequency vector can be multiplied by the age-length key matrix.
However, what does this accomplish? In matrix multiplication, each cell of the resulting matrix is the sum of the products of the elements in the corresponding row of the first matrix and the corresponding column of the second matrix. For example, the value in the first row and third column of the resulting matrix is the sum of the products of the elements in the first row of the first matrix and the third column of the second matrix. In this situation, the resulting matrix consists of one row with as many columns as there are ages in the age-length key. Each value in that row is the sum of the length frequencies (i.e., the row of the first matrix) multiplied by the corresponding column of the age-length key. Thus, for example, the age-3 value of the resulting matrix is found with the following calculation. It has one term for each of the 21 length intervals from 80 to 480 mm, and the 0.333 in the printed key is 1/3 rounded.

----
0*0 + 0*0 + 0*0 + 0*0 + 0*0 + 0*0 + 0*0 + 0*0 + 0*0 + 0*0 + 0*0 +
3*1 + 6*1 + 12*1 + 12*1 + 30*(1/3) + 28*0 + 48*0 + 51*0 + 61*0 + 83*0 = 43
----

If you study this closely, you will see that it says "all three 300-mm fish are age-3, all six 320-mm fish are age-3, all twelve 340-mm fish are age-3, all twelve 360-mm fish are age-3, 33.3% of the 30 380-mm fish are age-3, none of the 28 400-mm fish are age-3, ...", which results in "an estimated 43 age-3 fish." This IS the calculation needed to produce the final age frequency for the individuals in the length-sample.

Matrix multiplication is performed in R with the +%*%+ operator, which is placed between the two matrices, with the first matrix on its left and the second matrix on its right. R stops with a "non-conformable arguments" error if the dimensions do not match as discussed above.
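The following self-contained sketch rebuilds the spotted sucker key from the printed proportions, using the exact fractions behind the rounded values (e.g., 0.333 = 1/3 and 0.286 = 4/14). It also rebuilds the +*len.freq*+ counts. It then checks that +%*%+ gives the same answer as the sums written out by hand: each key row is scaled by its length count, and the scaled rows are added down each age column. The objects are rebuilt here only for illustration and are not the +*d.key*+ object created above.

```r
# Rebuild the age-length key printed above: 21 length intervals (rows)
# by 10 ages (columns), with exact fractions behind the rounded values
lcats <- as.character(seq(80, 480, by = 20))
key <- matrix(0, nrow = 21, ncol = 10,
              dimnames = list(lcats, as.character(1:10)))
key[c("80", "100", "120"), "1"] <- 1
key["240", "2"] <- 1
key[c("300", "320", "340", "360"), "3"] <- 1
key["380", c("3", "4")] <- c(1, 2) / 3
key["400", "4"] <- 1
key["420", c("4", "5")] <- c(1, 2) / 3
key["440", c("5", "6", "8", "9")] <- c(4, 5, 3, 2) / 14
key["460", c("7", "9")] <- c(1, 1) / 2
key["480", c("9", "10")] <- c(1, 2) / 3

# Length frequency of the fish < 500 mm (the len.freq values shown above)
len.freq <- c(rep(0, 11), 3, 6, 12, 12, 30, 28, 48, 51, 61, 83)

# Matrix multiplication: the length-21 vector is treated as a 1 x 21 row
age.freq <- len.freq %*% key

# The same sums written out: scale each key row by its length count
# (the vector recycles down each column), then add down each age column
age.freq.hand <- colSums(len.freq * key)

round(age.freq, 5)

# Mismatched dimensions (here, 20 counts for 21 rows) raise an error
tryCatch(len.freq[-1] %*% key, error = function(e) conditionMessage(e))
```

The two results agree. The rounded values match the +*age.freq*+ output in the next step, and they sum to 334, the number of fish in the length sample that are shorter than 500 mm.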
The age frequency for the spotted suckers in the length sample that were less than 500 mm can be estimated with,

----
> age.freq <- len.freq %*% d.key
> age.freq
     1 2  3  4        5        6    7        8        9       10
[1,] 0 0 43 64 46.57143 18.21429 30.5 10.92857 65.45238 55.33333
----

== Applying the Age-Length Key II

As noted at the beginning, it is my opinion that the Isermann and Knight (2005) method is a better method for applying age-length keys. In this section, I demonstrate how to apply this method to the spotted sucker data. I describe the Isermann and Knight (2005) method in more detail link:../../../gnrlex/AgeLengthKey/AgeLengthKey.pdf[here]. This section assumes that the age-length key has already been constructed (as shown above) and that the modified length sample (i.e., only those fish less than 500 mm) is being used.

The Isermann and Knight (2005) method is implemented with +*[red]#age.key()#*+ from the +*FSA*+ package. This function requires the following arguments,

- +*[red]#key#*+: a numeric matrix containing the age-length key (as constructed with +*[red]#prop.table()#*+),
- +*[red]#dl#*+: a data frame containing the length-sample of fish,
- +*[red]#cl#*+: a number or character string indicating which column of +*[red]#dl#*+ contains the length measurements, and
- +*[red]#ca#*+: a number or character string indicating which column of +*[red]#dl#*+ should receive the age assignments. If the column does not exist in the data frame, then one will be appended with the name given in +*[red]#ca#*+.

The +*[red]#age.key()#*+ function determines the length categories to construct from the age-length key given in the +*[red]#key=#*+ argument. The result of +*[red]#age.key()#*+ should be assigned to an object, preferably with a name different from the original length sample.
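To make the idea behind +*[red]#age.key()#*+ more concrete, here is a simplified base-R sketch of semi-random age assignment in the spirit of Isermann and Knight (2005). It is not the FSA code and may differ from it in detail. The function name +*assign_ages_sketch()*+ is hypothetical. The rule for leftover fish is also an assumption: ages are drawn without replacement, with probability proportional to the fractional part of each expected count. Within each length interval, every age first receives the whole number of fish implied by the key, and the remaining fish are then assigned randomly. As a result, every fish gets a whole age and no fractional fish result. The example uses the 380- and 440-mm rows of the key above.

```r
# Hypothetical helper: semi-random age assignment in the spirit of
# Isermann and Knight (2005); NOT the FSA age.key() implementation.
# key:  row-proportions matrix (rownames = length categories, colnames = ages)
# lcat: length category (character) of each unaged fish
assign_ages_sketch <- function(key, lcat) {
  ages <- as.numeric(colnames(key))
  out <- rep(NA_real_, length(lcat))
  for (lc in unique(lcat)) {
    idx <- which(lcat == lc)
    n <- length(idx)
    p <- key[lc, ]
    if (sum(p) == 0) next                  # no ages known: leave as NA
    # Step 1: whole fish per age (tiny tolerance guards against 2.9999...)
    n_age <- floor(n * p + 1e-8)
    # Step 2 (assumed rule): leftover fish go to ages drawn without
    # replacement, weighted by the fractional remainders
    left <- n - sum(n_age)
    if (left > 0) {
      rem <- pmax(0, n * p - n_age)
      pick <- sample.int(length(ages), left, prob = rem)
      n_age[pick] <- n_age[pick] + 1
    }
    # Step 3: shuffle the age labels among this interval's fish
    # (indexing with sample.int avoids sample()'s single-number pitfall)
    labels <- rep(ages, times = n_age)
    out[idx] <- labels[sample.int(length(labels))]
  }
  out
}

# Two rows of the spotted sucker key (age 7 omitted; it is 0 in both rows)
key <- matrix(0, nrow = 2, ncol = 6,
              dimnames = list(c("380", "440"), c("3", "4", "5", "6", "8", "9")))
key["380", c("3", "4")] <- c(1, 2) / 3
key["440", c("5", "6", "8", "9")] <- c(4, 5, 3, 2) / 14

set.seed(1)
lcat <- rep(c("380", "440"), times = c(30, 51))
age <- assign_ages_sketch(key, lcat)
table(lcat, age)
```

In the 380-mm interval, the key implies exactly 10 age-3 and 20 age-4 fish, so no random draws are needed. In the 440-mm interval, the whole-fish shares account for 49 of the 51 fish, and the remaining 2 are assigned at random.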
Random ages were assigned to the unaged fish in the length sample with,

----
> d.len3 <- age.key(d.key, d.len1, cl = "tl", ca = "age")
> view(d.len3)
     tl age
13  416   4
171 395   4
195 496  10
209 450   6
294 413   4
299 418   4
----

The original (unmodified) age-sample data frame and the modified length-sample data frame (i.e., now containing the ages assigned via the age-length key) can then be combined to construct a data frame of lengths and ages for all fish in the study. Only the +*tl*+ and +*age*+ columns of the age sample are used so that the columns of the two data frames match. These two data frames are combined with,

----
> d.comb <- rbind(d.len3, d.age[, c("tl", "age")])
> view(d.comb)
     tl age
78  499  10
167 354   3
251 494  10
307 438   5
349 449   6
403 486  10
----

The assigned ages in the +*d.comb*+ data frame can then be used to, for example, compute an overall age frequency,

----
> table(d.comb$age)

 1  2  3  4  5  6  7  8  9 10
 6  4 54 77 57 23 33 14 70 57
----

or calculate summary statistics of size-at-age for ALL individuals in the study,

----
> Summarize(tl ~ age, data = d.comb, numdigs = 2)
    n   Mean St. Dev. Min. 1st Qu. Median 3rd Qu. Max.
1   6 108.00     8.37   99   101.0  107.5   113.2  120
2   4 251.75     2.36  250   250.0  251.0   252.8  255
3  54 360.37    22.75  318   344.0  359.5   379.0  399
4  77 411.58    16.23  382   399.0  413.0   418.0  438
5  57 437.14     9.11  420   431.0  437.0   444.0  459
6  23 449.96     5.03  441   449.0  450.0   452.0  459
7  33 466.85     4.18  462   463.0  467.0   470.0  474
8  14 450.14     5.70  441   446.8  449.5   454.0  459
9  70 475.56    16.65  443   463.0  470.0   494.0  499
10 57 491.74     6.39  480   486.0  494.0   497.0  499
----
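For readers without NCStats, a similar size-at-age summary can be sketched in base R with +*[red]#tapply()#*+. The data frame below is a small hypothetical stand-in for +*d.comb*+. The sketch computes only a subset of the statistics that +*[red]#Summarize()#*+ reports.

```r
# Hypothetical stand-in for d.comb (illustrative lengths and ages only)
comb <- data.frame(tl  = c(99, 104, 120, 250, 255, 318, 344, 399),
                   age = c(1, 1, 1, 2, 2, 3, 3, 3))

# Overall age frequency
table(comb$age)

# Summary statistics of total length for each age; tapply() returns one
# named vector per age and rbind() stacks them into a matrix (one row per age)
tl_stats <- function(x) c(n = length(x), mean = mean(x), sd = sd(x),
                          min = min(x), max = max(x))
size_at_age <- do.call(rbind, tapply(comb$tl, comb$age, tl_stats))
round(size_at_age, 2)
```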