// library(ascii)
// setwd("c://aaaWork//web//fishR//bookex//AIFFD//Box3_2")
// Sweave("Box3_2a.Rnw",driver=RweaveAsciidoc)

AIFFD Box 3-2 Vignette
=======================
:Author: Derek H. Ogle
:Email: dogle@northland.edu
:Date: 17-June-2009
:Revision: 2

== Required Packages

----
> library(NCStats)   # bin.ci
----

== Preparing Data
The working directory is set, the link:box3_2.txt[] data file is read, and the contents of the data frame are displayed with,

----
> setwd("c://aaaWork//web//fishR//bookex//AIFFD//Box3_2")
> d <- read.table("box3_2.txt", header = TRUE)
> d
  age  n
1   0 55
2   1 22
3   2 10
4   3 18
5   4  6
6   5  3
7   6  1
8   7  1
----

== Compute Proportions in Each Age Group
The proportion of fish in each age group is computed by dividing the catch in each age group by the total catch; i.e.,

----
> ttl <- sum(d$n)     # compute (and save) total catch
> p <- d$n/ttl        # find proportion of "successes"
> p
[1] 0.47413793 0.18965517 0.08620690 0.15517241 0.05172414 0.02586207 0.00862069 0.00862069
----

The standard errors are then computed with,

----
> q <- 1-p            # find proportion of "failures"
> seps <- sqrt(p*q/(ttl-1))
----

== Confidence Intervals for Population Proportions I
The critical value from the F distribution can be found with +*[red]#qf()#*+. This function takes a tail probability as the first argument, the numerator df as the second argument, and the denominator df as the third argument. Including +*[red]#lower.tail=FALSE#*+ makes the function treat the first argument as an upper-tail probability.
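
As a quick check of this tail convention, the following minimal sketch (base R only) confirms that an upper-tail call to +*[red]#qf()#*+ returns the same critical value as the corresponding lower-tail quantile. The df of 124 and 110 are those implied by the first age-group (total catch of 116, catch of 55); the object names +f.upper+ and +f.quant+ are illustrative and do not appear in the box.

```r
alpha <- 0.05
df1 <- 2 * (116 - 55 + 1)   # numerator df = 124
df2 <- 2 * 55               # denominator df = 110

# critical value exceeded with probability alpha (upper tail)
f.upper <- qf(alpha, df1, df2, lower.tail = FALSE)

# the same critical value expressed as the (1 - alpha) quantile
f.quant <- qf(1 - alpha, df1, df2)

# TRUE if the two calls agree
all.equal(f.upper, f.quant)
```
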
For example, the two critical F values used for the first age-group in Box 3.2 can be found with,

----
> alpha <- 0.05
> n <- ttl
> a <- 55
> f.lwr <- qf(alpha, 2 * (n - a + 1), 2 * a, lower.tail = FALSE)
> f.lwr
[1] 1.360090
> f.upr <- qf(alpha, 2 * a + 2, 2 * (n - a + 1) - 2, lower.tail = FALSE)
> f.upr
[1] 1.355947
----

The CIs for the first age-group can then be computed with,

----
> LCI <- a/(a + (n - a + 1) * f.lwr)
> UCI <- ((a + 1) * f.upr)/(n - a + (a + 1) * f.upr)
> c(LCI, UCI)
[1] 0.3947589 0.5545268
----

Note that +alpha+ is used as the upper-tail probability for each limit separately, so with +alpha+ set to 0.05 the two limits together form an interval with a nominal two-sided confidence level of 90%.

The CIs for all of the age-groups can be computed at once because the calculations are vectorized; i.e.,

----
> alpha <- 0.05
> LCIs <- d$n/(d$n + (ttl - d$n + 1) * qf(alpha, 2 * (ttl - d$n + 1), 2 * d$n, lower.tail = FALSE))
> UCIs <- ((d$n + 1) * qf(alpha, 2 * d$n + 2, 2 * (ttl - d$n + 1) - 2, lower.tail = FALSE))/(ttl - d$n + (d$n + 1) *
+     qf(alpha, 2 * d$n + 2, 2 * (ttl - d$n + 1) - 2, lower.tail = FALSE))
> data.frame(age = d$age, p, seps, LCIs, UCIs)
  age          p       seps         LCIs       UCIs
1   0 0.47413793 0.04656283 0.3947588631 0.55452678
2   1 0.18965517 0.03655682 0.1320279845 0.25960996
3   2 0.08620690 0.02617255 0.0475164835 0.14183827
4   3 0.15517241 0.03376311 0.1027583277 0.22137272
5   4 0.05172414 0.02065214 0.0227626672 0.09953204
6   5 0.02586207 0.01480106 0.0070853559 0.06548328
7   6 0.00862069 0.00862069 0.0004420858 0.04024133
8   7 0.00862069 0.00862069 0.0004420858 0.04024133
----

These results match what is shown in Box 3.2.

== Confidence Intervals for Population Proportions II
Many authors suggest methods for computing confidence intervals for proportions other than the F distribution method shown in the text. One such method is the so-called "Wilson" method, which is the default method in +*[red]#bin.ci#*+ of the +*NCStats*+ package. This function requires the "number of successes" as the first argument, the "number of trials" as the second argument, and an optional level of confidence in +*[red]#conf.level=#*+.
The default 95% confidence intervals using the Wilson method can be computed with,

----
> bin.ci(d$n, ttl)
      95% LCI    95% UCI
 0.3855640505 0.56436980
 0.1287138187 0.27049243
 0.0474993548 0.15144231
 0.1004660814 0.23198530
 0.0239186535 0.10826815
 0.0088338879 0.07328676
 0.0004421836 0.04721983
 0.0004421836 0.04721983
----
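
For readers without +*NCStats*+, the Wilson interval can be sketched in base R. The function name +wilson.ci+ is hypothetical (it is not part of any package), and the catches are re-entered by hand so that the block is self-contained. This is a sketch of the standard Wilson score formula, not the source code of +*[red]#bin.ci#*+.

```r
# Hypothetical helper: plain Wilson score interval for x successes in n trials
wilson.ci <- function(x, n, conf.level = 0.95) {
  z <- qnorm(1 - (1 - conf.level) / 2)      # two-sided normal critical value
  p <- x / n                                # observed proportion
  denom <- 1 + z^2 / n
  center <- (p + z^2 / (2 * n)) / denom     # shrunken midpoint
  half <- z * sqrt(p * (1 - p) / n + z^2 / (4 * n^2)) / denom
  cbind(LCI = center - half, UCI = center + half)
}

catch <- c(55, 22, 10, 18, 6, 3, 1, 1)      # catches for ages 0 through 7
res <- wilson.ci(catch, sum(catch))
round(res, 4)
```

Under these assumptions the limits agree with the output above, except for the lower limits of the two age-groups with a catch of one fish. There the printed value (0.0004421836) equals -log(1 - 0.05)/116, which suggests that +*[red]#bin.ci#*+ applies a small-count adjustment when the number of successes is one. That reading is inferred from the output and is not stated in the source.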