// library(ascii) // setwd("c://aaaWork//web//fishR//bookex//AIFFD//Box3_1") // Sweave("Box3_1a.Rnw",driver=RweaveAsciidoc) AIFFD Box 3-1 Vignette ======================= :Author: Derek H. Ogle :Email: dogle@northland.edu :Date: 17-June-2009 :Revision: 2 == Required Packages ---- > library(NCStats) # Summarize, sd.calc > library(sciplot) # se ---- == Preparing Data The working directory is set, the link:box3_1.txt[] data file is read, and the structure of the data frame is observed with, ---- > setwd("c://aaaWork//web//fishR//bookex//AIFFD//Box3_1") > d <- read.table("box3_1.txt", header = TRUE) > str(d) 'data.frame': 15 obs. of 1 variable: $ catch: int 3 0 4 1 4 1 1 0 1 0 ... ---- == Compute Summary Statistics The mean and standard deviation for the quantitative variable can be computed with +*[red]#Summarize()#*+ from the +*NCStats*+ package. The single required argument is the vector containing the quantitative data. For example, ---- > Summarize(d$catch) n Mean St. Dev. Min. 1st Qu. Median 3rd Qu. Max. 15.000000 1.600000 1.404076 0.000000 0.500000 1.000000 2.500000 4.000000 ---- If you do not want to load the +*NCStats*+ package then the same calculations can be made with +*[red]#summary()#*+ and +*[red]#sd()#*+ of the base packages contained in R, ---- > summary(d$catch) Min. 1st Qu. Median Mean 3rd Qu. Max. 0.0 0.5 1.0 1.6 2.5 4.0 > sd(d$catch) [1] 1.404076 ---- Regardless of which way you summarize the results, you can compute the standard error with +*[red]#se()#*+ from the +*sciplot*+ package, ---- > se(d$catch) [1] 0.3625308 ---- == Constructing Confidence Intervals for the Population Mean The critical t value for computing a confidence interval can be found with +*[red]#qt()#*+. This function requires an upper-tail probability (i.e., (1-C)/2 where C is the level of confidence), the degrees-of-freedom, and the +*[red]#lower.tail=FALSE#*+ argument. For example, the t critical value in this situation is, ---- > df <- length(d$catch)-1 # compute df as n-1 > C <- 0.95 # set confidence level > a2 <- (1-C)/2 # find upper tail probability > ct <- qt(a2,df,lower.tail=FALSE) # find critical t-value > ct [1] 2.144787 ---- The two endpoints of the confidence interval can then be computed with, ---- > mn <- mean(d$catch) # find (and save) the mean > semn <- se(d$catch) # find (and save) the SE of the mean > LCI <- mn-ct*semn > UCI <- mn+ct*semn > c(LCI,UCI) # combine together for showing [1] 0.8224488 2.3775512 ---- The confidence interval is computed from the raw data, along with a great deal of other information, with +*[red]#t.test()#*+. In this simplest form this function only requires the vector of quantitative data as the first argument and an optional level of confidence in the +*[red]#conf.level=#*+ argument (it defaults to 95%). For example, ---- > t.test(d$catch) One Sample t-test data: d$catch t = 4.4134, df = 14, p-value = 0.0005894 alternative hypothesis: true mean is not equal to 0 95 percent confidence interval: 0.8224488 2.3775512 sample estimates: mean of x 1.6 ---- == Demonstration of Standard Deviation Calculation The intermediate steps in computing a standard deviation, as shown at the beginning of Box 3.1, can be demonstrated with +*[red]#sd.calc()#*+ ---- > sd.calc(d$catch) Demonstration of parts of a std. dev. calculation. x diffs diffs.sq 1 3 1.4 1.96 2 0 -1.6 2.56 3 4 2.4 5.76 4 1 -0.6 0.36 5 4 2.4 5.76 6 1 -0.6 0.36 7 1 -0.6 0.36 8 0 -1.6 2.56 9 1 -0.6 0.36 10 0 -1.6 2.56 11 2 0.4 0.16 12 2 0.4 0.16 13 3 1.4 1.96 14 2 0.4 0.16 15 0 -1.6 2.56 sum 24 0.0 27.60 Mean = x-bar = 24 / 15 = 1.6 Variance = s^2 = 27.6 / 14 = 1.9714286 Std. Dev = s = sqrt(1.9714286) = 1.4040757 ----