AIFFD Linear Models
===================
Derek H. Ogle

Linear models -- e.g., analysis of variance, simple linear regression, analysis of covariance, multiple linear regression -- are used throughout the boxed examples in the AIFFD book. In R, all of these models are implemented with one constructor function -- +*[red]#lm()#*+ -- that can receive a variety of formula types. The +*[red]#lm()#*+ function then fits one of the linear models depending on the types of variables present in the formula. This page briefly describes the use of +*[red]#lm()#*+ for this variety of models.

== R Formulae

An R formula consists of a left-hand-side (LHS; the response or dependent variable) and a right-hand-side (RHS; the explanatory, predictor, or independent variables) separated by a tilde. For the purposes of the boxed examples in AIFFD, the LHS will (nearly always) consist of a continuous response variable. The RHS, on the other hand, will consist of a single explanatory variable or some function of several explanatory variables. For our purposes, we need to note that explanatory variables can be "added" to the RHS by including a "plus sign" followed by the variable name. Interaction terms are symbolized by the two variables forming the interaction separated by a colon (e.g., +*A:B*+ represents the interaction between +*A*+ and +*B*+). Finally, note that R uses a short-hand notation of +*A*B*+ to indicate that the RHS should include the two main effect terms and an interaction term (i.e., +*A + B + A:B*+).

TIP: The apparent multiplication of two variables in the RHS of a model formula is short-hand notation for including both variables as main effects and the interaction between the two variables -- i.e., +*A*B*+ is equivalent to +*A + B + A:B*+.

== +*[red]#lm()#*+ Constructor Function

The +*[red]#lm()#*+ function is typically called with two arguments. The first argument is a model formula as described in the previous section.
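
The short-hand notation described in the R Formulae section can be checked directly in R. The following is a minimal sketch, assuming only functions available in a default R session; the variable names +*Y*+, +*A*+, and +*B*+ are generic placeholders, and the object names are chosen here for illustration only.

```r
# Placeholder variable names (Y, A, B); no data are needed to inspect a formula.
f_star <- Y ~ A * B          # short-hand notation
f_long <- Y ~ A + B + A:B    # explicit main effects plus the interaction

# terms() parses a formula; its "term.labels" attribute lists the RHS terms.
labels_star <- attr(terms(f_star), "term.labels")
labels_long <- attr(terms(f_long), "term.labels")

print(labels_star)
print(identical(labels_star, labels_long))
```

Both formulae expand to the same three terms (+*A*+, +*B*+, and +*A:B*+), so either form can be given to +*[red]#lm()#*+.
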
Different model formulae provide different analyses depending on the variables in the formula. The second argument, the +*[red]#data=#*+ argument, tells R which data frame contains the variables in the formula. The results of +*[red]#lm()#*+ should be saved to an object so that the object can be submitted to a variety of extractor functions to return specific results.

.The variety of linear models produced by different formulae supplied to +*[red]#lm()#*+. Note that the generic variables in these formulae are defined as follows: +*Y*+ is a continuous response variable, +*X1*+ and +*X2*+ are continuous explanatory variables, and +*G1*+ and +*G2*+ are categorical group factor explanatory variables.
[grid="rows",width="60%"]
[options="header",cols="<20%s,<60%,<20%"]
|==================================================================================
|R Formula | Linear Model | Example
|Y~X1 | Simple Linear Regression | link:../Box6_4/Box6_4a.html[Box 6.4]
|Y~G1 | One-way ANOVA | link:../Box3_8/Box3_8a.html[Box 3.8]
|Y~G1*G2 | Two-way ANOVA (with interaction) | link:../Box3_13/Box3_13a.html[Box 3.13]
|Y~G1+G2 | Two-way ANOVA (withOUT interaction) | link:../Box5_5/Box5_5a.html[Box 5.5] (last section)
|Y~X1*G1 | One-way Indicator Variable Regression (ANCOVA-like model) | link:../Box3_11/Box3_11a.html[Box 3.11]
|Y~X1+G1 | One-way ANCOVA |
|Y~X1*G1*G2 | Two-way Indicator Variable Regression |
|Y~X1*X2 | Multiple Linear Regression (with interaction) | link:../Box7_6/Box7_6a.html[Box 7.6]
|==================================================================================

== +*[red]#lm()#*+ Extractor Functions

A number of functions can be used to extract specific information from an object saved from a +*[red]#lm()#*+ call.

.Extractor functions, and corresponding packages, for a linear model object (note that +*lm1*+ represents a saved linear model object). Note that functions listed under +*base*+ are available in a default R session and do not require any extra packages to be loaded.
[grid="rows",width="60%"]
[options="header",cols="<30%,<20%s,<50%"]
|===================================================================================================
|Function Call | Package | Description
|anova(lm1) | base | Extracts the ANOVA table using type-I SS.
|coef(lm1) | base | Extracts the values of the parameter coefficients.
|confint(lm1) | base | Extracts confidence intervals for the parameter coefficients.
|summary(lm1) | base | Extracts the parameter coefficient values, SEs, and default t-tests and p-values. Also extracts the coefficient of determination (unadjusted and adjusted), overall F-test and p-value, and rMSE.
|predict(lm1) | base | Extracts predictions using the linear model for each individual in the data frame. Modifications (i.e., other arguments) allow predicting other values.
|Anova(lm1,type="III") | car | Extracts the ANOVA table using type-III SS. See discussion about different SS calculations in the link:../preliminaries/preliminaries.html[preliminaries vignette].
|Anova(lm1,type="II") | car | Extracts the ANOVA table using type-II SS. See discussion about different SS calculations in the link:../preliminaries/preliminaries.html[preliminaries vignette].
|lsmean(lm1) | pda | Extracts least-squares means. See discussion about least-squares means in the link:../preliminaries/preliminaries.html[preliminaries vignette].
|fitPlot(lm1) | NCStats | Constructs a "fitted-line plot" (specifics depend on the model; does not work for all model types).
|residPlot(lm1) | NCStats | Constructs a residual plot.
|hist(lm1$residuals) | base | Constructs a histogram of model residuals.
|ad.test(lm1$residuals) | nortest | Performs Anderson-Darling test of normality on model residuals.
|leveneTest(lm1) | car | Performs Levene's Homogeneity of Variance test on model groups.
|outlierTest(lm1) | car | Performs a test for outliers on model residuals.
|===================================================================================================

== Reproducibility Information

=== Version Information

* *Compiled Date:* Thu Sep 29 2011
* *Compiled Time:* 7:31:20 PM

=== Files

* link:LinearModels.r[R Script (.R) file]
* link:LinearModels.rnw[NoWeb Source (.Rnw) file]
* link:LinearModels.txt[Asciidoc (.txt) file]

=== R Information

* *Version:* R version 2.13.1 (2011-07-08)
* *System:* Windows, i386-pc-mingw32/i386 (32-bit)
* *Base Packages:* base, datasets, graphics, grDevices, methods, splines, stats, tcltk, utils
* *Other Packages:* ascii_2.0, Hmisc_3.8-3, miscOgle_0.1-0, R2HTML_2.2, survival_2.36-9, svSocket_0.9-51, TinnR_1.0.3
* *Loaded-Only Packages:* cluster_1.14.0, grid_2.13.1, lattice_0.19-33, svMisc_0.9-63, tools_2.13.1
* *Required Packages:* None.
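
== Example Sketch

The constructor and extractor functions described on this page can be tried on a small simulated data set. The following is a minimal sketch, not an example from the AIFFD book: the data frame +*d*+, its simulated values, and the object names other than +*lm1*+ are hypothetical, and only functions available in a default R session are used.

```r
# Simulated (hypothetical) data: a continuous response Y, a continuous
# explanatory variable X1, and a three-level group factor G1.
set.seed(1)
d <- data.frame(X1 = runif(30, min = 10, max = 50),
                G1 = factor(rep(c("a", "b", "c"), each = 10)))
d$Y <- 2 + 0.5 * d$X1 + rnorm(30)

# Save the fitted models to objects so they can be given to extractor functions.
lm1 <- lm(Y ~ X1, data = d)        # simple linear regression
lm2 <- lm(Y ~ X1 * G1, data = d)   # indicator variable regression

coefs <- coef(lm1)                 # estimated intercept and slope
cis   <- confint(lm1)              # confidence intervals (95% by default)
aov2  <- anova(lm2)                # ANOVA table using type-I (sequential) SS
# Predictions at new values of X1 rather than at the observed individuals.
preds <- predict(lm1, newdata = data.frame(X1 = c(20, 40)))

print(coefs)
print(cis)
print(aov2)
print(summary(lm1))
print(preds)
```

Because the data are simulated, the printed estimates have no biological meaning; the sketch is only meant to show how a saved model object is passed to each extractor function.
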