Linear models (e.g., analysis of variance, simple linear regression, analysis of covariance, and multiple linear regression) are used throughout the boxed examples in the AIFFD book. In R, all of these models are implemented with one constructor function, lm(), that can receive a variety of formula types. The lm() function then fits one of the linear models depending on the types of variables present in the formula. This page briefly describes the use of lm() for this variety of models.

R Formulae

An R formula consists of a left-hand side (LHS; the response or dependent variable) and a right-hand side (RHS; the explanatory, predictor, or independent variables) separated by a tilde. For the purposes of the boxed examples in AIFFD, the LHS will (nearly always) consist of a continuous response variable. The RHS, on the other hand, will consist of a single explanatory variable or some function of several explanatory variables. For our purposes, we need to note that explanatory variables can be added to the RHS by including a plus sign followed by the variable name, and that interaction terms are symbolized by the two variables forming the interaction separated by a colon (e.g., A:B represents the interaction between A and B). Finally, note that R uses the short-hand notation A*B to indicate that the RHS should include the two main effect terms and an interaction term (i.e., A + B + A:B).

Tip: The apparent multiplication of two variables in the RHS of a model formula is short-hand notation for including both variables as main effects and the interaction between the two variables; i.e., A*B is equivalent to A + B + A:B.
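The expansion of A*B can be checked directly in R. The sketch below is only an illustration: Y, A, and B are placeholder names, and terms() is used because it expands a formula without needing any data.

```r
# Expand two formulae and compare the model terms that R derives from each.
# Y, A, and B are placeholder variable names; terms() needs no data here.
short <- attr(terms(Y ~ A * B), "term.labels")
long  <- attr(terms(Y ~ A + B + A:B), "term.labels")

print(short)            # terms implied by the short-hand notation
print(long)             # terms written out in full
identical(short, long)  # TRUE if both formulae describe the same model
```

Both formulae should yield the same three terms: the two main effects and their interaction.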

lm() Constructor Function

The lm() function is typically called with two arguments. The first argument is a model formula as described in the previous section. Different model formulae provide different analyses depending on the variables in the formula. The second argument, data=, tells R in which data frame the variables in the formula can be found. The results of lm() should be saved to an object so that the object can be submitted to a variety of extractor functions to return specific results.
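As a minimal sketch of this pattern, the call below fits a simple linear regression and saves the result. The mtcars data frame (distributed with R) and the object name lm1 are chosen here only for illustration; they are not taken from the AIFFD examples.

```r
# mtcars ships with R; mpg (response) and wt (explanatory) are both continuous.
# The first argument is the formula, and data= names the data frame to search.
lm1 <- lm(mpg ~ wt, data = mtcars)

# Printing the saved object shows the call and the estimated coefficients.
print(lm1)

# The saved object has class "lm", which is what the extractor functions expect.
class(lm1)
```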

Table 1. The variety of linear models produced by different formulae supplied to lm(). Note that the generic variables in these formulae are defined as follows: Y is a continuous response variable, X1 and X2 are continuous explanatory variables, and G1 and G2 are categorical group factor explanatory variables.
R Formula     Linear Model                                               Example
Y~X1          Simple Linear Regression                                   Box 6.4
Y~G1          One-way ANOVA                                              Box 3.8
Y~G1*G2       Two-way ANOVA (with interaction)                           Box 3.13
Y~G1+G2       Two-way ANOVA (without interaction)                        Box 5.5 (last section)
Y~X1*G1       One-way Indicator Variable Regression (ANCOVA-like model)  Box 3.11
Y~X1+G1       One-way ANCOVA
Y~X1*G1*G2    Two-way Indicator Variable Regression
Y~X1*X2       Multiple Linear Regression (with interaction)              Box 7.6
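To illustrate how the formula alone determines which model is fit, the sketch below applies four of the formula types from Table 1 to the iris data frame that ships with R. The choice of data and the object names (slr, aov1, ivr, ancova, n_coef) are assumptions made for illustration only. It assumes Species is stored as a factor, as it is in the standard iris data.

```r
# iris ships with R: Sepal.Length and Petal.Length are continuous,
# and Species is a factor with three levels.
slr    <- lm(Sepal.Length ~ Petal.Length, data = iris)            # Y~X1
aov1   <- lm(Sepal.Length ~ Species, data = iris)                 # Y~G1
ivr    <- lm(Sepal.Length ~ Petal.Length * Species, data = iris)  # Y~X1*G1
ancova <- lm(Sepal.Length ~ Petal.Length + Species, data = iris)  # Y~X1+G1

# The number of estimated coefficients reflects which model was fit.
n_coef <- sapply(list(slr = slr, aov1 = aov1, ivr = ivr, ancova = ancova),
                 function(m) length(coef(m)))
print(n_coef)
```

The same lm() call is used each time; only the variable types on the RHS change the model that results.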

lm() Extractor Functions

A number of functions can be used to extract specific information from an object saved from an lm() call.

Table 2. Extractor functions, and corresponding packages, for a linear model object (note that lm1 represents a saved linear model object). Note that functions listed as base are part of the default R installation and do not require any extra packages to be loaded.
Function Call            Package   Description
anova(lm1)               base      Extracts the ANOVA table using type-I SS.
coef(lm1)                base      Extracts the values of the parameter coefficients.
confint(lm1)             base      Extracts confidence intervals for the parameter coefficients.
summary(lm1)             base      Extracts the parameter coefficient values, SEs, and default t-tests and p-values. Also extracts the coefficient of determination (unadjusted and adjusted), overall F-test and p-value, and rMSE.
predict(lm1)             base      Extracts predictions using the linear model for each individual in the data frame. Modifications (i.e., other arguments) allow predicting other values.
Anova(lm1,type="III")    car       Extracts the ANOVA table using type-III SS. See discussion about different SS calculations in the preliminaries vignette.
Anova(lm1,type="II")     car       Extracts the ANOVA table using type-II SS. See discussion about different SS calculations in the preliminaries vignette.
lsmean(lm1)              pda       Extracts least-squares means. See discussion about least-squares means in the preliminaries vignette.
fitPlot(lm1)             NCStats   Constructs a "fitted-line plot" (specifics depend on the model; does not work for all model types).
residPlot(lm1)           NCStats   Constructs a residual plot.
hist(lm1$residuals)      base      Constructs a histogram of model residuals.
ad.test(lm1$residuals)   nortest   Performs the Anderson-Darling test of normality on model residuals.
leveneTest(lm1)          car       Performs Levene’s Homogeneity of Variance test on model groups.
outlierTest(lm1)         car       Performs a test for outliers on model residuals.
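The extractor functions from the default R installation in Table 2 might be used as in the following sketch. The mtcars data, the object names, and the weights placed in new_wt are hypothetical choices for illustration. Functions from car, pda, NCStats, and nortest are omitted because they require those packages to be installed and loaded.

```r
# Fit and save a simple linear regression (mtcars ships with R).
lm1 <- lm(mpg ~ wt, data = mtcars)

aov_tab <- anova(lm1)     # ANOVA table using type-I SS
est     <- coef(lm1)      # parameter estimates
ci      <- confint(lm1)   # confidence intervals (95% by default)
smry    <- summary(lm1)   # estimates, SEs, t-tests, R-squared, overall F-test

print(aov_tab)
print(est)
print(ci)
print(smry)

# predict() with no other arguments returns a fitted value for each row of
# the data; supplying newdata= predicts at other values instead.
new_wt <- data.frame(wt = c(2.5, 3.5))   # hypothetical weights
pred <- predict(lm1, newdata = new_wt, interval = "confidence")
print(pred)
```

Saving each extracted result to its own object, as done here, makes individual pieces (e.g., smry$r.squared) available for later use.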

Reproducibility Information

Version Information

  • Compiled Date: Thu Sep 29 2011

  • Compiled Time: 7:31:20 PM

R Information

  • Version: R version 2.13.1 (2011-07-08)

  • System: Windows, i386-pc-mingw32/i386 (32-bit)

  • Base Packages: base, datasets, graphics, grDevices, methods, splines, stats, tcltk, utils

  • Other Packages: ascii_2.0, Hmisc_3.8-3, miscOgle_0.1-0, R2HTML_2.2, survival_2.36-9, svSocket_0.9-51, TinnR_1.0.3

  • Loaded-Only Packages: cluster_1.14.0, grid_2.13.1, lattice_0.19-33, svMisc_0.9-63, tools_2.13.1

  • Required Packages: None.