Learning outcomes

  1. Marginality principle
  2. Exhaustive search for linear models
  3. Exhaustive search for generalised linear models

Required R packages:

install.packages(c("mplot","leaps","bestglm","car"))

Crime data

data("UScrime", package = "MASS")
M0 = lm(y ~ 1,data=UScrime) # Null model
M1 = lm(y ~ .,data=UScrime) # Full model

Exercises

  • Use the regsubsets() function from the leaps package to obtain the 50 best models for each model size and plot their RSS against model dimension. (A sketch working through these exercises appears after the car example below.)

  • Use the method = "forward", method = "backward" and method = "exhaustive" options to perform forward, backward and exhaustive model selection and compare the results (this time identifying only a single best model at each dimension). Hint: the summary of a regsubsets object returns several information criteria; choose, for example, Mallows' Cp.

  • Use the plot method associated with regsubsets objects to visualise the BIC for the various models identified.

  • The car package offers alternative plot methods to visualise the results from a call to regsubsets(). The code below plots \(C_p\) against dimension for a regsubsets object called exh. Try plotting adjusted \(R^2\) against dimension.

library(car)
par(mar = c(4.5, 4.5, 1, 1), mfrow = c(1, 1))
subsets(exh, statistic = "cp", legend = FALSE, max.size = 8)
abline(a = 0, b = 1, lty = 2)
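
One possible way to work through the exercises above is sketched below. The object names (rs, fwd, bwd, exh) are arbitrary, nvmax = 15 is chosen to match the 15 candidate predictors in UScrime (the default is 8), and really.big = TRUE simply allows the larger-than-default search; treat this as a sketch rather than the definitive solution.

library(leaps)
# Up to 50 best models of each size
rs = regsubsets(y ~ ., data = UScrime, nbest = 50, nvmax = 15, really.big = TRUE)
rs.sum = summary(rs)
dimension = rowSums(rs.sum$which) - 1   # number of predictors, excluding the intercept
plot(dimension, rs.sum$rss, xlab = "Model dimension", ylab = "RSS")

# A single best model at each dimension: forward, backward and exhaustive search
fwd = regsubsets(y ~ ., data = UScrime, method = "forward", nvmax = 15)
bwd = regsubsets(y ~ ., data = UScrime, method = "backward", nvmax = 15)
exh = regsubsets(y ~ ., data = UScrime, method = "exhaustive", nvmax = 15)
cbind(forward = summary(fwd)$cp, backward = summary(bwd)$cp,
      exhaustive = summary(exh)$cp)   # compare Mallows' Cp across the three searches

# Built-in plot method: models shaded by BIC
plot(exh, scale = "bic")

# car::subsets with adjusted R^2 instead of Cp
subsets(exh, statistic = "adjr2", legend = FALSE, max.size = 8)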

Birthweight data

The second example is the birthwt dataset from the MASS package, which contains data on 189 births at the Baystate Medical Centre, Springfield, Massachusetts, during 1986 (Venables and Ripley, 2002). The main variable of interest is low birth weight, coded as the binary response variable low (Hosmer and Lemeshow, 1989). Take the same approach to constructing the full model as Venables and Ripley (2002), where ptl is reduced to a binary indicator of past history and ftv is reduced to a factor with three levels:

library(MASS)
bwt <- with(birthwt, {
  race <- factor(race, labels = c("white", "black", "other"))
  ptd <- factor(ptl > 0)
  ftv <- factor(ftv)
  levels(ftv)[-(1:2)] <- "2+"
  data.frame(low = factor(low), age, lwt, race, smoke = (smoke > 0),
             ptd, ht = (ht > 0), ui = (ui > 0), ftv)
})
options(contrasts = c("contr.treatment", "contr.poly"))
bw.glm1 <- glm(low ~ ., family = binomial, data = bwt)
bw.glm0 <- glm(low ~ 1, family = binomial, data = bwt)
summary(bw.glm1)

Exercises

  • Use forward and backward stepwise selection to identify potential models. Do you get the same model using both approaches? (A sketch for these two exercises follows the list.)

  • Use the bestglm() function from the bestglm package to identify the top 10 models according to the AIC.
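
A possible sketch for both exercises is given below. It uses stepAIC() from MASS for the stepwise searches and assumes bestglm() accepts the factor predictors in bwt as-is; bestglm() expects a data frame with the response as the last column, so low is recoded from a factor to 0/1 here.

library(MASS)      # stepAIC()
library(bestglm)

# Forward stepwise from the null model, backward stepwise from the full model
bw.fwd = stepAIC(bw.glm0, scope = formula(bw.glm1), direction = "forward")
bw.bwd = stepAIC(bw.glm1, direction = "backward")
formula(bw.fwd)
formula(bw.bwd)

# bestglm(): response (0/1) in the last column of the data frame
bwt.Xy = data.frame(subset(bwt, select = -low),
                    low = as.numeric(bwt$low) - 1)
bw.best = bestglm(bwt.Xy, family = binomial, IC = "AIC", TopModels = 10)
bw.best$BestModels   # the ten best models ranked by AIC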

Rock-wallabies: cross validation

Use the cross-validation options in bestglm() to identify the best model for the rock-wallaby data. A sketch follows the data preparation code below.

data("wallabies", package = "mplot")
names(wallabies)
# Drop the spatial coordinates and add pairwise interaction terms
wdat = data.frame(subset(wallabies, select = -c(lat, long)),
                  EaD = wallabies$edible * wallabies$distance,
                  EaS = wallabies$edible * wallabies$shelter,
                  DaS = wallabies$distance * wallabies$shelter)
X = subset(wdat, select = -rw)   # predictors
y = wdat$rw                      # response
Xy = as.data.frame(cbind(X, y))  # bestglm() expects the response in the last column
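
A minimal sketch, assuming ten-fold cross-validation with the "HTF" method is acceptable; the seed is arbitrary and the selected model can change with the random folds.

library(bestglm)
set.seed(2023)   # the CV folds are random; fix the seed for reproducibility
rw.cv = bestglm(Xy, family = binomial, IC = "CV",
                CVArgs = list(Method = "HTF", K = 10, REP = 1))
rw.cv$BestModel            # best model identified by cross-validation
summary(rw.cv$BestModel)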

Advanced: Stagewise selection

The basic idea behind stagewise modelling is:

  1. Standardise the predictors, \(\tilde{x}_{ij} = (x_{ij} - \bar{x}_j)/\text{sd}(x_j)\), and centre the responses, \(\tilde{y}_i = y_i - \bar{y}\).
  2. Initialise \(\hat{\boldsymbol{\mu}} = (0,\ldots,0)'\), \(\mathbf{r} = \tilde{\mathbf{y}}\), and \(\hat{\beta}_1 = \ldots = \hat{\beta}_p = 0\).
  3. For \(i = 1, \ldots, N\), where \(N\) is some large number of steps:
    1. Find the predictor \(\tilde{\mathbf{x}}_j\) most correlated with \(\mathbf{r}\) (the residual vector). This is the same as picking the variable that would give the biggest drop in RSS.
    2. Set \(\delta = \epsilon\times\text{sign}(\text{cor}(\tilde{\mathbf{x}}_j,\mathbf{r}))\) and update the parameters:
      • \(\hat{\beta}_j = \hat{\beta}_j + \delta\)
      • \(\hat{\boldsymbol{\mu}} = \hat{\boldsymbol{\mu}} + \delta\tilde{\mathbf{x}}_j\)
      • \(\mathbf{r} = \mathbf{r} - \delta\tilde{\mathbf{x}}_j\)

Exercise

Implement the stagewise procedure for the artificial example introduced in the lecture. Hint: the core of the function will look like this:

data("artificialeg",package="mplot")
y = artificialeg$y-mean(artificialeg$y)
X = artificialeg[,-10]
y = # mean zero predictors
M = # standardised predictor matrix
lots = 20000 
beta = matrix(0, ncol=ncol(M), nrow=lots)
r = y
eps = 0.001
for (i in 2:lots){
  co = cor(M,r) 
  j = 
  delta = 
  b = beta[i-1,]
  b[j] =
  beta[i,] = b
  r = 
}
  • Visualise the path of the estimated coefficients against step number. Hint: use matplot().
  • Visualise the path of the estimated coefficients against the sum of the absolute values of the coefficients. (One possible completion is sketched below.)
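
One possible completion of the skeleton above, as a sketch only: it standardises the predictors column by column, uses matplot() for the two plots, and takes the step size and number of steps from the hint.

data("artificialeg", package = "mplot")
y = artificialeg$y - mean(artificialeg$y)                # centred response
X = artificialeg[, -10]
X = apply(X, 2, function(x) x - mean(x))                 # mean zero predictors
M = apply(X, 2, function(x) x / sd(x))                   # standardised predictor matrix
lots = 20000
beta = matrix(0, ncol = ncol(M), nrow = lots)
colnames(beta) = colnames(M)
r = y
eps = 0.001
for (i in 2:lots){
  co = cor(M, r)                 # correlation of each predictor with the residual
  j = which.max(abs(co))         # most correlated predictor
  delta = eps * sign(co[j])      # small step in the direction of that correlation
  b = beta[i-1, ]
  b[j] = b[j] + delta            # update the jth coefficient
  beta[i, ] = b
  r = r - delta * M[, j]         # update the residual vector
}

# Coefficient paths against step number
matplot(beta, type = "l", lty = 1,
        xlab = "Step", ylab = "Estimated coefficients")

# Coefficient paths against the sum of the absolute coefficient values
matplot(rowSums(abs(beta)), beta, type = "l", lty = 1,
        xlab = "Sum of absolute coefficients", ylab = "Estimated coefficients")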