glmnet
install.packages(c("mplot", "ISLR", "glmnet", "MASS", "readr",
"dplyr"))
This is the Ionosphere data set from the UCI Machine Learning Repository. To familiarise yourself with the data, read the explanation on the repository page.
We start by reading the data directly from the URL. Because the entries are comma-separated, we specify sep = "," in the function read.table, or we use the read_csv function from the readr package:
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/ionosphere/ionosphere.data"
dat = readr::read_csv(url, col_names = FALSE)
## Parsed with column specification:
## cols(
## .default = col_double(),
## X1 = col_integer(),
## X2 = col_integer(),
## X35 = col_character()
## )
## See spec(...) for full column specifications.
# View(dat)
dim(dat)
## [1] 351 35
dat = dplyr::rename(dat, y = X35)
# names(dat)[35] = 'y' # an alternative
dat$y = as.factor(dat$y)
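If we prefer base R, the read.table route mentioned above could look like the following sketch; the col.names argument simply reproduces the X1, ..., X35 names that read_csv generated.
# base-R alternative: comma-separated file with no header row
dat.alt = read.table(url, sep = ",", header = FALSE,
col.names = paste0("X", 1:35), stringsAsFactors = FALSE)
dim(dat.alt)  # should again be 351 x 35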
Look at the summary statistics for each variable and at a scatterplot matrix. Show that column X2 is a vector of zeros and should therefore be omitted from the analysis.
summary(dat) # output omitted
boxplot(dat)
pairs(dat[, c(1:5, 35)])
table(dat$X2)
# Alternative methods:
# install.packages("Hmisc")
# Hmisc::describe(dat)
# install.packages("pairsD3")
# pairsD3::shinypairs(dat)
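A programmatic way to confirm that X2 is constant is to look for numeric columns with zero variance (a minimal sketch):
# numeric columns with zero variance are constant and carry no information
num.vars = dat[sapply(dat, is.numeric)]
which(sapply(num.vars, var) == 0)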
Delete the second column and make available a cleaned design matrix x with \(p=33\) columns and a response vector y of length \(n=351\), which is a factor with levels b and g.
x = as.matrix(dat[, -c(2, 35)])
dim(x)
## [1] 351 33
y = dat$y
is.factor(y)
## [1] TRUE
table(y)
## y
## b g
## 126 225
Check the pairwise correlations in the design matrix and show that only one pair of variables (the 12th and 14th columns of x) has an absolute Pearson correlation of more than 0.8.
round(cor(x), 1)
## X1 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12
## X1 1.0 0.3 0.0 0.2 0.1 0.2 0.0 0.2 -0.1 0.0 0.1
## X3 0.3 1.0 0.1 0.5 0.0 0.4 0.0 0.5 0.0 0.3 0.2
## X4 0.0 0.1 1.0 0.0 -0.2 -0.1 0.3 -0.3 0.2 -0.2 0.3
## X5 0.2 0.5 0.0 1.0 0.0 0.6 0.0 0.5 0.0 0.4 0.0
## X6 0.1 0.0 -0.2 0.0 1.0 0.0 0.3 -0.1 0.2 -0.3 0.2
## X7 0.2 0.4 -0.1 0.6 0.0 1.0 -0.2 0.5 -0.1 0.4 0.0
## X8 0.0 0.0 0.3 0.0 0.3 -0.2 1.0 -0.3 0.4 -0.4 0.4
## X9 0.2 0.5 -0.3 0.5 -0.1 0.5 -0.3 1.0 -0.3 0.7 -0.2
## X10 -0.1 0.0 0.2 0.0 0.2 -0.1 0.4 -0.3 1.0 -0.3 0.4
## X11 0.0 0.3 -0.2 0.4 -0.3 0.4 -0.4 0.7 -0.3 1.0 -0.2
## X12 0.1 0.2 0.3 0.0 0.2 0.0 0.4 -0.2 0.4 -0.2 1.0
## X13 0.1 0.2 -0.1 0.5 -0.3 0.6 -0.4 0.6 -0.4 0.6 -0.2
## X14 0.2 0.2 0.2 0.1 0.1 0.1 0.3 -0.1 0.3 -0.2 0.4
## X15 0.1 0.2 -0.3 0.4 -0.4 0.6 -0.4 0.6 -0.4 0.7 -0.2
## X16 0.1 0.1 0.2 0.1 0.2 0.0 0.4 0.0 0.3 0.0 0.4
## X17 0.1 0.2 -0.3 0.3 -0.3 0.4 -0.5 0.6 -0.4 0.7 -0.3
## X18 0.1 0.2 -0.1 0.0 0.2 0.1 0.1 0.2 0.1 0.1 0.1
## X19 0.2 0.3 -0.3 0.2 -0.2 0.4 -0.4 0.7 -0.5 0.6 -0.4
## X20 0.0 0.2 0.2 0.0 -0.1 0.2 0.1 0.1 0.0 0.1 0.1
## X21 0.2 0.1 -0.3 0.3 -0.1 0.6 -0.4 0.5 -0.4 0.5 -0.3
## X22 -0.2 0.1 0.0 0.2 -0.1 0.2 -0.2 0.2 0.0 0.3 0.0
## X23 0.0 0.3 -0.1 0.5 -0.2 0.4 -0.3 0.4 -0.3 0.6 -0.4
## X24 -0.1 0.0 0.2 0.1 -0.3 0.1 0.0 0.2 0.1 0.2 0.2
## X25 0.0 0.3 -0.1 0.2 -0.2 0.3 -0.2 0.4 -0.3 0.4 -0.2
## X26 0.1 -0.1 -0.2 0.0 0.0 0.1 -0.1 0.1 0.0 0.1 -0.1
## X27 -0.2 0.1 0.0 0.1 -0.2 0.1 -0.3 0.2 -0.3 0.3 -0.2
## X28 0.0 0.1 0.0 0.2 -0.1 0.1 0.1 0.1 0.1 0.2 0.1
## X29 0.1 0.3 0.0 0.3 0.0 0.3 -0.1 0.3 -0.1 0.4 -0.2
## X30 -0.1 0.1 0.3 0.1 -0.2 0.0 0.1 0.0 0.0 0.1 0.1
## X31 0.2 0.2 -0.2 0.4 -0.1 0.4 -0.2 0.3 -0.2 0.3 -0.2
## X32 -0.1 0.0 -0.1 0.0 0.3 0.0 0.2 -0.1 0.0 0.0 0.0
## X33 0.2 0.3 -0.2 0.4 0.0 0.5 -0.2 0.3 -0.2 0.3 -0.2
## X34 0.0 0.0 0.0 -0.1 0.2 -0.1 0.4 -0.1 0.1 -0.2 0.1
## X13 X14 X15 X16 X17 X18 X19 X20 X21 X22 X23
## X1 0.1 0.2 0.1 0.1 0.1 0.1 0.2 0.0 0.2 -0.2 0.0
## X3 0.2 0.2 0.2 0.1 0.2 0.2 0.3 0.2 0.1 0.1 0.3
## X4 -0.1 0.2 -0.3 0.2 -0.3 -0.1 -0.3 0.2 -0.3 0.0 -0.1
## X5 0.5 0.1 0.4 0.1 0.3 0.0 0.2 0.0 0.3 0.2 0.5
## X6 -0.3 0.1 -0.4 0.2 -0.3 0.2 -0.2 -0.1 -0.1 -0.1 -0.2
## X7 0.6 0.1 0.6 0.0 0.4 0.1 0.4 0.2 0.6 0.2 0.4
## X8 -0.4 0.3 -0.4 0.4 -0.5 0.1 -0.4 0.1 -0.4 -0.2 -0.3
## X9 0.6 -0.1 0.6 0.0 0.6 0.2 0.7 0.1 0.5 0.2 0.4
## X10 -0.4 0.3 -0.4 0.3 -0.4 0.1 -0.5 0.0 -0.4 0.0 -0.3
## X11 0.6 -0.2 0.7 0.0 0.7 0.1 0.6 0.1 0.5 0.3 0.6
## X12 -0.2 0.4 -0.2 0.4 -0.3 0.1 -0.4 0.1 -0.3 0.0 -0.4
## X13 1.0 -0.1 0.8 -0.2 0.7 0.0 0.6 0.2 0.7 0.3 0.5
## X14 -0.1 1.0 -0.2 0.2 -0.3 0.2 -0.3 0.4 -0.3 0.2 -0.3
## X15 0.8 -0.2 1.0 -0.1 0.7 0.1 0.7 0.2 0.7 0.3 0.6
## X16 -0.2 0.2 -0.1 1.0 -0.2 0.3 -0.3 0.3 -0.3 0.2 -0.2
## X17 0.7 -0.3 0.7 -0.2 1.0 0.0 0.7 0.0 0.7 0.2 0.6
## X18 0.0 0.2 0.1 0.3 0.0 1.0 0.0 0.5 -0.1 0.4 -0.2
## X19 0.6 -0.3 0.7 -0.3 0.7 0.0 1.0 0.0 0.7 0.1 0.6
## X20 0.2 0.4 0.2 0.3 0.0 0.5 0.0 1.0 -0.1 0.5 -0.1
## X21 0.7 -0.3 0.7 -0.3 0.7 -0.1 0.7 -0.1 1.0 0.0 0.6
## X22 0.3 0.2 0.3 0.2 0.2 0.4 0.1 0.5 0.0 1.0 0.2
## X23 0.5 -0.3 0.6 -0.2 0.6 -0.2 0.6 -0.1 0.6 0.2 1.0
## X24 0.2 0.2 0.2 0.2 0.1 0.2 0.1 0.3 0.0 0.4 0.0
## X25 0.4 -0.2 0.5 -0.3 0.6 -0.2 0.5 -0.2 0.6 -0.1 0.5
## X26 0.2 0.1 0.2 0.2 0.1 0.4 0.2 0.4 0.2 0.4 0.1
## X27 0.3 -0.3 0.3 -0.3 0.4 -0.2 0.4 -0.2 0.4 0.0 0.6
## X28 0.1 0.1 0.2 0.2 0.2 0.3 0.2 0.3 0.1 0.4 0.2
## X29 0.3 -0.2 0.4 -0.2 0.4 -0.1 0.5 -0.2 0.5 -0.1 0.5
## X30 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.4 0.1 0.3 0.2
## X31 0.4 -0.1 0.4 -0.2 0.4 -0.2 0.4 -0.3 0.5 -0.1 0.5
## X32 0.0 -0.1 0.0 0.1 0.0 0.3 0.1 0.1 0.1 0.2 0.2
## X33 0.5 -0.1 0.5 -0.2 0.4 -0.1 0.4 -0.1 0.6 -0.1 0.5
## X34 -0.1 0.1 -0.1 0.2 -0.1 0.1 0.0 0.2 0.0 0.1 0.0
## X24 X25 X26 X27 X28 X29 X30 X31 X32 X33 X34
## X1 -0.1 0.0 0.1 -0.2 0.0 0.1 -0.1 0.2 -0.1 0.2 0.0
## X3 0.0 0.3 -0.1 0.1 0.1 0.3 0.1 0.2 0.0 0.3 0.0
## X4 0.2 -0.1 -0.2 0.0 0.0 0.0 0.3 -0.2 -0.1 -0.2 0.0
## X5 0.1 0.2 0.0 0.1 0.2 0.3 0.1 0.4 0.0 0.4 -0.1
## X6 -0.3 -0.2 0.0 -0.2 -0.1 0.0 -0.2 -0.1 0.3 0.0 0.2
## X7 0.1 0.3 0.1 0.1 0.1 0.3 0.0 0.4 0.0 0.5 -0.1
## X8 0.0 -0.2 -0.1 -0.3 0.1 -0.1 0.1 -0.2 0.2 -0.2 0.4
## X9 0.2 0.4 0.1 0.2 0.1 0.3 0.0 0.3 -0.1 0.3 -0.1
## X10 0.1 -0.3 0.0 -0.3 0.1 -0.1 0.0 -0.2 0.0 -0.2 0.1
## X11 0.2 0.4 0.1 0.3 0.2 0.4 0.1 0.3 0.0 0.3 -0.2
## X12 0.2 -0.2 -0.1 -0.2 0.1 -0.2 0.1 -0.2 0.0 -0.2 0.1
## X13 0.2 0.4 0.2 0.3 0.1 0.3 0.1 0.4 0.0 0.5 -0.1
## X14 0.2 -0.2 0.1 -0.3 0.1 -0.2 0.1 -0.1 -0.1 -0.1 0.1
## X15 0.2 0.5 0.2 0.3 0.2 0.4 0.1 0.4 0.0 0.5 -0.1
## X16 0.2 -0.3 0.2 -0.3 0.2 -0.2 0.1 -0.2 0.1 -0.2 0.2
## X17 0.1 0.6 0.1 0.4 0.2 0.4 0.1 0.4 0.0 0.4 -0.1
## X18 0.2 -0.2 0.4 -0.2 0.3 -0.1 0.1 -0.2 0.3 -0.1 0.1
## X19 0.1 0.5 0.2 0.4 0.2 0.5 0.1 0.4 0.1 0.4 0.0
## X20 0.3 -0.2 0.4 -0.2 0.3 -0.2 0.4 -0.3 0.1 -0.1 0.2
## X21 0.0 0.6 0.2 0.4 0.1 0.5 0.1 0.5 0.1 0.6 0.0
## X22 0.4 -0.1 0.4 0.0 0.4 -0.1 0.3 -0.1 0.2 -0.1 0.1
## X23 0.0 0.5 0.1 0.6 0.2 0.5 0.2 0.5 0.2 0.5 0.0
## X24 1.0 -0.1 0.4 0.0 0.4 -0.2 0.4 -0.1 0.1 -0.2 0.1
## X25 -0.1 1.0 -0.1 0.5 0.2 0.7 0.0 0.5 0.1 0.5 0.1
## X26 0.4 -0.1 1.0 0.0 0.4 -0.1 0.3 -0.2 0.3 -0.1 0.2
## X27 0.0 0.5 0.0 1.0 0.1 0.5 0.2 0.4 0.2 0.5 0.1
## X28 0.4 0.2 0.4 0.1 1.0 0.0 0.4 0.0 0.5 -0.1 0.4
## X29 -0.2 0.7 -0.1 0.5 0.0 1.0 0.0 0.6 0.1 0.6 0.1
## X30 0.4 0.0 0.3 0.2 0.4 0.0 1.0 -0.2 0.3 -0.2 0.4
## X31 -0.1 0.5 -0.2 0.4 0.0 0.6 -0.2 1.0 0.0 0.7 0.0
## X32 0.1 0.1 0.3 0.2 0.5 0.1 0.3 0.0 1.0 0.0 0.5
## X33 -0.2 0.5 -0.1 0.5 -0.1 0.6 -0.2 0.7 0.0 1.0 -0.1
## X34 0.1 0.1 0.2 0.1 0.4 0.1 0.4 0.0 0.5 -0.1 1.0
# optional: visualise the correlation matrix
# install.packages('d3heatmap')
# d3heatmap::d3heatmap(cor(x))
cormat = cor(x) - diag(rep(1, 33))
table(cormat > 0.8)
##
## FALSE TRUE
## 1087 2
which(cormat > 0.8)
## [1] 377 441
cormat[377]
## [1] 0.8255584
cor(x[, 12], x[, 14])
## [1] 0.8255584
# alternatively cordf = as.data.frame(as.table(cormat))
# subset(cordf, abs(Freq) > 0.8)
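Equivalently, arr.ind = TRUE converts the linear indices above into row and column numbers, so the offending pair can be read off directly (a sketch):
# each correlated pair appears twice because the matrix is symmetric
which(abs(cormat) > 0.8, arr.ind = TRUE)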
Fit a logistic lasso regression and comment on the lasso coefficient plot (with \(\log(\lambda)\) on the x-axis and labels for the variables).
library(glmnet)
lasso.fit = glmnet(x, y, family = "binomial")
plot(lasso.fit, xvar = "lambda", label = TRUE)
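To help comment on the plot, we can check the order in which variables enter the lasso path, e.g. by recording the first \(\lambda\) index at which each coefficient becomes non-zero (a sketch; a smaller index means the variable enters earlier, i.e. at a larger \(\lambda\)):
# index of the first lambda at which each coefficient is non-zero
betas = as.matrix(coef(lasso.fit)[-1, ])
entry = apply(betas != 0, 1, function(z) which(z)[1])
head(sort(entry), 5)  # the five earliest variables to enter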
Use cv.glmnet() to get the two default choices lambda.min and lambda.1se for the lasso tuning parameter \(\lambda\) when the classification error is minimised. Calculate these two values on the log-scale as well, to relate them to the above lasso coefficient plot.
set.seed(1)
lasso.cv = cv.glmnet(x, y, family = "binomial", type.measure = "class")
c(lasso.cv$lambda.min, lasso.cv$lambda.1se)
## [1] 0.006614550 0.009596579
round(log(c(lasso.cv$lambda.min, lasso.cv$lambda.1se)), 2)
## [1] -5.02 -4.65
plot(lasso.cv)
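To relate the two \(\lambda\) values to model size, we can count how many coefficients each of them retains (a sketch):
# number of non-zero coefficients (excluding the intercept) at the two default lambdas
b.min = as.vector(coef(lasso.cv, s = "lambda.min"))
b.1se = as.vector(coef(lasso.cv, s = "lambda.1se"))
c(lambda.min = sum(b.min[-1] != 0), lambda.1se = sum(b.1se[-1] != 0))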
Explore how sensitive the lambda.1se value is to different choices of \(k\) in the \(k\)-fold CV (changing the default option nfolds=10), as well as to different choices of the random seed.
set.seed(2)
lasso.cv = cv.glmnet(x, y, family = "binomial", type.measure = "class",
nfolds = 10)
round(log(lasso.cv$lambda.1se), 2)
## [1] -4
lasso.cv = cv.glmnet(x, y, family = "binomial", type.measure = "class",
nfolds = 5)
round(log(lasso.cv$lambda.1se), 2)
## [1] -3.72
lasso.cv = cv.glmnet(x, y, family = "binomial", type.measure = "class",
nfolds = 3)
round(log(lasso.cv$lambda.1se), 2)
## [1] -2.32
set.seed(3)
lasso.cv = cv.glmnet(x, y, family = "binomial", type.measure = "class",
nfolds = 10)
round(log(lasso.cv$lambda.1se), 2)
## [1] -5.67
lasso.cv = cv.glmnet(x, y, family = "binomial", type.measure = "class",
nfolds = 5)
round(log(lasso.cv$lambda.1se), 2)
## [1] -4.37
lasso.cv = cv.glmnet(x, y, family = "binomial", type.measure = "class",
nfolds = 3)
round(log(lasso.cv$lambda.1se), 2)
## [1] -2.6
# we could take a more organised approach to assess the
# variability:
set.seed(2)
nreps = 25
lse_nfolds10 = lse_nfolds5 = lse_nfolds3 = rep(NA, nreps)
for (i in 1:nreps) {
lasso.cv = cv.glmnet(x, y, family = "binomial", type.measure = "class",
nfolds = 10)
lse_nfolds10[i] = log(lasso.cv$lambda.1se)
lasso.cv = cv.glmnet(x, y, family = "binomial", type.measure = "class",
nfolds = 5)
lse_nfolds5[i] = log(lasso.cv$lambda.1se)
lasso.cv = cv.glmnet(x, y, family = "binomial", type.measure = "class",
nfolds = 3)
lse_nfolds3[i] = log(lasso.cv$lambda.1se)
}
boxplot(lse_nfolds10, lse_nfolds5, lse_nfolds3, names = c("nfolds=10",
"nfolds=5", "nfolds=3"), ylab = "log(Lambda)")
Choose the \(\lambda\) value that gives the smallest AIC and BIC for the models on the lasso path. To calculate \(-2\times\text{LogLik}\) for the models on the lasso path, use the function deviance.glmnet (check help(deviance.glmnet)); to obtain the number of non-zero regression parameters, use the df value of the glmnet object. Show also the corresponding regression coefficients.
length(lasso.fit$lambda)
## [1] 96
lasso.aic = deviance(lasso.fit) + 2 * lasso.fit$df
lasso.bic = deviance(lasso.fit) + log(351) * lasso.fit$df
which.min(lasso.aic)
## [1] 64
which.min(lasso.bic)
## [1] 37
log(lasso.fit$lambda[64])
## [1] -7.251293
log(lasso.fit$lambda[37])
## [1] -4.739382
coef(lasso.fit)[, c(37, 64)]
## 34 x 2 sparse Matrix of class "dgCMatrix"
## s36 s63
## (Intercept) -7.63804244 -19.97760594
## X1 5.83649771 17.33128212
## X3 1.65355461 1.79013288
## X4 . -0.71473590
## X5 1.48511903 3.24988608
## X6 0.92605010 3.17590149
## X7 1.43732303 1.02380148
## X8 1.32579242 3.20339882
## X9 . 2.28104503
## X10 0.45782632 0.68954585
## X11 . -2.99786770
## X12 . -0.83889824
## X13 . .
## X14 . 0.03278883
## X15 . 2.30522823
## X16 . -2.74363799
## X17 . 0.43751730
## X18 0.45803345 2.13217256
## X19 . -3.35870761
## X20 . .
## X21 . .
## X22 -1.43984206 -2.85916411
## X23 0.08460504 2.31187561
## X24 0.12849989 1.30783045
## X25 0.72605376 1.89735684
## X26 . -0.27175033
## X27 -1.29590306 -4.69533343
## X28 . .
## X29 . 1.73637193
## X30 1.09798231 3.67431267
## X31 0.52476560 1.23671045
## X32 . .
## X33 . -0.12089348
## X34 -1.34765320 -3.09040097
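If we only want the names of the variables selected at, say, the BIC-optimal \(\lambda\) (the 37th value on the path), a quick sketch:
# variables with non-zero coefficients at the lambda minimising BIC
b.bic = coef(lasso.fit)[, 37]
names(b.bic)[b.bic != 0 & names(b.bic) != "(Intercept)"]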
Visualise the parameter estimates from the maximum-likelihood (ML), lasso, ridge and elastic-net methods. When obtaining the parameter estimates, use lambda.1se for each of the three regularization methods. Recall that glm fits logistic regression models with ML.
dat2 = dat[, -2]
dat2$y = as.numeric(dat2$y) - 1
b.ml = coef(glm(y ~ ., family = binomial(link = "logit"), data = dat2))[-1]
set.seed(4)
cv.glmnet(x, y, family = "binomial", type.measure = "class",
alpha = 0)$lambda.1se
## [1] 0.1005353
b.ridge = coef(glmnet(x, y, family = "binomial", alpha = 0, lambda = 0.1))[-1]
cv.glmnet(x, y, family = "binomial", type.measure = "class",
alpha = 0.5)$lambda.1se
## [1] 0.03681076
b.enet = coef(glmnet(x, y, family = "binomial", alpha = 0.5,
lambda = 0.037))[-1]
cv.glmnet(x, y, family = "binomial", type.measure = "class")$lambda.1se
## [1] 0.04251881
b.lasso = coef(glmnet(x, y, family = "binomial", lambda = 0.043))[-1]
B = cbind(b.ml, b.ridge, b.enet, b.lasso)
plot(1:4, B[1, ], type = "b", lty = 2, col = c(1, 2, 4, 6), ylab = "Coefficients",
xlim = c(0.75, 4.25), ylim = c(-7, 7), xlab = "")
for (i in 2:nrow(B)) {
points(1:4, B[i, ], type = "b", lty = 2, col = c(1, 2, 4,
6))
}
text(1, 6.2, "ML")
text(2, -3, "Ridge", col = 2)
text(3, 5, "Elastic-net", col = 4)
text(4, -3, "Lasso", col = 6)
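An alternative way to draw essentially the same comparison, using matplot instead of a manual loop (a sketch):
# each line traces one coefficient across the four methods
matplot(t(B), type = "b", lty = 2, pch = 1, col = 1,
xaxt = "n", xlab = "", ylab = "Coefficients")
axis(1, at = 1:4, labels = c("ML", "Ridge", "Elastic-net", "Lasso"))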
It is impossible to know for real data which classifier (regression model) will do best, and often it is not even clear what is meant by ‘best’. We are again going to use the classification error, but this time on independent data, to assess what is ‘best’ in terms of prediction.
To empirically determine this prediction error, a standard approach is to split the data into a training set (often around 2/3 of the observations) and a validation set (the remaining observations).
Produce such a random split:
n = length(y)
n
## [1] 351
set.seed(2015)
train = sample(1:n, 2 * n/3) # by default replace=FALSE in sample()
length(train)/n
## [1] 0.6666667
max(table(train)) # if 1, then no ties in the sampled observations
## [1] 1
test = (1:n)[-train]
Calculate the classification error on the test data for the lasso, elastic-net and ridge fits, choosing \(\lambda\) as lambda.min from cv.glmnet.
lasso.cv = cv.glmnet(x[train, ], y[train], family = "binomial",
type.measure = "class")
lasso.fit = glmnet(x[train, ], y[train], family = "binomial")
# calculate predicted probability to be 'g' - second level of
# categorical y - through type='response'
lasso.predict.prob = predict(lasso.fit, s = lasso.cv$lambda.min,
newx = x[test, ], type = "response") # check help(predict.glmnet)
predict.g = lasso.predict.prob > 0.5
table(y[test], predict.g)
## predict.g
## FALSE TRUE
## b 36 8
## g 6 67
sum(table(y[test], predict.g)[c(2, 3)])/length(test)
## [1] 0.1196581
# calculate predicted y through type='class'
lasso.predict.y = predict(lasso.fit, s = lasso.cv$lambda.min,
newx = x[test, ], type = "class") # check help(predict.glmnet)
mean(lasso.predict.y != y[test])
## [1] 0.1196581
enet.cv = cv.glmnet(x[train, ], y[train], family = "binomial",
type.measure = "class", alpha = 0.5)
enet.fit = glmnet(x[train, ], y[train], family = "binomial",
alpha = 0.5)
enet.predict.y = predict(enet.fit, s = enet.cv$lambda.min, newx = x[test,
], type = "class")
mean(enet.predict.y != y[test])
## [1] 0.1111111
ridge.cv = cv.glmnet(x[train, ], y[train], family = "binomial",
type.measure = "class", alpha = 0)
ridge.fit = glmnet(x[train, ], y[train], family = "binomial",
alpha = 0)
ridge.predict.y = predict(ridge.fit, s = ridge.cv$lambda.min,
newx = x[test, ], type = "class")
mean(ridge.predict.y != y[test])
## [1] 0.1538462
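Collecting the three test errors in one vector makes the comparison easier to read (a sketch):
# test-set misclassification rates, all using lambda.min
c(lasso = mean(lasso.predict.y != y[test]),
enet = mean(enet.predict.y != y[test]),
ridge = mean(ridge.predict.y != y[test]))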
To explore the performance of the above regularization methods in normal linear regression when \(p = 1000 > n = 300\), let us consider a true model \[ \boldsymbol{y} = \boldsymbol{X}\boldsymbol{\beta} + \boldsymbol{\epsilon} \quad\text{with}\quad \boldsymbol{\epsilon} \sim N(\boldsymbol{0},\boldsymbol{I}). \]
Generate the regression data
set.seed(300)
n = 300
p = 1000
p.nonzero = 10
x = matrix(rnorm(n * p), nrow = n, ncol = p)
b.nonzero = rexp(p.nonzero) * (sign(rnorm(p.nonzero)))
b.nonzero
## [1] -2.3780961 0.3871070 0.7883509 0.1876987 1.3820593
## [6] 0.0685901 2.1213078 0.3013759 -0.4809320 -0.9830124
beta = c(b.nonzero, rep(0, p - p.nonzero))
y = x %*% beta + rnorm(n)
Split the data into a 2/3 training and 1/3 test set as before.
train = sample(1:n, 2 * n/3)
x.train = x[train, ]
x.test = x[-train, ]
y.train = y[train]
y.test = y[-train]
lasso.fit = glmnet(x.train, y.train, family = "gaussian", alpha = 1)
ridge.fit = glmnet(x.train, y.train, family = "gaussian", alpha = 0)
enet.fit = glmnet(x.train, y.train, family = "gaussian", alpha = 0.5)
For each \(\alpha\) in \(0, 0.1, \ldots, 1\), obtain the cross-validated mse (mean squared error) from cv.glmnet for 10-fold CV. Plot the solution paths and the cross-validated MSE as a function of \(\lambda\).
fit = list()
for (i in 0:10) {
fit[[i + 1]] = cv.glmnet(x.train, y.train, type.measure = "mse",
alpha = i/10, family = "gaussian")
}
names(fit) = 0:10
par(mfrow = c(3, 2))
plot(lasso.fit, xvar = "lambda")
plot(fit[["10"]], main = "lasso")
plot(ridge.fit, xvar = "lambda")
plot(fit[["5"]], main = "elastic-net")
plot(enet.fit, xvar = "lambda")
plot(fit[["0"]], main = "ridge")
yhat0 = predict(fit[["0"]], s = fit[["0"]]$lambda.1se, newx = x.test)
yhat1 = predict(fit[["1"]], s = fit[["1"]]$lambda.1se, newx = x.test)
yhat2 = predict(fit[["2"]], s = fit[["2"]]$lambda.1se, newx = x.test)
yhat3 = predict(fit[["3"]], s = fit[["3"]]$lambda.1se, newx = x.test)
yhat4 = predict(fit[["4"]], s = fit[["4"]]$lambda.1se, newx = x.test)
yhat5 = predict(fit[["5"]], s = fit[["5"]]$lambda.1se, newx = x.test)
yhat6 = predict(fit[["6"]], s = fit[["6"]]$lambda.1se, newx = x.test)
yhat7 = predict(fit[["7"]], s = fit[["7"]]$lambda.1se, newx = x.test)
yhat8 = predict(fit[["8"]], s = fit[["8"]]$lambda.1se, newx = x.test)
yhat9 = predict(fit[["9"]], s = fit[["9"]]$lambda.1se, newx = x.test)
yhat10 = predict(fit[["10"]], s = fit[["10"]]$lambda.1se, newx = x.test)
mse0 = mean((y.test - yhat0)^2)
mse1 = mean((y.test - yhat1)^2)
mse2 = mean((y.test - yhat2)^2)
mse3 = mean((y.test - yhat3)^2)
mse4 = mean((y.test - yhat4)^2)
mse5 = mean((y.test - yhat5)^2)
mse6 = mean((y.test - yhat6)^2)
mse7 = mean((y.test - yhat7)^2)
mse8 = mean((y.test - yhat8)^2)
mse9 = mean((y.test - yhat9)^2)
mse10 = mean((y.test - yhat10)^2)
c(mse0, mse1, mse2, mse3, mse4, mse5, mse6, mse7, mse8, mse9,
mse10)
## [1] 15.445985 3.432268 2.228483 1.929294 1.829045
## [6] 1.739102 1.691122 1.661092 1.646961 1.626372
## [11] 1.619253
Here, the lasso has the smallest MSE, and we note that the lasso seems to do well when there are lots of redundant variables.
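The eleven test MSEs could also be computed more compactly by looping over the list of cross-validated fits (a sketch that should reproduce the numbers above):
# test MSE for alpha = 0, 0.1, ..., 1, each at its own lambda.1se
mse.all = sapply(fit, function(f) {
yhat = predict(f, s = f$lambda.1se, newx = x.test)
mean((y.test - yhat)^2)
})
round(mse.all, 3)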
Generate the regression data
set.seed(300)
n = 300
p = 1000
p.nonzero = n
x = matrix(rnorm(n * p), nrow = n, ncol = p)
b.nonzero = rexp(p.nonzero) * (sign(rnorm(p.nonzero)))
summary(b.nonzero)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -5.59419 -0.68859 0.01699 0.03362 0.76344 3.70905
beta = c(b.nonzero, rep(0, p - p.nonzero))
y = x %*% beta + rnorm(n)
Split the data into a 2/3 training and 1/3 test set as before.
train = sample(1:n, 2 * n/3)
x.train = x[train, ]
x.test = x[-train, ]
y.train = y[train]
y.test = y[-train]
lasso.fit = glmnet(x.train, y.train, family = "gaussian", alpha = 1)
ridge.fit = glmnet(x.train, y.train, family = "gaussian", alpha = 0)
enet.fit = glmnet(x.train, y.train, family = "gaussian", alpha = 0.5)
Again, for each \(\alpha\) in \(0, 0.1, \ldots, 1\), obtain the cross-validated mse (mean squared error) from cv.glmnet for 10-fold CV.
for (i in 0:10) {
assign(paste("fit", i, sep = ""), cv.glmnet(x.train, y.train,
type.measure = "mse", alpha = i/10, family = "gaussian"))
}
par(mfrow = c(3, 2))
plot(lasso.fit, xvar = "lambda")
plot(fit10, main = "lasso")
plot(ridge.fit, xvar = "lambda")
plot(fit0, main = "ridge")
plot(enet.fit, xvar = "lambda")
plot(fit5, main = "elastic-net")
yhat0 = predict(fit0, s = fit0$lambda.1se, newx = x.test)
yhat1 = predict(fit1, s = fit1$lambda.1se, newx = x.test)
yhat2 = predict(fit2, s = fit2$lambda.1se, newx = x.test)
yhat3 = predict(fit3, s = fit3$lambda.1se, newx = x.test)
yhat4 = predict(fit4, s = fit4$lambda.1se, newx = x.test)
yhat5 = predict(fit5, s = fit5$lambda.1se, newx = x.test)
yhat6 = predict(fit6, s = fit6$lambda.1se, newx = x.test)
yhat7 = predict(fit7, s = fit7$lambda.1se, newx = x.test)
yhat8 = predict(fit8, s = fit8$lambda.1se, newx = x.test)
yhat9 = predict(fit9, s = fit9$lambda.1se, newx = x.test)
yhat10 = predict(fit10, s = fit10$lambda.1se, newx = x.test)
mse0 = mean((y.test - yhat0)^2)
mse1 = mean((y.test - yhat1)^2)
mse2 = mean((y.test - yhat2)^2)
mse3 = mean((y.test - yhat3)^2)
mse4 = mean((y.test - yhat4)^2)
mse5 = mean((y.test - yhat5)^2)
mse6 = mean((y.test - yhat6)^2)
mse7 = mean((y.test - yhat7)^2)
mse8 = mean((y.test - yhat8)^2)
mse9 = mean((y.test - yhat9)^2)
mse10 = mean((y.test - yhat10)^2)
c(mse0, mse1, mse2, mse3, mse4, mse5, mse6, mse7, mse8, mse9,
mse10)
## [1] 371.7866 328.3805 310.3128 349.3736 316.7003 332.5072
## [7] 319.3968 343.8416 304.7279 309.7283 315.9038
Here, the elastic-net has the smallest MSE, but we note that the MSE is much more variable as \(\alpha\) is varied.
Investigate the effect of correlated columns in \(\boldsymbol{X}\), for example \(\text{Cov}(\boldsymbol{X})_{ij}=0.9^{|i-j|}\).
n = 100
p = 60
CovMatrix = outer(1:p, 1:p, function(x, y) {
0.9^abs(x - y)
})
x = MASS::mvrnorm(n, rep(0, p), CovMatrix)
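A quick sanity check that the simulated design has roughly the intended correlation structure (a sketch comparing empirical and target correlations for the first few columns):
# empirical correlations should be close to the target 0.9^|i-j|
round(cor(x)[1:5, 1:5], 2)
round(CovMatrix[1:5, 1:5], 2)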
Similar setup as above, but with different ways to generate the non-zero regression coefficients.