class: center, middle, inverse, title-slide

# Lecture A
## Exhaustive and non-exhaustive algorithms without resampling
### Samuel Müller and Garth Tarr

---

<!---
Colors in html: <span style="color:blue/red; font-weight:bold">text</span>
Colors in LaTeX: \color{blue}{a} \color{red}{b} \color{orange}{c}
Boldface in LaTeX:
`$$\color{blue}{\hat{\boldsymbol{\beta}}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$$`
`\(\color{blue}{\hat{\boldsymbol{\beta}}}\)` is the .blue[.bold[estimated OLS coefficient vector]] and similarly (but with a different font),
`$$\color{red}{\hat{\boldsymbol{\beta}}} = (\boldsymbol{X}'\boldsymbol{X})^{-1}\boldsymbol{X}'\boldsymbol{y}$$`
-->

.blockquote[
### "Doubt is not a pleasant mental state, but certainty is a ridiculous one."
.right[— <cite>Voltaire (1694-1778)</cite>]
]

.pull-left[
- The .red[necessity] to select .red.bold[models] is great despite philosophical issues
- Myriad model/variable selection methods based on .blue[log-likelihood], .red[RSS], prediction errors, .orange[resampling], etc.
]
.pull-right[
<img src="imgs/blackbox.jpg" width=700 />
]

---

## Some motivating questions

- How to .bold[.red[practically select model(s)]]?
- How to cope with an ever .bold[.red[increasing number of observed variables]]?
- How to .bold[.red[visualise]] the model building process?
- How to assess the .bold[.red[stability]] of a selected model?

---

## Topics

A1. .blue[Selecting models]

* .red[Stepwise model selection]
* .red[Information criteria including AIC and BIC]
* .red[Exhaustive and non-exhaustive searches]

A2. .blue[Regularisation methods]

A3. .blue[Marginality constraints]

---

class: segue

# A1. Selecting models

---

class: eg

## Body fat data

- `\(N=252\)` measurements of men (available in R package `mfp`)
- .red[Training sample] size `\(n=128\)` (available in R package `mplot`)
- .red[Response variable]: body fat percentage: `Bodyfat`
- 13 or 14 .red[explanatory variables]

```r
data("bodyfat", package = "mplot")
dim(bodyfat)
```

```
## [1] 128 15
```

```r
names(bodyfat)
```

```
## [1] "Id" "Bodyfat" "Age" "Weight" "Height" "Neck"
## [7] "Chest" "Abdo" "Hip" "Thigh" "Knee" "Ankle"
## [13] "Bic" "Fore" "Wrist"
```

.footnote[Source: Johnson (1996)]

---

class: eg

## Body fat: Boxplots

```r
bfat = bodyfat[,-1] # remove the ID column
boxplot(bfat, horizontal = TRUE, las = 1)
```

<img src="lecA_files/figure-html/unnamed-chunk-2-1.png" width="864" />

---

class: eg

<!-- # Body fat: Scatterplot matrix -->

```r
pairs(bfat)
```

<img src="lecA_files/figure-html/unnamed-chunk-3-1.png" width="864" />

---

class: eg

## Body fat data: Full model and power set

```r
names(bfat)
```

```
## [1] "Bodyfat" "Age" "Weight" "Height" "Neck" "Chest"
## [7] "Abdo" "Hip" "Thigh" "Knee" "Ankle" "Bic"
## [13] "Fore" "Wrist"
```

.bold[.red[Full model]]

`$$\begin{align*} \text{Bodyfat} = & \beta_1 + \beta_2\text{Age} + \beta_3\text{Weight}+ \beta_4\text{Height} + \beta_5\text{Neck} + {} \\ & \beta_6\text{Chest} + \beta_7\text{Abdo} + \beta_8\text{Hip} + \beta_9\text{Thigh} + \beta_{10}\text{Knee} + {} \\ & \beta_{11}\text{Ankle} + \beta_{12}\text{Bic} + \beta_{13}\text{Fore} + \beta_{14}\text{Wrist} + \epsilon \end{align*}$$`

.bold[.red[Power set]]

Here, the intercept is not subject to selection.
Therefore, `\(p-1 = 13\)` and thus the number of possible regression models is: `$$M = \#\mathcal{A} = 2^{13} = 8192$$` --- class: eg code90 ## Body fat: Best bivariate fitted model ```r M0 = lm(Bodyfat ~ Abdo , data = bfat) summary(M0) ``` ``` ## ## Call: ## lm(formula = Bodyfat ~ Abdo, data = bfat) ## ## Residuals: ## Min 1Q Median 3Q Max ## -10.3542 -2.9928 0.2191 2.4967 10.0106 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) -43.19058 3.63431 -11.88 <2e-16 *** ## Abdo 0.67411 0.03907 17.26 <2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 4.249 on 126 degrees of freedom ## Multiple R-squared: 0.7027, Adjusted R-squared: 0.7003 ## F-statistic: 297.8 on 1 and 126 DF, p-value: < 2.2e-16 ``` --- class: eg code80 ## Body fat: Full fitted model .scroll-output[ ```r M1 = lm(Bodyfat ~ ., data = bfat) summary(M1) ``` ``` ## ## Call: ## lm(formula = Bodyfat ~ ., data = bfat) ## ## Residuals: ## Min 1Q Median 3Q Max ## -9.3767 -2.5514 -0.1723 2.6391 9.1393 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) -52.553646 40.062856 -1.312 0.1922 ## Age 0.009288 0.043470 0.214 0.8312 ## Weight -0.271016 0.243569 -1.113 0.2682 ## Height 0.258388 0.320810 0.805 0.4223 ## Neck -0.592669 0.322125 -1.840 0.0684 . ## Chest 0.090883 0.164738 0.552 0.5822 ## Abdo 0.995184 0.123072 8.086 7.29e-13 *** ## Hip -0.141981 0.204533 -0.694 0.4890 ## Thigh 0.101272 0.200714 0.505 0.6148 ## Knee -0.096682 0.325889 -0.297 0.7673 ## Ankle -0.048017 0.507695 -0.095 0.9248 ## Bic 0.075332 0.244105 0.309 0.7582 ## Fore 0.412107 0.272144 1.514 0.1327 ## Wrist -0.263067 0.745145 -0.353 0.7247 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 4.081 on 114 degrees of freedom ## Multiple R-squared: 0.7519, Adjusted R-squared: 0.7236 ## F-statistic: 26.57 on 13 and 114 DF, p-value: < 2.2e-16 ``` ] --- ## The AIC and BIC - AIC is the most widely known and used model selection method (of course this does not imply it is the best/recommended method to use) - .red[AIC] Akaike (1973): `$$\color{red}{\text{AIC}} = -2\times \text{LogLik} + \color{red}{2} \times p$$` - .blue[BIC] Schwarz (1978): `$$\color{blue}{\text{BIC}}= -2 \times \text{LogLik} + \color{blue}{\log(n)}\times p$$` - .red[The smaller the AIC/BIC the better the model] - These and many other criteria choose models by minimizing an expression that can be written as `$$\text{Loss + Penalty}$$` --- ## Stepwise variable selection Stepwise, .green[forward] and .blue[backward]: 1. Start with .red[.bold[some]] model, typically .green[null model] (with no explanatory variables) or .blue[full model] (with all variables) 2. For each variable in the current model, .red[.bold[investigate]] effect of .red[.bold[removing]] it 3. .red[.bold[Remove]] the least informative variable, unless this variable is nonetheless supplying significant information about the response 4. For each variable not in the current model, .red[.bold[investigate]] effect of .red[.bold[including]] it 5. .red[.bold[Include]] the most statistically significant variable not currently in model (unless no significant variable exists) 6. Go to step 2. 
Stop only if no change in steps 2-5 --- class: eg ## Body fat: Forward search using BIC ```r M0 = lm(Bodyfat ~ 1, data = bfat) # Null model M1 = lm(Bodyfat ~ ., data = bfat) # Full model step.fwd.bic = step(M0, scope = list(lower = M0, upper = M1), direction = "forward", trace = FALSE, k = log(128)) # BIC: k=log(n) # summary(step.fwd.bic) round(summary(step.fwd.bic)$coef, 3) ``` ``` ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) -47.991 3.691 -13.003 0 ## Abdo 0.939 0.080 11.743 0 ## Weight -0.243 0.065 -3.740 0 ``` - Best model has two features: Abdo and Weight - Backward search using BIC gives the same model (confirm yourself) --- class: eg ## Body fat: Backward search using AIC ```r M0 = lm(Bodyfat ~ 1, data = bfat) # Null model M1 = lm(Bodyfat ~ ., data = bfat) # Full model step.bwd.aic = step(M1, scope = list(lower = M0, upper = M1), direction = "backward", trace = FALSE, k = 2) round(summary(step.bwd.aic)$coef, 3) ``` ``` ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) -41.883 8.310 -5.040 0.000 ## Weight -0.229 0.079 -2.915 0.004 ## Neck -0.619 0.266 -2.325 0.022 ## Abdo 0.980 0.081 12.036 0.000 ## Fore 0.434 0.238 1.821 0.071 ``` - Best model has four variables - Using AIC gives larger model because `\(2 < \log(128) = 4.85\)` - `k=2` is the default in `step()` --- class: eg ## Body fat: Stepwise using AIC ```r M0 = lm(Bodyfat ~ 1, data = bfat) # Null model M1 = lm(Bodyfat ~ ., data = bfat) # Full model step.aic = step(M1, scope = list(lower = M0, upper = M1), trace = FALSE) round(summary(step.aic)$coef, 3) ``` ``` ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) -41.883 8.310 -5.040 0.000 ## Weight -0.229 0.079 -2.915 0.004 ## Neck -0.619 0.266 -2.325 0.022 ## Abdo 0.980 0.081 12.036 0.000 ## Fore 0.434 0.238 1.821 0.071 ``` - Here, default stepwise gives the same as backward using AIC - This does not hold in general --- ## Criticism of stepwise procedures - .red[.bold[Never run (p value based) automatic stepwise procedures on their own!]] .blockquote[ Stepwise regression is probably the most abused computerized statistical technique ever devised. If you think you need stepwise regression to solve a particular problem you have, it is almost certain you do not. Professional statisticians rarely use automated stepwise regression. .right[—<cite>Wilkinson (1998)</cite>] ] - See Wiegand (2010) for a more recent simulation study and review of the performance of stepwise procedures --- ## Exhaustive search .blockquote[ **Exhaustive search** is the only technique **guaranteed** to find the predictor variable subset with the best evaluation criterion. 
]

- Since we look over the whole model space, we can identify the best model(s) at each model size
- Sometimes known as .red[.bold[best subsets]] model selection
- The .red[.bold[loss]] component is (typically) the residual sum of squares
- Main drawback: exhaustive searching is .red[.bold[computationally intensive]]

---

## The `lmSubsets` package

- Main function: `lmSubsets()` performs an .red[.bold[exhaustive search]] of the model space, i.e. all-subsets selection, for predicting `y` in .red[.bold[linear regression]] (Hofmann, Gatu, Kontoghiorghes, Colubi, and Zeileis, 2020)
- Uses an efficient .red[.bold[branch-and-bound]] algorithm
- The `lmSelect()` function performs best-subset selection

.footnote[Historical note: the `leaps` package also performs regression subset selection including .red[.bold[exhaustive searches]] (Lumley and Miller, 2009) for _linear models_; the `lmSubsets` package seems to be faster at achieving a similar goal and offers a few more features.]

---

## The `lmSubsets` function

- Input: the .red[.bold[full model]] and various optional parameters (e.g. `include` and `exclude` to force variables in or out of the model space; `nmin` and `nmax` to specify the smallest and largest model size considered; and `nbest` for the number of best subsets to report at each model size)
- Output: an `lmSubsets` object, which you interact with using the `summary` function to report model summary statistics (e.g. `\(\hat{\sigma}\)`, `\(R^2\)`, AIC, BIC)
- Plot: there is also a plot method which can be applied to the `lmSubsets` object

---

## The `lmSelect` function

The `lmSelect` function is similar to the `lmSubsets` function but it helps the user choose a "best" model.

- Input: the .red[.bold[full model]] and various optional parameters including the `penalty` parameter
- `penalty = "BIC"` is the default
- `penalty = 2` or `penalty = "AIC"` gives AIC
- or a more general function `penalty = function(size, rss)`, where `size` is the number of regressors and `rss` the residual sum of squares of the corresponding submodel (see the sketch below)
- Output: an `lmSelect` object. In particular, the list has elements such as `subset` (which identifies the selected variables)

The `refit()` function can be used to fit the selected model and return an `lm` object.
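For example, BIC could be supplied explicitly as a custom penalty function. A minimal sketch (the names `bic_penalty` and `best_custom` are illustrative; the Gaussian constant terms are dropped because they do not change the ranking of submodels):

```r
# BIC for a Gaussian linear model, up to an additive constant:
# n * log(RSS / n) + log(n) * (number of regressors)
n = nrow(bfat)  # 128 observations in the body fat training sample
bic_penalty = function(size, rss) n * log(rss / n) + log(n) * size

best_custom = lmSelect(Bodyfat ~ ., data = bfat, penalty = bic_penalty)
best_custom$subset    # which variables are selected
refit(best_custom)    # refit the selected model and return an ordinary lm object
```

Assuming, as with the built-in `"AIC"` and `"BIC"` choices, that smaller criterion values are better, this should select the same model as the default `penalty = "BIC"`, since the two criteria differ only by a constant.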
--- class: eg code75 ## Body fat: Exhaustive search .scroll-output[ ```r library(lmSubsets) rgsbst.out = lmSubsets(Bodyfat ~ ., data = bfat, nbest = 1, # 1 best model for each dimension nmax = NULL, # NULL for no limit on number of variables include = NULL, exclude = NULL) rgsbst.out ``` ``` ## Call: ## lmSubsets(formula = Bodyfat ~ ., data = bfat, nbest = 1, nmax = NULL, ## include = NULL, exclude = NULL) ## ## Deviance: ## [best, size (tolerance)] = RSS ## 2 (0) 3 (0) 4 (0) 5 (0) 6 (0) 7 (0) 8 (0) ## 1st 2274.919 2045.958 1975.423 1939.099 1922.31 1914.939 1912.117 ## 9 (0) 10 (0) 11 (0) 12 (0) 13 (0) 14 (0) ## 1st 1905.726 1902.934 1900.909 1899.58 1898.741 1898.592 ## ## Subset: ## [variable, best] = size ## 1st ## +(Intercept) 2-14 ## Age 13-14 ## Weight 3,5-14 ## Height 8-14 ## Neck 4-14 ## Chest 9-14 ## Abdo 2-14 ## Hip 4,6-14 ## Thigh 9-14 ## Knee 12-14 ## Ankle 14 ## Bic 11-14 ## Fore 5-14 ## Wrist 7-8,10-14 ``` ] --- class: eg code80 ## Body fat: Exhaustive search ```r image(rgsbst.out) ``` <img src="lecA_files/figure-html/unnamed-chunk-11-1.png" width="864" /> --- class: eg ## Body fat: Exhaustive search ```r summary.out = summary(rgsbst.out) names(summary.out) ``` ``` ## [1] "call" "terms" "nvar" "nbest" "size" "stats" ``` ```r round(summary.out$stats, 2) ``` ``` ## SIZE BEST sigma R2 R2adj pval Cp AIC BIC ## 2 2 1 4.25 0.70 0.70 0 12.60 737.59 746.15 ## 3 3 1 4.05 0.73 0.73 0 0.85 726.01 737.42 ## 4 4 1 3.99 0.74 0.74 0 -1.39 723.52 737.78 ## 5 5 1 3.97 0.75 0.74 0 -1.57 723.15 740.26 ## 6 6 1 3.97 0.75 0.74 0 -0.58 724.03 744.00 ## 7 7 1 3.98 0.75 0.74 0 0.98 725.54 748.36 ## 8 8 1 3.99 0.75 0.74 0 2.81 727.35 753.02 ## 9 9 1 4.00 0.75 0.73 0 4.43 728.92 757.44 ## 10 10 1 4.02 0.75 0.73 0 6.26 730.74 762.11 ## 11 11 1 4.03 0.75 0.73 0 8.14 732.60 766.82 ## 12 12 1 4.05 0.75 0.73 0 10.06 734.51 771.59 ## 13 13 1 4.06 0.75 0.73 0 12.01 736.45 776.38 ## 14 14 1 4.08 0.75 0.72 0 14.00 738.44 781.22 ``` --- class: eg code80 ## Body fat: Exhaustive search ```r plot(rgsbst.out) ``` <img src="lecA_files/figure-html/unnamed-chunk-13-1.png" width="864" /> --- class: eg code80 ## Body fat: Exhaustive search with selection ```r best_bic = lmSelect(Bodyfat ~ ., data = bfat) best_aic = lmSelect(rgsbst.out, penalty = "AIC") rbind(best_bic$subset, best_aic$subset)*1 ``` ``` ## (Intercept) Age Weight Height Neck Chest Abdo Hip Thigh Knee Ankle Bic Fore Wrist ## 1 1 0 1 0 0 0 1 0 0 0 0 0 0 0 ## 2 1 0 1 0 1 0 1 0 0 0 0 0 1 0 ``` --- class: eg ## Body fat: Best subsets with BIC ```r best_10_bic = lmSelect(Bodyfat ~ ., data = bfat, nbest = 10) ``` .pull-left[ ```r plot(best_10_bic) ``` <img src="lecA_files/figure-html/unnamed-chunk-16-1.png" width="864" /> ] .pull-right[ ```r image(best_10_bic) ``` <img src="lecA_files/figure-html/unnamed-chunk-17-1.png" width="864" /> ] --- class: segue # A2. 
Regularisation methods --- ## Regularisation methods with `glmnet` .blockquote[ Regularisation methods **shrink** estimated regression coefficients by imposing a penalty on their sizes ] - There are different choices for the .blue[.bold[penalty]] - The penalty choice drives the properties of the method In particular, `glmnet` implements - .blue[.bold[Lasso regression]] using a `\(L_1\)` penalty - .red[.bold[Ridge regression]] using a `\(L_2\)` penalty - .orange[.bold[Elastic-net]] using both, an `\(L_1\)` and an `\(L_2\)` penalty - **Adaptive lasso**, it takes advantage of feature weights --- ## Lasso is least squares with `\(L_1\)` penalty - Simultaneously estimate and select coefficients through `$$\hat{\boldsymbol{\beta}}_{\text{lasso}} = \hat{\boldsymbol{\beta}}_{\text{lasso}}({\color{red}\lambda}) = \operatorname{argmin}_{\boldsymbol{\beta} \in \Bbb{R}^p} \{ (\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta})^T (\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta}) + {\color{red}\lambda} \, \color{blue}{||\boldsymbol{\beta}||_1}\}$$` - positive <span style="color:red">tuning parameter `\(\lambda\)`</span> controls the shrinkage - Optimisation problem above is equivalent to solving `$$\text{minimise } \operatorname{RSS}(\boldsymbol{\beta}) = (\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta})^T (\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta}) \text{ subject to } {\color{blue}{\sum_{j=1}^p |\beta_j|}} \leq t.$$` --- class: eg ## Example: Diabetes - We consider the `diabetes` data and we only analyse main effects - The data from the `lars` package is the same as the `mplot` package but is structured and scaled differently ```r data("diabetes", package = "lars") names(diabetes) # list, slot x has 10 main effects ``` ``` ## [1] "x" "y" "x2" ``` ```r dim(diabetes) ``` ``` ## [1] 442 3 ``` ```r x = diabetes$x dim(x) ``` ``` ## [1] 442 10 ``` ```r y = diabetes$y ``` --- class: eg ## Example: Diabetes ```r class(x) = NULL # remove the AsIs class; otherwise single boxplot drawn boxplot(x, cex.axis = 2) ``` <img src="lecA_files/figure-html/unnamed-chunk-18-1.png" width="864" /> --- class: eg code90 ## Example: Diabetes .scroll-output[ ```r lm.fit = lm(y~x) #least squares fit for the full model first summary(lm.fit) ``` ``` ## ## Call: ## lm(formula = y ~ x) ## ## Residuals: ## Min 1Q Median 3Q Max ## -155.829 -38.534 -0.227 37.806 151.355 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 152.133 2.576 59.061 < 2e-16 *** ## xage -10.012 59.749 -0.168 0.867000 ## xsex -239.819 61.222 -3.917 0.000104 *** ## xbmi 519.840 66.534 7.813 4.30e-14 *** ## xmap 324.390 65.422 4.958 1.02e-06 *** ## xtc -792.184 416.684 -1.901 0.057947 . ## xldl 476.746 339.035 1.406 0.160389 ## xhdl 101.045 212.533 0.475 0.634721 ## xtch 177.064 161.476 1.097 0.273456 ## xltg 751.279 171.902 4.370 1.56e-05 *** ## xglu 67.625 65.984 1.025 0.305998 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 54.15 on 431 degrees of freedom ## Multiple R-squared: 0.5177, Adjusted R-squared: 0.5066 ## F-statistic: 46.27 on 10 and 431 DF, p-value: < 2.2e-16 ``` ] --- ## Fitting a lasso regression Lasso regression with the `glmnet()` function in `library(glmnet)` has as a first argument the design matrix and as a second argument the response vector, i.e. 
```r library(glmnet) lasso.fit = glmnet(x, y) ``` --- ### Comments - `glmnet.fit` is an object of class glmnet that contains all the relevant information of the fitted model for further use - Various methods are provided for the object such as `coef`, `plot`, `predict` and `print` that extract and present useful object information - The additional material shows that cross-validation can be used to choose a good value for `\(\lambda\)` through using the function `cv.glmnet` ```r set.seed(1) # to get a reproducible lambda.min value lasso.cv = cv.glmnet(x, y) lasso.cv$lambda.min ``` ``` ## [1] 0.09729434 ``` --- class: eg ## Lasso regression coefficients Clearly the regression coefficients are smaller for the .blue[lasso regression] than for .red[least squares] ```r lasso = as.matrix(coef(lasso.fit, s = c(5, 1, 0.5, 0))) LS = coef(lm.fit) data.frame(lasso = lasso, LS) ``` <table class="table table-striped" style="font-size: 14px; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> Variable </th> <th style="text-align:right;"> lasso.1 </th> <th style="text-align:right;"> lasso.2 </th> <th style="text-align:right;"> lasso.3 </th> <th style="text-align:right;"> lasso.4 </th> <th style="text-align:right;"> LS </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> (Intercept) </td> <td style="text-align:right;color: blue !important;"> 152.1 </td> <td style="text-align:right;color: blue !important;"> 152.1 </td> <td style="text-align:right;color: blue !important;"> 152.1 </td> <td style="text-align:right;color: blue !important;"> 152.1 </td> <td style="text-align:right;color: red !important;"> 152.1 </td> </tr> <tr> <td style="text-align:left;"> age </td> <td style="text-align:right;color: blue !important;"> 0.0 </td> <td style="text-align:right;color: blue !important;"> 0.0 </td> <td style="text-align:right;color: blue !important;"> 0.0 </td> <td style="text-align:right;color: blue !important;"> -9.2 </td> <td style="text-align:right;color: red !important;"> -10.0 </td> </tr> <tr> <td style="text-align:left;"> sex </td> <td style="text-align:right;color: blue !important;"> -45.3 </td> <td style="text-align:right;color: blue !important;"> -195.9 </td> <td style="text-align:right;color: blue !important;"> -216.4 </td> <td style="text-align:right;color: blue !important;"> -239.1 </td> <td style="text-align:right;color: red !important;"> -239.8 </td> </tr> <tr> <td style="text-align:left;"> bmi </td> <td style="text-align:right;color: blue !important;"> 509.1 </td> <td style="text-align:right;color: blue !important;"> 522.1 </td> <td style="text-align:right;color: blue !important;"> 525.1 </td> <td style="text-align:right;color: blue !important;"> 520.5 </td> <td style="text-align:right;color: red !important;"> 519.8 </td> </tr> <tr> <td style="text-align:left;"> map </td> <td style="text-align:right;color: blue !important;"> 217.2 </td> <td style="text-align:right;color: blue !important;"> 296.2 </td> <td style="text-align:right;color: blue !important;"> 308.3 </td> <td style="text-align:right;color: blue !important;"> 323.6 </td> <td style="text-align:right;color: red !important;"> 324.4 </td> </tr> <tr> <td style="text-align:left;"> tc </td> <td style="text-align:right;color: blue !important;"> 0.0 </td> <td style="text-align:right;color: blue !important;"> -101.9 </td> <td style="text-align:right;color: blue !important;"> -161.0 </td> <td style="text-align:right;color: blue !important;"> -716.5 </td> <td style="text-align:right;color: red !important;"> 
-792.2 </td> </tr> <tr> <td style="text-align:left;"> ldl </td> <td style="text-align:right;color: blue !important;"> 0.0 </td> <td style="text-align:right;color: blue !important;"> 0.0 </td> <td style="text-align:right;color: blue !important;"> 0.0 </td> <td style="text-align:right;color: blue !important;"> 418.4 </td> <td style="text-align:right;color: red !important;"> 476.7 </td> </tr> <tr> <td style="text-align:left;"> hdl </td> <td style="text-align:right;color: blue !important;"> -147.7 </td> <td style="text-align:right;color: blue !important;"> -223.2 </td> <td style="text-align:right;color: blue !important;"> -180.5 </td> <td style="text-align:right;color: blue !important;"> 65.4 </td> <td style="text-align:right;color: red !important;"> 101.0 </td> </tr> <tr> <td style="text-align:left;"> tch </td> <td style="text-align:right;color: blue !important;"> 0.0 </td> <td style="text-align:right;color: blue !important;"> 0.0 </td> <td style="text-align:right;color: blue !important;"> 66.1 </td> <td style="text-align:right;color: blue !important;"> 164.6 </td> <td style="text-align:right;color: red !important;"> 177.1 </td> </tr> <tr> <td style="text-align:left;"> ltg </td> <td style="text-align:right;color: blue !important;"> 446.3 </td> <td style="text-align:right;color: blue !important;"> 513.6 </td> <td style="text-align:right;color: blue !important;"> 524.2 </td> <td style="text-align:right;color: blue !important;"> 723.7 </td> <td style="text-align:right;color: red !important;"> 751.3 </td> </tr> <tr> <td style="text-align:left;"> glu </td> <td style="text-align:right;color: blue !important;"> 0.0 </td> <td style="text-align:right;color: blue !important;"> 53.9 </td> <td style="text-align:right;color: blue !important;"> 61.2 </td> <td style="text-align:right;color: blue !important;"> 67.5 </td> <td style="text-align:right;color: red !important;"> 67.6 </td> </tr> <tr> <td style="text-align:left;font-weight: bold;background-color: yellow !important;"> Sum of abs(coef) </td> <td style="text-align:right;color: blue !important;font-weight: bold;background-color: yellow !important;"> 1365.7 </td> <td style="text-align:right;color: blue !important;font-weight: bold;background-color: yellow !important;"> 1906.7 </td> <td style="text-align:right;color: blue !important;font-weight: bold;background-color: yellow !important;"> 2042.6 </td> <td style="text-align:right;color: blue !important;font-weight: bold;background-color: yellow !important;"> 3248.5 </td> <td style="text-align:right;color: red !important;font-weight: bold;background-color: yellow !important;"> 3460.0 </td> </tr> </tbody> </table> --- ## Lasso regression vs least squares ```r beta_lasso_5 = coef(lasso.fit, s = 5) sum(abs(beta_lasso_5[-1])) ``` ``` ## [1] 1365.672 ``` ```r beta_lasso_1 = coef(lasso.fit, s = 1) sum(abs(beta_lasso_1[-1])) ``` ``` ## [1] 1906.658 ``` ```r beta_ls = coef(lm.fit) sum(abs(beta_ls[-1])) ``` ``` ## [1] 3460.005 ``` --- class: eg ## Lasso coefficient plot ```r plot(lasso.fit, label=TRUE, cex.axis = 1.5, cex.lab = 1.5) ``` <img src="lecA_files/figure-html/unnamed-chunk-24-1.png" width="864" /> --- ## We observe that the lasso - Can shrink coefficients exactly to zero - Simultaneously estimates and selects coefficients - Paths are not necessarily monotone (e.g. 
coefficients for `hdl` in the table above for different values of `s`)

---

## Further remarks on lasso

- .blue[.bold[lasso]] is an acronym for **l**east **a**bsolute **s**hrinkage and **s**election **o**perator
- Theoretically, when the tuning parameter `\(\lambda=0\)`, there is no bias and `\(\hat{\boldsymbol{\beta}}_{\text{lasso}} = \hat{\boldsymbol{\beta}}_{\text{LS}}\)` (when `\(p\leq n\)`); however, `glmnet` *approximates* the solution
- When `\(\lambda=\infty\)`, we get `\(\hat{\boldsymbol{\beta}}_{\text{lasso}} = 0\)` `\(\Rightarrow\)` an estimator with zero variance
- For `\(\lambda\)` in between, we are balancing bias and variance
- As `\(\lambda\)` increases, more coefficients are set to zero (fewer variables are selected), and among the nonzero coefficients, more shrinkage is employed

---

## Why does the lasso give zero coefficients?

<img src="imgs/Lasso-0.jpg" width=100% />

.footnote[
This figure is taken from James, Witten, Hastie, and Tibshirani (2013) "An Introduction to Statistical Learning, with applications in R" with permission from the authors.
]

---

## Ridge is least squares with `\(L_2\)` penalty

- Like the lasso, .red[ridge regression] shrinks the estimated regression coefficients by imposing a penalty on their sizes
`$$\text{minimise } \operatorname{RSS}(\boldsymbol{\beta}) = (\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta})^T (\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta}) \text{ subject to } {\color{red}{\sum_{j=1}^p \beta_j^2}} \leq t,$$`
where the bound `\(t\)` (equivalently, a positive tuning parameter `\(\lambda\)` in the penalised form) controls the shrinkage
- However, unlike the lasso, .red[ridge regression does not shrink the parameters all the way to zero]
- Ridge coefficients have a closed-form solution (even when `\(n < p\)`)
$$ \hat{\boldsymbol{\beta}}_{\text{ridge}}(\lambda) = (\boldsymbol{X}^T\boldsymbol{X}+\lambda\boldsymbol{I})^{-1}\boldsymbol{X}^T\boldsymbol{y} $$

---

## Fitting a ridge regression

Ridge regression is fitted with the `glmnet()` function by changing one of its default options to `alpha = 0`

```r
ridge.fit = glmnet(x, y, alpha = 0)
ridge.cv = cv.glmnet(x, y, alpha = 0)
ridge.cv$lambda.min
```

```
## [1] 4.516003
```

### Comments

- In general, the optimal `\(\lambda\)` for ridge is very different from the optimal `\(\lambda\)` for the lasso
- The scale of `\(\sum\beta_j^2\)` and `\(\sum|\beta_j|\)` is very different
- `\(\color{orange}{\alpha}\)` is known as the elastic-net mixing parameter

---

## Elastic-net: best of both worlds?

- Minimising the (penalised) log-likelihood in linear regression with iid errors is equivalent to minimising the (penalised) RSS
- More generally, the elastic-net minimises the following objective function over `\(\color{blue}{\beta_0}\)` and `\(\color{red}{\boldsymbol{\beta}}\)`, for a grid of values of `\(\lambda\)` covering the entire range
`$$\frac{1}{n} \sum_{i=1}^{n} w_i l(y_i,\color{blue}{\beta_0}+\color{red}{\boldsymbol{\beta}}^T \boldsymbol{x}_i) + \lambda\left[(1-\color{orange}{\alpha}) \color{red}{\sum_{j=1}^p\beta_j^2 } + \color{orange}{\alpha} \color{red}{\sum_{j=1}^p|\beta_j| } \right]$$`
- This is general enough to include logistic regression, generalised linear models and Cox regression, where `\(l\)` is the negative log-likelihood contribution of observation `\(i\)` and `\(w_i\)` an optional observation weight
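---

class: eg

## Fitting an elastic-net regression

A minimal sketch reusing the diabetes `x` and `y` from above (the object names `enet.fit` and `enet.cv` are illustrative, and `alpha = 0.5` is an arbitrary mixing value chosen for illustration, not a recommendation):

```r
# alpha = 0 gives ridge, alpha = 1 gives the lasso; values in between mix the penalties
enet.fit = glmnet(x, y, alpha = 0.5)
set.seed(1)                              # cv.glmnet allocates folds at random
enet.cv = cv.glmnet(x, y, alpha = 0.5)
coef(enet.fit, s = enet.cv$lambda.min)   # coefficients at the CV-chosen lambda
```

Note that `cv.glmnet` tunes `\(\lambda\)` for a fixed `\(\alpha\)`; choosing `\(\alpha\)` itself typically requires an outer comparison, e.g. over a small grid of `\(\alpha\)` values.

---

class: segue

# A3.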
Marginality principle --- ## Generalised linear models ### New concepts - Logistic regression models - Marginality principle for interactions - .blue[.bold[Selected R functions]]: - `glm()` and `glmnet()` - `AIC(...,k=log(n))` - `bestglm()` - `hierNet()` incl `hierNet.logistic()` --- class: eg ## Rock-wallaby data - Rock-wallaby colony in the Warrambungles National Park (NSW) - Sample size `\(n=200\)` sites - Zero/one responses `\(y_i\)`'s: presence or absence of scats - Five main effects and three special interaction terms: - .red[.bold[Main effects]]: `\(x_2\)`, `\(x_3\)`, `\(x_4\)`, `\(x_5\)`, `\(x_6\)` - .red[.bold[Interactions]]: `\(x_7 = x_2\times x_4\)`, `\(x_8 = x_2\times x_5\)`, `\(x_9 = x_4\times x_5\)` ```r data("wallabies", package = "mplot") names(wallabies) ``` ``` ## [1] "rw" "edible" "inedible" "canopy" "distance" ## [6] "shelter" "lat" "long" ``` ```r wdat = data.frame(subset(wallabies, select = -c(lat, long)), EaD = wallabies$edible * wallabies$distance, EaS = wallabies$edible * wallabies$shelter, DaS = wallabies$distance * wallabies$shelter) ``` .footnote[ Source: Tuft, Crowther, Connell, Müller, and McArthur (2011). ] --- class: eg ## Rock-wallaby: Correlation matrix ```r round(cor(wdat), 1) ``` ``` ## rw edible inedible canopy distance shelter EaD EaS DaS ## rw 1.0 0.3 0.0 0.0 -0.1 0.0 0.2 0.2 0.0 ## edible 0.3 1.0 0.0 0.3 0.1 0.0 0.8 0.5 0.0 ## inedible 0.0 0.0 1.0 0.1 0.1 -0.1 0.0 0.0 0.0 ## canopy 0.0 0.3 0.1 1.0 0.2 0.0 0.3 0.1 0.0 ## distance -0.1 0.1 0.1 0.2 1.0 0.1 0.5 0.1 0.5 ## shelter 0.0 0.0 -0.1 0.0 0.1 1.0 0.0 0.6 0.8 ## EaD 0.2 0.8 0.0 0.3 0.5 0.0 1.0 0.5 0.2 ## EaS 0.2 0.5 0.0 0.1 0.1 0.6 0.5 1.0 0.5 ## DaS 0.0 0.0 0.0 0.0 0.5 0.8 0.2 0.5 1.0 ``` --- class: eg ## Rock-wallaby: Visualised ```r GGally::ggpairs(wdat, mapping = aes(alpha = 0.2)) + theme_bw(base_size = 16) ``` <img src="lecA_files/figure-html/unnamed-chunk-28-1.png" width="864" /> --- ## Logistic regression models - .red[.bold[Logistic regression model]] is of the form `$$\text{P}(Y_i = 1 | \mathbf{X}_{\alpha}) = h(\mathbf{x}_{\alpha i}^\top \mathbf{\beta}_{\alpha}),\; i=1,\ldots,n,$$` where `\(h(\cdot)\)` is the logistic function, `\(h(\mu) = \dfrac{1}{1 + e^{-\mu}}\)` - In R, using ML, the full model can be fitted through - `glm(y ~ ., family = binomial, data = dat)` - Calculate .red[.bold[BIC]] value through either, - `-2 * logLik(glm(...)) + log(n) * p` - `AIC(glm(...), k = log(n))` - Calculate .red[.bold[AIC]] value through either, - `-2 * logLik(glm(...)) + 2 * p` - `AIC(glm(...))` --- class: eg ## Rock-wallaby: Possible models - Here, `\(p_{\alpha_f} = 9\)`, intercept part of all models, thus, - all possible submodels, i.e. `\(M = \# \mathcal{A} = 2^{p_{\alpha_f}-1} = 2^8 = 256\)` ### Marginality principle .blockquote[ In general, **never remove the main effect** if a term involving its interaction is included in the model. ] - I.e. include main effects if .red[.bold[interaction terms]] are modeled - This reduces the number of submodels to `\(M = 72\)` --- class: eg ## Rock-wallaby: Full model .scroll-output[ ```r M1 = glm(rw ~ ., family = binomial, data = wdat) summary(M1) ``` ``` ## ## Call: ## glm(formula = rw ~ ., family = binomial, data = wdat) ## ## Deviance Residuals: ## Min 1Q Median 3Q Max ## -2.2762 -1.0810 0.4916 1.0107 1.6596 ## ## Coefficients: ## Estimate Std. 
Error z value Pr(>|z|) ## (Intercept) 0.1785976 0.5547714 0.322 0.74751 ## edible 0.1244071 0.0435224 2.858 0.00426 ** ## inedible -0.0035853 0.0060614 -0.591 0.55419 ## canopy -0.0017489 0.0056838 -0.308 0.75831 ## distance -0.0073732 0.0068719 -1.073 0.28329 ## shelter -1.1439199 0.7052026 -1.622 0.10478 ## EaD -0.0006349 0.0004034 -1.574 0.11546 ## EaS 0.0313602 0.0371986 0.843 0.39920 ## DaS 0.0118275 0.0076204 1.552 0.12064 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## (Dispersion parameter for binomial family taken to be 1) ## ## Null deviance: 272.12 on 199 degrees of freedom ## Residual deviance: 237.27 on 191 degrees of freedom ## AIC: 255.27 ## ## Number of Fisher Scoring iterations: 5 ``` ] --- ## The `bestglm` package - The `bestglm` packages performs complete enumeration for generalised linear models and uses the `leaps` package for linear models. - The best fit may be found using the information criterion IC: .red[.bold[AIC]], .red[.bold[BIC]]. <!-- , .red[.bold[BICg]], or .red[.bold[BICq]]. --> - It is also able to perform model selection through .red[.bold[cross-validation]] with `IC="CV"` (or the special case `IC="LOOCV"`). See the [vignette](http://www.et.bs.ehu.es/cran/web/packages/bestglm/vignettes/bestglm.pdf) for further information. .footnote[ Annoyingly, the vignette doesn't seem to be accessible in the usual way. You can try to access it locally using: ```r path = system.file("bestglm.pdf", package = "bestglm") system(paste0('open "', path, '"')) ``` ] <!-- ### The BICg information criterion The `BICg` parameter represents the following adaptation of the BIC: `$$\text{BIC}_\gamma(\alpha) = -2\text{LogLik}(\alpha) + \log(n) \times p_\alpha + 2\gamma\log{p_{\alpha_f}\choose p_\alpha},$$` where `\(\gamma\)` is a tuning parameter, `\(p_{\alpha_f}\)` is the total number of predictor variables and `\(p_\alpha\)` is the number of parameters in model `\(\alpha\)`. - When `\(\gamma=0\)` we recover the BIC - The extra penalisation reduces the tendancy of BIC to select models with too many parameters in large `\(p_{\alpha_f}\)` problems - Instead of having a uniform prior over all possible models, we now have a uniform prior over models of fixed size .footnote[Chen and Chen (2008)] ### The BICq information criterion The `BICq` parameter represents the following adaptation of the BIC: `$$\text{BIC}_q(\alpha) = -2\text{LogLik}(\alpha) + \log(n)\times p_\alpha - 2p_\alpha\log \left( \frac{q}{1-q} \right)$$` - When `\(q=1/2\)` we recover the BIC - When `\(q = 0\)` we select the null model - When `\(q = 1\)` we select the full model - Derived using a Bernoulli prior for the parameters. Each parameter has a prior probability of `\(q\)` of being included - Can be used to favour more or less parsimonious models ## Cross-validation with `bestglm` .blockquote[ Cross-validation works by finding the best model (for each model size) using a **training set** and then picking model that performs best on the **validation set**. ] - .red[.bold[Delete-d cross-validation]] was proposed by Shao (1993) - Random samples of size `\(d\)` are used as the .red[.bold[validation set]] - The .red[.bold[training set]] is the complement of the validation set - When `\(d\)` increases with `\(n\)` this method is .red[.bold[consistent]] ### How to choose d? - When `\(d=1\)` this is similar to leave one out CV - When `\(d \approx n/k\)` this is similar to `\(k\)`-fold CV ## Cross-validation with `bestglm` ### How to choose d? 
.pull-left[ - Shao (1993) recommends using a much larger CV sample than is customarily used in `\(k\)`-fold CV - Using Shao's approach, the suggested number of observations to delete is, `$$d^{*} = n\left( 1 - \frac{1}{\log(n)-1}\right)$$` ] .pull-right[ `\(n\)` | `\(d^{*}\)` | `\(k=10\)` | `\(k=5\)` ---- | ------- | ------ | ----- 50 | 33 | 5 | 10 100 | 73 | 10 | 20 200 | 154 | 20 | 40 500 | 405 | 50 | 100 1000 | 831 | 100 | 200 ] class: eg ## Rock-wallaby: Best CV model ```r library(bestglm) X = subset(wdat, select = -rw) y = wdat$rw Xy = as.data.frame(cbind(X, y)) bestglm(Xy, IC = "CV", family = binomial) ``` ``` ## Morgan-Tatar search since family is non-gaussian. ``` ``` ## CVd(d = 154, REP = 1000) ## BICq equivalent for q in (0.00010264939081539, 0.419147940650539) ## Best Model: ## Estimate Std. Error z value Pr(>|z|) ## (Intercept) -0.43867074 0.22213076 -1.974831 4.828732e-02 ## edible 0.06421968 0.01554788 4.130446 3.620598e-05 ``` --> --- class: eg ## Rock-wallaby: Best AIC model - If the marginality principle is not observed then we search over all `\(2^8=256\)` models. ```r library(bestglm) ``` ``` ## Loading required package: leaps ``` ```r X = subset(wdat, select = -rw) y = wdat$rw Xy = as.data.frame(cbind(X, y)) bestglm(Xy, IC = "AIC", family = binomial) ``` ``` ## Morgan-Tatar search since family is non-gaussian. ``` ``` ## AIC ## BICq equivalent for q in (0.419147940650539, 0.842477608830578) ## Best Model: ## Estimate Std. Error z value Pr(>|z|) ## (Intercept) -0.5187681581 0.2294865608 -2.260560 0.0237865121 ## edible 0.1309190259 0.0355298982 3.684757 0.0002289213 ## EaD -0.0006527505 0.0002864895 -2.278444 0.0227001134 ``` --- class: eg ## Rock-wallaby: Best BIC model ```r bestglm(Xy, IC = "BIC", family = binomial) ``` ``` ## Morgan-Tatar search since family is non-gaussian. ``` ``` ## BIC ## BICq equivalent for q in (0.419147940650539, 0.842477608830578) ## Best Model: ## Estimate Std. Error z value Pr(>|z|) ## (Intercept) -0.5187681581 0.2294865608 -2.260560 0.0237865121 ## edible 0.1309190259 0.0355298982 3.684757 0.0002289213 ## EaD -0.0006527505 0.0002864895 -2.278444 0.0227001134 ``` - Both the .red[.bold[AIC]] and .red[.bold[BIC]] suggest the optimal model is the one with main effect `edible` and the interaction term `edible*distance` --- ## How do we enforce marginality? - Hardcode - .blue[.bold[Hierarchical regularisation]], e.g. with `hierNet()` - Let's reframe the problem: consider all main effects and any of the `\(\binom{5}{2}\)` interaction terms (i.e. no quadratic effects `diagonal=FALSE`) - Either weak hierarchy (`strong=FALSE`; `\(x_1 + x_1x_2\)`, `\(x_2 + x_1x_2\)` and `\(x_1 + x_2 + x_1x_2\)` allowed, i.e. 
at least one of the main effects is present) or strong hierarchy (`strong=TRUE`; only `\(x_1 + x_2+x_1x_2\)` allowed) ```r library(hierNet) Xm = X[, 1:5] # main effects only Xm=scale(Xm, center = TRUE, scale = TRUE) #no quadratic terms and strong marginality fit = hierNet.logistic.path(Xm, y, diagonal=FALSE, strong=TRUE) set.seed(9) fitcv=hierNet.cv(fit,Xm,y,trace=0) ``` .footnote[Bien, Taylor, and Tibshirani (2013)] --- class: eg ## Rock-wallaby: Strong hierarchical Lasso ```r fitcv$lamhat.1se ``` ``` ## [1] 23.15706 ``` ```r fit.1se = hierNet.logistic(Xm, y, lam=fitcv$lamhat.1se, diagonal = FALSE, strong = TRUE) ``` ```r print(fit.1se) ``` ``` ## Call: ## hierNet.logistic(x = Xm, y = y, lam = fitcv$lamhat.1se, diagonal = FALSE, ## strong = TRUE) ## ## Non-zero coefficients: ## (No interactions in this model) ## ## Main effect ## 1 0.137 ``` --- class: eg ## Rock-wallaby: Weak hierarchical Lasso model ```r fit.5 = hierNet.logistic(Xm, y, lam = 5, diagonal = FALSE, strong = FALSE) ``` ```r print(fit.5) ``` ``` ## Call: ## hierNet.logistic(x = Xm, y = y, lam = 5, diagonal = FALSE, strong = FALSE) ## ## Non-zero coefficients: ## (Rows are predictors with nonzero main effects) ## (1st column is main effect) ## (Next columns are nonzero interactions of row predictor) ## (Last column indicates whether hierarchy constraint is tight.) ## ## Main effect 2 3 4 5 Tight? ## 1 0.7821 0 0 0 0 ## 2 -0.0404 0 -0.0511 0 0 * ## 3 -0.0618 -0.0511 0 0 0 ## 4 -0.1605 0 0 0 0.0802 ``` --- class: font100 columns-2 ## References <a name=bib-Akaike1973></a>[Akaike, H.](#cite-Akaike1973) (1973). "Information theory and and extension of the maximum likelihood principle". In: _Second International Symposium on Information Theory_. Ed. by B. Petrov and F. Csaki. Budapest: Academiai Kiado, pp. 267-281. <a name=bib-Bien2013></a>[Bien, J, J. Taylor, and R. Tibshirani](#cite-Bien2013) (2013). "A LASSO for hierarchical interactions". In: _Ann. Statist._ 41.3, pp. 1111-1141. ISSN: 0090-5364. DOI: [10.1214/13-AOS1096](https://doi.org/10.1214%2F13-AOS1096). URL: [https://doi.org/10.1214/13-AOS1096](https://doi.org/10.1214/13-AOS1096). <a name=bib-Chen2008></a>[Chen, J. and Z. Chen](#cite-Chen2008) (2008). "Extended Bayesian information criteria for model selection with large model spaces". In: _Biometrika_ 95.3, pp. 759-771. DOI: [10.1093/biomet/asn034](https://doi.org/10.1093%2Fbiomet%2Fasn034). <a name=bib-Hofmann2020></a>[Hofmann, M, C. Gatu, E. Kontoghiorghes, et al.](#cite-Hofmann2020) (2020). "lmSubsets: Exact Variable-Subset Selection in Linear Regression for R". In: _Journal of Statistical Software_ 93.3, pp. 1-21. ISSN: 1548-7660. DOI: [10.18637/jss.v093.i03](https://doi.org/10.18637%2Fjss.v093.i03). <a name=bib-James2013></a>[James, G, D. Witten, T. Hastie, et al.](#cite-James2013) (2013). _An introduction to statistical learning with applications in R_. New York, NY: Springer. ISBN: 9781461471370. URL: [http://www-bcf.usc.edu/~gareth/ISL/](http://www-bcf.usc.edu/~gareth/ISL/). <a name=bib-Johnson1996></a>[Johnson, R. W.](#cite-Johnson1996) (1996). "Fitting Percentage of Body Fat to Simple Body Measurements". In: _Journal of Statistics Education_ 4.1. URL: [http://www.amstat.org/publications/jse/v4n1/datasets.johnson.html](http://www.amstat.org/publications/jse/v4n1/datasets.johnson.html). <a name=bib-Lumley2009></a>[Lumley, T. and A. Miller](#cite-Lumley2009) (2009). _leaps: regression subset selection_. R package version 2.9. URL: [http://CRAN.R-project.org/package=leaps](http://CRAN.R-project.org/package=leaps). 
<a name=bib-Schwarz1978></a>[Schwarz, G.](#cite-Schwarz1978) (1978). "Estimating the Dimension of a Model". In: _The Annals of Statistics_ 6.2, pp. 461-464. DOI: [10.1214/aos/1176344136](https://doi.org/10.1214%2Faos%2F1176344136). <a name=bib-Shao1993></a>[Shao, J.](#cite-Shao1993) (1993). "Linear Model Selection by Cross-Validation". In: _Journal of the American Statistical Association_ 88.422, pp. 486-494. DOI: [10.2307/2290328](https://doi.org/10.2307%2F2290328). <a name=bib-Tuft2011></a>[Tuft, K. D, M. S. Crowther, K. Connell, et al.](#cite-Tuft2011) (2011). "Predation risk and competitive interactions affect foraging of an endangered refuge-dependent herbivore". In: _Animal Conservation_ 14.4, pp. 447-457. DOI: [10.1111/j.1469-1795.2011.00446.x](https://doi.org/10.1111%2Fj.1469-1795.2011.00446.x). <a name=bib-Wiegand2010></a>[Wiegand, R. E.](#cite-Wiegand2010) (2010). "Performance of using multiple stepwise algorithms for variable selection". In: _Statistics in Medicine_ 29.15, pp. 1647-1659. DOI: [10.1002/sim.3943](https://doi.org/10.1002%2Fsim.3943). <a name=bib-Wilkinson1998></a>[Wilkinson, L.](#cite-Wilkinson1998) (1998). _SYSTAT 8 statistics manual_. SPSS Inc. --- class: code50 columns-2 ```r devtools::session_info(include_base = FALSE) ``` ``` ## ─ Session info ─────────────────────────────────────────────────────────────────────────────────── ## setting value ## version R version 4.0.0 (2020-04-24) ## os macOS Catalina 10.15.5 ## system x86_64, darwin17.0 ## ui X11 ## language (EN) ## collate en_AU.UTF-8 ## ctype en_AU.UTF-8 ## tz Australia/Sydney ## date 2020-07-10 ## ## ─ Packages ─────────────────────────────────────────────────────────────────────────────────────── ## package * version date lib source ## assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.0.0) ## backports 1.1.8 2020-06-17 [1] CRAN (R 4.0.0) ## bestglm * 0.37.3 2020-03-13 [1] CRAN (R 4.0.0) ## bibtex 0.4.2.2 2020-01-02 [1] CRAN (R 4.0.0) ## blob 1.2.1 2020-01-20 [1] CRAN (R 4.0.0) ## broom 0.5.6 2020-04-20 [1] CRAN (R 4.0.0) ## callr 3.4.3 2020-03-28 [1] CRAN (R 4.0.0) ## cellranger 1.1.0 2016-07-27 [1] CRAN (R 4.0.0) ## cli 2.0.2 2020-02-28 [1] CRAN (R 4.0.0) ## codetools 0.2-16 2018-12-24 [1] CRAN (R 4.0.0) ## colorspace 1.4-1 2019-03-18 [1] CRAN (R 4.0.0) ## crayon 1.3.4 2017-09-16 [1] CRAN (R 4.0.0) ## DBI 1.1.0 2019-12-15 [1] CRAN (R 4.0.0) ## dbplyr 1.4.4 2020-05-27 [1] CRAN (R 4.0.0) ## desc 1.2.0 2018-05-01 [1] CRAN (R 4.0.0) ## devtools 2.3.0 2020-04-10 [1] CRAN (R 4.0.0) ## digest 0.6.25 2020-02-23 [1] CRAN (R 4.0.0) ## doRNG 1.8.2 2020-01-27 [1] CRAN (R 4.0.0) ## dplyr * 1.0.0 2020-06-01 [1] Github (tidyverse/dplyr@5e3f3ec) ## ellipsis 0.3.1 2020-05-15 [1] CRAN (R 4.0.0) ## evaluate 0.14 2019-05-28 [1] CRAN (R 4.0.0) ## fansi 0.4.1 2020-01-08 [1] CRAN (R 4.0.0) ## fastmap 1.0.1 2019-10-08 [1] CRAN (R 4.0.0) ## forcats * 0.5.0 2020-03-01 [1] CRAN (R 4.0.0) ## foreach 1.5.0 2020-03-30 [1] CRAN (R 4.0.0) ## fs 1.4.1 2020-04-04 [1] CRAN (R 4.0.0) ## generics 0.0.2 2018-11-29 [1] CRAN (R 4.0.0) ## GGally * 2.0.0 2020-06-06 [1] CRAN (R 4.0.0) ## ggplot2 * 3.3.2 2020-06-19 [1] CRAN (R 4.0.0) ## glmnet * 4.0-2 2020-06-16 [1] CRAN (R 4.0.0) ## glue 1.4.1 2020-05-13 [1] CRAN (R 4.0.0) ## grpreg 3.3.0 2020-06-10 [1] CRAN (R 4.0.0) ## gtable 0.3.0 2019-03-25 [1] CRAN (R 4.0.0) ## haven 2.3.1 2020-06-01 [1] CRAN (R 4.0.0) ## hierNet * 1.9 2020-02-05 [1] CRAN (R 4.0.0) ## highr 0.8 2019-03-20 [1] CRAN (R 4.0.0) ## hms 0.5.3 2020-01-08 [1] CRAN (R 4.0.0) ## htmltools 0.5.0 2020-06-16 [1] CRAN (R 4.0.0) ## httpuv 1.5.4 
2020-06-06 [1] CRAN (R 4.0.0) ## httr 1.4.1 2019-08-05 [1] CRAN (R 4.0.0) ## iterators 1.0.12 2019-07-26 [1] CRAN (R 4.0.0) ## jsonlite 1.7.0 2020-06-25 [1] CRAN (R 4.0.0) ## kableExtra * 1.1.0.9000 2020-06-01 [1] Github (haozhu233/kableExtra@9059e4b) ## knitcitations * 1.0.10 2019-09-15 [1] CRAN (R 4.0.0) ## knitr * 1.29 2020-06-23 [1] CRAN (R 4.0.0) ## later 1.1.0.1 2020-06-05 [1] CRAN (R 4.0.0) ## lattice 0.20-41 2020-04-02 [1] CRAN (R 4.0.0) ## leaps * 3.1 2020-01-16 [1] CRAN (R 4.0.0) ## lifecycle 0.2.0 2020-03-06 [1] CRAN (R 4.0.0) ## lmSubsets * 0.5-1 2020-05-23 [1] CRAN (R 4.0.0) ## lubridate 1.7.9 2020-06-08 [1] CRAN (R 4.0.0) ## magrittr 1.5 2014-11-22 [1] CRAN (R 4.0.0) ## Matrix * 1.2-18 2019-11-27 [1] CRAN (R 4.0.0) ## memoise 1.1.0 2017-04-21 [1] CRAN (R 4.0.0) ## mime 0.9 2020-02-04 [1] CRAN (R 4.0.0) ## modelr 0.1.8 2020-05-19 [1] CRAN (R 4.0.0) ## mplot * 1.0.4 2020-02-15 [1] CRAN (R 4.0.0) ## munsell 0.5.0 2018-06-12 [1] CRAN (R 4.0.0) ## nlme 3.1-148 2020-05-24 [1] CRAN (R 4.0.0) ## pillar 1.4.4 2020-05-05 [1] CRAN (R 4.0.0) ## pkgbuild 1.0.8 2020-05-07 [1] CRAN (R 4.0.0) ## pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.0.0) ## pkgload 1.1.0 2020-05-29 [1] CRAN (R 4.0.0) ## pls 2.7-2 2019-10-01 [1] CRAN (R 4.0.0) ## plyr 1.8.6 2020-03-03 [1] CRAN (R 4.0.0) ## prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.0.0) ## processx 3.4.2 2020-02-09 [1] CRAN (R 4.0.0) ## promises 1.1.1 2020-06-09 [1] CRAN (R 4.0.0) ## ps 1.3.3 2020-05-08 [1] CRAN (R 4.0.0) ## purrr * 0.3.4 2020-04-17 [1] CRAN (R 4.0.0) ## R6 2.4.1 2019-11-12 [1] CRAN (R 4.0.0) ## RColorBrewer 1.1-2 2014-12-07 [1] CRAN (R 4.0.0) ## Rcpp 1.0.4.6 2020-04-09 [1] CRAN (R 4.0.0) ## readr * 1.3.1 2018-12-21 [1] CRAN (R 4.0.0) ## readxl 1.3.1 2019-03-13 [1] CRAN (R 4.0.0) ## RefManageR 1.2.12 2019-04-03 [1] CRAN (R 4.0.0) ## remotes 2.1.1 2020-02-15 [1] CRAN (R 4.0.0) ## reprex 0.3.0 2019-05-16 [1] CRAN (R 4.0.0) ## reshape 0.8.8 2018-10-23 [1] CRAN (R 4.0.0) ## rlang 0.4.6 2020-05-02 [1] CRAN (R 4.0.0) ## rmarkdown 2.3 2020-06-18 [1] CRAN (R 4.0.0) ## rngtools 1.5 2020-01-23 [1] CRAN (R 4.0.0) ## rprojroot 1.3-2 2018-01-03 [1] CRAN (R 4.0.0) ## rstudioapi 0.11 2020-02-07 [1] CRAN (R 4.0.0) ## rvest 0.3.5 2019-11-08 [1] CRAN (R 4.0.0) ## scales 1.1.1 2020-05-11 [1] CRAN (R 4.0.0) ## sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.0.0) ## shape 1.4.4 2018-02-07 [1] CRAN (R 4.0.0) ## shiny 1.5.0 2020-06-23 [1] CRAN (R 4.0.0) ## shinydashboard 0.7.1 2018-10-17 [1] CRAN (R 4.0.0) ## stringi 1.4.6 2020-02-17 [1] CRAN (R 4.0.0) ## stringr * 1.4.0 2019-02-10 [1] CRAN (R 4.0.0) ## survival 3.2-3 2020-06-13 [1] CRAN (R 4.0.0) ## testthat 2.3.2 2020-03-02 [1] CRAN (R 4.0.0) ## tibble * 3.0.1 2020-04-20 [1] CRAN (R 4.0.0) ## tidyr * 1.1.0 2020-05-20 [1] CRAN (R 4.0.0) ## tidyselect 1.1.0 2020-05-11 [1] CRAN (R 4.0.0) ## tidyverse * 1.3.0 2019-11-21 [1] CRAN (R 4.0.0) ## usethis 1.6.1 2020-04-29 [1] CRAN (R 4.0.0) ## vctrs 0.3.1 2020-06-05 [1] CRAN (R 4.0.0) ## viridisLite 0.3.0 2018-02-01 [1] CRAN (R 4.0.0) ## webshot 0.5.2 2019-11-22 [1] CRAN (R 4.0.0) ## withr 2.2.0 2020-04-20 [1] CRAN (R 4.0.0) ## xaringan 0.16.1 2020-05-16 [1] Github (yihui/xaringan@b621fc5) ## xfun 0.15 2020-06-21 [1] CRAN (R 4.0.0) ## xml2 1.3.2 2020-04-23 [1] CRAN (R 4.0.0) ## xtable 1.8-4 2019-04-21 [1] CRAN (R 4.0.0) ## yaml 2.2.1 2020-02-01 [1] CRAN (R 4.0.0) ## ## [1] /Library/Frameworks/R.framework/Versions/4.0/Resources/library ```