Please note that this website will be under active development until Monday 20 July; there will be additions and minor changes to the material.
Samuel Mueller is a Professor of Statistics and Interim Head of School at the University of Sydney and a Fellow of the American Statistical Association. He has 18 years' experience as a mathematical statistician and is renowned for his contributions to model selection, classification and prediction for statistically challenging data. He currently leads two research groups: one on Theoretical Statistical Model Selection at the ANU (with Prof Welsh) and one on Fast and Interactive Methods for Complex High-Dimensional Data at USyd. He was appointed to the Australian Research Council's College of Experts for 2019-2021 and is an Editor (Theory & Methods) of the Australian and New Zealand Journal of Statistics. He has also held various offices in the IBS and is currently serving a four-year term as a Council Member.
Garth Tarr is a Senior Lecturer and Associate Head (Education) in the School of Mathematics and Statistics at the University of Sydney. He has received more than A$4M in competitive grant funding and a number of citations for his teaching, including a Vice-Chancellor’s Award for Teaching Excellence in 2016. He received his PhD in Mathematical Statistics from the University of Sydney and has held positions at the University of Newcastle and the Australian National University. His diverse interests include robust statistics, data visualisation, model selection, educational research, meat science and biostatistics. Garth is an expert R user and has created several R packages, including the mplot package. He is an Associate Editor for the Australian and New Zealand Journal of Statistics and the Biometric Bulletin’s Software Corner.
This short course focuses on model selection techniques for regression models in two scenarios: when an extensive search of the model space is possible, and when the dimension is large and either stepwise algorithms or regularisation techniques have to be employed to identify good models. We incorporate recent research on graphical tools for model choice and touch on how to tune regularisation procedures, such as the Lasso, through resampling or model selection criteria. Importantly, the limitations of the various model selection procedures will be discussed. A key component of the course is assessing the stability of selected components, which is paramount for reliable final predictive models. We show how this can be achieved by visualising measures of stability.
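As a rough illustration of the two scenarios and of the stability idea, here is a minimal base R sketch. It is not taken from the course material: the `mtcars` data, the five predictors, the use of BIC and the 100 bootstrap resamples are all assumptions chosen to keep the example small.

```r
# Base R only; mtcars ships with R. The predictor set is an illustrative choice.
dat   <- mtcars[, c("mpg", "wt", "hp", "qsec", "drat", "disp")]
preds <- setdiff(names(dat), "mpg")

# 1. Search over the whole model space: fit all 2^5 = 32 submodels
#    and rank them by BIC.
subsets <- expand.grid(rep(list(c(FALSE, TRUE)), length(preds)))
names(subsets) <- preds
bic <- apply(subsets, 1, function(keep) {
  rhs <- if (any(keep)) preds[keep] else "1"   # "1" is the intercept-only model
  BIC(lm(reformulate(rhs, response = "mpg"), data = dat))
})
best_exhaustive <- preds[unlist(subsets[which.min(bic), ])]

# 2. Backward stepwise search with the same criterion
#    (k = log(n) turns the AIC penalty used by step() into the BIC penalty).
full <- lm(mpg ~ ., data = dat)
best_stepwise <- step(full, direction = "backward",
                      k = log(nrow(dat)), trace = 0)

# 3. Stability: repeat the stepwise search on bootstrap resamples and
#    record how often each predictor is selected.
set.seed(1)
B <- 100
selected <- replicate(B, {
  boot <- dat[sample(nrow(dat), replace = TRUE), ]
  fit  <- step(lm(mpg ~ ., data = boot), direction = "backward",
               k = log(nrow(boot)), trace = 0)
  preds %in% attr(terms(fit), "term.labels")
})
inclusion_freq <- setNames(rowMeans(selected), preds)

best_exhaustive          # predictors in the BIC-optimal submodel
formula(best_stepwise)   # model reached by backward elimination
inclusion_freq           # proportion of resamples selecting each predictor
```

The stepwise search visits only part of the model space, so its final model need not coincide with the BIC-optimal one. Predictors with inclusion frequencies close to 0 or 1 are selected consistently across resamples, while intermediate values suggest the selection is unstable.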
The practical implementation of the discussed methods is an essential component of this course. Interactive labs will give participants the opportunity to apply what they have learnt, and some lab material can be completed after the course to further digest the content. We will use the cross-platform, open-source software R; in particular, we will make use of the lmSubsets, bestglm, glmnet and mplot packages.
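To give a flavour of how a regularisation procedure can be tuned by resampling, the following sketch uses `cv.glmnet()` from the glmnet package to choose the Lasso penalty by cross-validation. It is only an illustration under stated assumptions: glmnet must already be installed, and the `mtcars` data, the seed and the five folds are arbitrary choices rather than anything prescribed by the course.

```r
library(glmnet)  # assumed to be installed, e.g. via install.packages("glmnet")

# glmnet expects a numeric predictor matrix and a response vector.
x <- as.matrix(mtcars[, -1])   # all columns except mpg
y <- mtcars$mpg

set.seed(1)  # the cross-validation folds are assigned at random
cv_fit <- cv.glmnet(x, y, alpha = 1, nfolds = 5)  # alpha = 1 gives the Lasso

lambda_min <- cv_fit$lambda.min  # penalty minimising the CV error
lambda_1se <- cv_fit$lambda.1se  # largest penalty within one SE of that minimum

# Coefficients at the more heavily penalised choice; zeros are dropped terms.
lasso_coef <- coef(cv_fit, s = "lambda.1se")
lasso_coef
```

Calling `plot(cv_fit)` shows the cross-validation error curve along the penalty path, with the two candidate penalties marked. Rerunning with a different seed gives a sense of how sensitive the selected penalty, and hence the selected model, is to the fold assignment.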
It will be assumed that participants are familiar with R and standard regression modelling techniques.
Links to the resources will be made available on Monday 20 July. Participants are expected to work through the material, watching the recordings and attempting the lab questions, at a time that suits them over a three-day period. We will run two Zoom drop-in sessions on Thursday 23 July, 10-11 AM and 4-5 PM (Sydney time, AEST).
During these Zoom sessions we will answer any questions participants may have. The Zoom links will be emailed directly to registered participants.
All links will be posted by Monday 20 July.
Component | Resources |
---|---|
Lecture A: Exhaustive and non-exhaustive algorithms without resampling | Slides, PDF, R code |
Part A1: Selecting models | Video, Audio |
Part A2: Regularisation methods | Video, Audio |
Part A3: Marginality principle | Video, Audio |
Lab A | Questions, Solutions |
Lecture B: Exhaustive and non-exhaustive algorithms with resampling | Slides, PDF, R code |
Part B1: Cross-validation for model selection | Video, Audio |
Part B2: The mplot package | Video, Audio |
Part B3: Subtractive stability measures | Video, Audio |
Lab B | Questions, Solutions |

```r
install.packages("lmSubsets")
install.packages("bestglm")
install.packages("lars")
install.packages("mplot")
install.packages("MASS")
install.packages("Hmisc")
install.packages("car")
install.packages("mfp")
```