Key concepts
Stat 154/254: Statistical Machine Learning
For study purposes, this is a list of keywords that I consider central to the course. By the end of the course, you should have a good sense of what each word means and how it relates to other key concepts.
High level concepts
Everything we do in the course is related to at least one of these connected topics. A short illustrative sketch follows the list.
- “Statistics”
- “Machine learning”
- “Correct specification”
- “Prediction”
- “Inference”
- “Loss”
- “Risk”
- “Generalization”
- “Overfitting”
- “Regularization”
- “Complexity”
- “Probabilistic models”
- “Computation”
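To make a few of these concrete, here is a minimal sketch, illustrative only and not course code, of loss, risk, overfitting, and generalization. The data-generating process and all constants are invented: polynomial fits of increasing complexity drive the training loss down while the held-out loss eventually rises.

```python
# Illustrative sketch (invented data, not course code): overfitting as the
# gap between training loss and held-out ("generalization") loss.
import numpy as np

rng = np.random.default_rng(0)

def simulate(n):
    x = rng.uniform(-1, 1, n)
    y = np.sin(np.pi * x) + 0.3 * rng.normal(size=n)  # signal + noise
    return x, y

x_train, y_train = simulate(30)
x_test, y_test = simulate(1000)  # large sample standing in for the population

for degree in [1, 3, 9, 15]:  # increasing model complexity
    coefs = np.polyfit(x_train, y_train, degree)      # minimize training loss
    train_mse = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coefs, x_test) - y_test) ** 2)
    print(f"degree {degree:2d}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```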
Intermediate level concepts
These topics add detail and specificity to the high level concepts in the context of the present course. They form a conceptual bridge between the high level topics and the actual theoretical and practical tools of the lower level topics. A short sketch after the list illustrates a few of them.
- “Empirical distribution”
- “Population distribution”
- “Empirical risk”
- “Population risk”
- “Expected loss”
- “IID data”
- “Regression model”
- “Probability model”
- “Classifier model”
- “Overfitting”
- “Bias / variance tradeoff”
- “Conditional expectation”
- “Complexity”
- “Smoothness”
- “Basis functions”
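Here is a small illustrative sketch, again with an invented data-generating process, of empirical versus population risk: the average loss of the conditional expectation E[Y | X] over an IID sample approaches its expected loss as the sample grows, by the law of large numbers.

```python
# Illustrative sketch (invented model): empirical risk converges to
# population risk for IID data under squared error loss.
import numpy as np

rng = np.random.default_rng(1)

# Y = 2X + noise, so E[Y | X] = 2X, and the population risk (expected
# squared error loss) of that predictor is the noise variance.
noise_sd = 0.5
population_risk = noise_sd ** 2

for n in [10, 100, 10_000]:
    x = rng.normal(size=n)                      # IID draws from the population
    y = 2 * x + noise_sd * rng.normal(size=n)
    empirical_risk = np.mean((y - 2 * x) ** 2)  # average loss over the sample
    print(f"n={n:6d}: empirical {empirical_risk:.4f} vs population {population_risk:.4f}")
```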
Lower level concepts
These are specific key theoretical or practical results from the course. If you actually solve a problem or justify a method, you’ll end up using something at this level of specificity. A short sketch combining several of these items follows the list.
- “Linear models”
- “Linear regression”
- “Squared error”
- “Mean absolute error”
- “Zero-one loss”
- “Cross entropy loss”
- “Logistic regression”
- “Logistic regression loss”
- “Residual sum of squares”
- “Variable selection”
- “Law of large numbers”
- “Central limit theorem”
- “Uniform law of large numbers”
- “Indicator functions”
- “Splines”
- “Nonlinear feature maps”
- “L2 (ridge) regularization”
- “L1 (lasso) regularization”
- “Regressor collinearity”
- “Existence of unique minimizers”
- “Series expansion”
- “Taylor series”
- “Fourier series”
- “Cross validation”
- “Test set”
- “Leave-one-out cross validation”
- “K-fold cross validation”
- “True positives and false positives”
- “True negatives and false negatives”
- “Receiver operating characteristic curves”
- “Combining estimators”
- “Separating hyperplanes”
- “Sparse regression vectors”
- “Optimization duality”
- “Generative modeling for classifiers”
- “Discriminative classifiers”
- “Shattering”
- “VC dimension”
- “Integrable Lipschitz condition”
- “Maximum likelihood estimation”
- “The CART algorithm”
- “Decision trees”
- “Partitions”
- “Cost-complexity pruning”
- “Variable importance”
- “Bootstrap”
- “Bagging”
- “Generalized additive models”
- “Weak learners”
- “Boosting”
- “Gradient boosting”
- “Greedy optimization”
- “Gradient descent”
- “Stochastic gradient descent”
- “Forward stagewise optimization”
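The sketch below (numpy only; illustrative, not the course’s implementation, with invented data) ties several of these items together: a linear model fit by squared error with L2 (ridge) regularization, where the penalty parameter is chosen by K-fold cross validation on held-out folds.

```python
# Illustrative sketch: ridge-regularized linear regression with the penalty
# chosen by K-fold cross validation. Data and constants are invented.
import numpy as np

rng = np.random.default_rng(2)
n, p = 200, 10
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.0, 0.5]          # a sparse regression vector
y = X @ beta_true + rng.normal(size=n)

def ridge_fit(X, y, lam):
    """Unique minimizer of ||y - Xb||^2 + lam * ||b||^2 (exists for lam > 0)."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

K = 5
folds = np.array_split(rng.permutation(n), K)  # fixed fold assignment

for lam in [0.01, 0.1, 1.0, 10.0, 100.0]:
    fold_risks = []
    for k in range(K):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(K) if j != k])
        b = ridge_fit(X[train], y[train], lam)
        fold_risks.append(np.mean((y[test] - X[test] @ b) ** 2))  # held-out loss
    print(f"lambda={lam:7.2f}: CV risk {np.mean(fold_risks):.3f}")
```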
Kernel methods (not covered yet)
- “Inner products”
- “Dot products”
- “The kernel trick”
- “The push-through identity”
- “Support vector machines”
- “Positive definite kernel”
- “Gram matrix”
- “Eigenvalues”
- “Eigenvectors”
- “Eigenfunctions”
- “Feature maps”
- “Monomial features”
- “Polynomial kernel”
- “RBF kernel”
- “Dot product kernel”
- “Reproducing kernel Hilbert space”
- “RKHS norm”
- “RKHS inner product”
- “The reproducing property”
- “Mercer’s theorem”
- “The representer theorem”
- “Inner products of functions”
- “Infinite-dimensional vector space”
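Although this unit is not covered yet, a minimal sketch of kernel ridge regression may help connect these terms: an RBF kernel gives the Gram matrix of the training inputs, and predictions take the representer-theorem form f(x) = sum_i alpha_i k(x_i, x). The data and all parameter values below are invented for illustration.

```python
# Illustrative sketch (invented data): kernel ridge regression with an RBF
# kernel, showing the Gram matrix and the representer-theorem prediction form.
import numpy as np

rng = np.random.default_rng(3)
n = 100
x = rng.uniform(-3, 3, size=(n, 1))
y = np.sin(x[:, 0]) + 0.2 * rng.normal(size=n)

def rbf_kernel(A, B, gamma=1.0):
    """k(a, b) = exp(-gamma * ||a - b||^2), a positive definite kernel."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

K = rbf_kernel(x, x)                             # Gram matrix of training inputs
lam = 0.1
alpha = np.linalg.solve(K + lam * np.eye(n), y)  # dual / representer coefficients

x_new = np.linspace(-3, 3, 5).reshape(-1, 1)
f_new = rbf_kernel(x_new, x) @ alpha             # f(x) = sum_i alpha_i k(x, x_i)
print(np.round(f_new, 3))
print(np.round(np.sin(x_new[:, 0]), 3))          # the underlying signal
```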
Study ideas
Here are some ways you can use this keyword list as a study aid.
- Place each topic in one or more units of the course.
- Look at a given homework problem.
  - Which topics does it touch on at each level?
  - How does the homework problem connect different topics?
- Pick a low level topic and trace its “conceptual genealogy” up through the intermediate and higher level concepts.
- Make a “concept map” for some set of topics, drawing lines between connected ideas. Explain your concept map to someone else.