Learning Theory & Evaluation
Connect risk minimization, validation protocols, evaluation metrics, and deployment-shift thinking so that model selection and measurement line up with generalization.
- Step 1: Empirical risk minimization chooses the model with the smallest average training loss. It is the default principle behind most supervised learning, but it must be paired with capacity control or held-out evaluation because low training loss alone does not guarantee generalization (sketch 1 below).
- Step 2: Structural risk minimization extends empirical risk minimization by balancing training fit against model complexity. It is the learning-theoretic principle behind regularization, margin control, and choosing among hypothesis classes of different capacity (sketch 2 below).
- Step 3: A train/validation/test split separates fitting, model selection, and final evaluation into different datasets. The test set is kept untouched until the end so it remains a credible estimate of out-of-sample performance (sketch 3 below).
- Step 4: k-fold cross-validation rotates a held-out fold through the dataset so every example is used for validation exactly once and for training in the remaining rounds. It uses limited data efficiently for model selection, but it costs multiple training runs and must keep preprocessing inside each fold (sketch 4 below).
- Step 5: A confusion matrix counts predicted labels against true labels. In binary classification it yields the four basic counts (true positives, false positives, true negatives, and false negatives) from which most common thresholded metrics are derived (sketch 5 below).
- Step 6: A precision-recall curve shows how precision and recall trade off as the decision threshold moves through a ranked list of predictions. Average precision summarizes that curve and is especially informative when the positive class is rare (sketch 6 below).
- Step 7: Feature scaling rescales input dimensions to comparable magnitudes, while standardization specifically subtracts the training mean and divides by the training standard deviation. It matters because optimization, distances, and margins can otherwise be dominated by whichever feature is measured in the largest units (sketch 7 below).
- Step 8: Distribution shift occurs when the joint distribution seen at deployment differs from the one used for training or validation. The main cases are covariate shift, label shift, and concept shift; each breaks generalization in a different way and therefore requires different detection and mitigation strategies (sketch 8 below).
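Sketch 1, for Step 1: a minimal empirical risk minimizer over a toy hypothesis class of one-dimensional threshold rules under 0-1 loss. The hypothesis class, the synthetic data, and the loss are all illustrative assumptions; the card itself does not fix them.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=100)                      # 1-D inputs (synthetic)
y = (X > 0.3).astype(int)                     # true labeling rule (assumed)

# Hypothesis class: threshold classifiers h_t(x) = 1[x > t]
thresholds = np.linspace(-2, 2, 41)

def empirical_risk(t):
    preds = (X > t).astype(int)
    return np.mean(preds != y)                # average 0-1 training loss

risks = [empirical_risk(t) for t in thresholds]
best_t = thresholds[int(np.argmin(risks))]    # ERM: smallest training risk wins
print(f"ERM threshold: {best_t:.2f}, training risk: {min(risks):.3f}")
```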
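Sketch 2, for Step 2: structural risk minimization over nested polynomial classes, using a simple degree penalty as the complexity term. The penalty form and the weight `lam` are illustrative stand-ins for a proper capacity measure, not something the card prescribes.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 40)
y = np.sin(3 * x) + 0.3 * rng.normal(size=40)   # noisy synthetic target

# Nested hypothesis classes: polynomials of increasing degree.
# SRM selects the class minimizing training fit plus a complexity penalty.
lam = 0.02                                      # penalty weight (assumed)
scores = {}
for degree in range(1, 10):
    coeffs = np.polyfit(x, y, degree)
    mse = np.mean((np.polyval(coeffs, x) - y) ** 2)   # empirical risk
    scores[degree] = mse + lam * (degree + 1)         # penalized (structural) risk

best = min(scores, key=scores.get)
print("SRM-selected degree:", best)
```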
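Sketch 3, for Step 3: a train/validation/test protocol with scikit-learn. The candidate values of `C` and the split ratios are arbitrary choices made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)

# First split off the test set; it stays untouched until the very end.
X_tmp, X_test, y_tmp, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# Then split the remainder into fitting (train) and model-selection (validation) sets.
X_train, X_val, y_train, y_val = train_test_split(X_tmp, y_tmp, test_size=0.25, random_state=0)

best_C, best_acc = None, -1.0
for C in [0.01, 0.1, 1.0, 10.0]:              # candidate models
    model = LogisticRegression(C=C, max_iter=1000).fit(X_train, y_train)
    acc = model.score(X_val, y_val)           # selection uses validation only
    if acc > best_acc:
        best_C, best_acc = C, acc

final = LogisticRegression(C=best_C, max_iter=1000).fit(X_train, y_train)
print("test accuracy:", final.score(X_test, y_test))   # reported exactly once
```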
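Sketch 4, for Step 4: 5-fold cross-validation with preprocessing kept inside each fold via a scikit-learn pipeline, so the scaler is re-fit on every training split and never sees the validation fold.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=300, random_state=0)

# The pipeline keeps preprocessing inside each fold: scaler statistics are
# recomputed on each training split, so validation data never leaks in.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(pipe, X, y, cv=cv)   # 5 training runs, one per fold
print("fold accuracies:", scores.round(3), "mean:", scores.mean().round(3))
```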
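Sketch 5, for Step 5: extracting the four basic counts from a binary confusion matrix and deriving precision and recall from them. The tiny label arrays are made up for illustration.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

# Rows are true labels, columns are predicted labels (sklearn convention).
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp} FP={fp} TN={tn} FN={fn}")

# Most common thresholded metrics derive from these four counts:
precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(f"precision={precision:.2f} recall={recall:.2f}")
```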
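Sketch 6, for Step 6: a precision-recall curve and its average-precision summary on synthetic scores with a rare (roughly 5%) positive class. The score-generation model is an assumption made for the demo.

```python
import numpy as np
from sklearn.metrics import average_precision_score, precision_recall_curve

rng = np.random.default_rng(0)
y_true = (rng.random(1000) < 0.05).astype(int)        # rare positive class
# Noisy scores that are higher, on average, for true positives.
scores = y_true * 1.0 + rng.normal(scale=0.8, size=1000)

# Sweep the decision threshold through the ranked predictions.
precision, recall, thresholds = precision_recall_curve(y_true, scores)
ap = average_precision_score(y_true, scores)          # summarizes the curve
print(f"average precision: {ap:.3f} over {len(thresholds)} thresholds")
```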
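Sketch 7, for Step 7: standardization that learns the mean and standard deviation from the training data only and reuses those statistics on the test data, as the card requires. The feature scales are fabricated to make the units mismatch obvious.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Two features on wildly different scales (think metres vs. millimetres).
X_train = np.column_stack([rng.normal(0, 1, 200), rng.normal(0, 1000, 200)])
X_test = np.column_stack([rng.normal(0, 1, 50), rng.normal(0, 1000, 50)])

scaler = StandardScaler().fit(X_train)        # mean/std from training data only
X_train_std = scaler.transform(X_train)
X_test_std = scaler.transform(X_test)         # test reuses training statistics

print("training means after scaling:", X_train_std.mean(axis=0).round(3))
print("training stds  after scaling:", X_train_std.std(axis=0).round(3))
```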
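Sketch 8, for Step 8: one common detection heuristic for covariate shift, a domain classifier trained to distinguish training-time inputs from deployment-time inputs. The card does not name this technique, so treat it as one option among several; the drifted data here is synthetic.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X_train_time = rng.normal(0.0, 1.0, size=(500, 3))    # features at training time
X_deploy = rng.normal(0.8, 1.0, size=(500, 3))        # covariates have drifted

# Domain-classifier test: label each sample by its origin and check whether a
# classifier can tell the two pools apart. AUC near 0.5 suggests no detectable
# shift; AUC well above 0.5 suggests the input distribution has moved.
X = np.vstack([X_train_time, X_deploy])
domain = np.array([0] * len(X_train_time) + [1] * len(X_deploy))

auc = cross_val_score(LogisticRegression(max_iter=1000), X, domain,
                      cv=5, scoring="roc_auc").mean()
print(f"domain-classifier AUC: {auc:.3f}")
```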