
Statistical Decision Theory & Inference

Bridge the probabilistic and decision-theoretic ideas that connect Bayesian inference, calibrated prediction, and classical statistical learning.

Estimated time: ~75 min

  1. Step 1
    The law of total probability computes an event probability by summing over mutually exclusive, exhaustive cases. In machine learning it is the basic marginalization identity behind latent-variable models, mixture models, and many Bayesian calculations.
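The marginalization identity above can be seen in a toy latent-variable setup. A minimal sketch, assuming a hypothetical "bag of coins" model where the chosen coin is the latent variable:

```python
# Law of total probability: marginalize out a latent "which coin" variable.
# A coin is drawn from a bag: 30% chance it is fair, 70% chance it is
# biased toward heads. P(heads) sums over the exclusive, exhaustive cases.

prior = {"fair": 0.3, "biased": 0.7}          # P(Z = z), the latent variable
p_heads_given = {"fair": 0.5, "biased": 0.8}  # P(heads | Z = z)

# P(heads) = sum_z P(heads | Z = z) * P(Z = z)
p_heads = sum(prior[z] * p_heads_given[z] for z in prior)
print(p_heads)  # 0.3*0.5 + 0.7*0.8 = 0.71 (up to float rounding)
```

The same sum-over-cases is what a mixture model performs when it evaluates the marginal likelihood of an observation.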
  2. Step 2
    Conditional independence means two variables become unrelated once a third variable is known. It is the simplifying assumption that makes graphical models tractable and explains why conditioning can either remove dependence or, in collider structures, create it.
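The collider effect mentioned above can be demonstrated by simulation. A small sketch with two independent causes and their common effect (the variable names and probabilities are illustrative):

```python
import random

random.seed(0)
n = 100_000
xs = [random.random() < 0.5 for _ in range(n)]  # cause 1
ys = [random.random() < 0.5 for _ in range(n)]  # cause 2, independent of xs
cs = [x or y for x, y in zip(xs, ys)]           # collider: common effect C = X OR Y

def p(event, given):
    """Empirical P(event | given) from paired indicator lists."""
    sel = [e for e, g in zip(event, given) if g]
    return sum(sel) / len(sel)

# Marginally, X and Y are independent: P(X=1 | Y=1) is ~0.5, same as P(X=1).
print(p(xs, ys))
# Conditioning on the collider C=1 creates dependence between X and Y:
print(p(xs, [c and y for c, y in zip(cs, ys)]))        # P(X=1 | Y=1, C=1) ~ 0.5
print(p(xs, [c and not y for c, y in zip(cs, ys)]))    # P(X=1 | Y=0, C=1) = 1.0
```

Within the C = 1 slice, learning Y = 0 forces X = 1, so the two previously unrelated causes become dependent once their common effect is observed.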
  3. Step 3
    A sufficient statistic is a summary of the sample that retains all information about a parameter relevant for inference. This is why many classical models can replace an entire dataset with counts, sums, or means without changing the likelihood-based conclusions about the parameter.
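For a Bernoulli sample, the count of successes is sufficient: two datasets with the same count yield the same likelihood function. A minimal sketch (the sample values are illustrative):

```python
from math import prod

def bernoulli_likelihood(data, p):
    """Likelihood of an i.i.d. Bernoulli(p) sample of 0/1 outcomes."""
    return prod(p if x else (1 - p) for x in data)

# Two different orderings with the same sufficient statistic (sum = 3, n = 5).
a = [1, 1, 1, 0, 0]
b = [0, 1, 0, 1, 1]

# Both likelihoods equal p**3 * (1 - p)**2, so any likelihood-based
# inference about p is unchanged if we keep only the count.
for p in (0.2, 0.5, 0.9):
    print(p, bernoulli_likelihood(a, p), bernoulli_likelihood(b, p))
```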
  4. Step 4
    Bayes risk is the minimum achievable expected loss under the true data distribution, and the Bayes-optimal classifier attains it by minimizing posterior expected loss for each input. Under ordinary 0–1 loss, that rule becomes “predict the class with highest posterior probability.”
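The 0–1 rule can be computed directly for a toy discrete problem. A sketch, assuming made-up posteriors over three classes at two inputs:

```python
# Bayes-optimal prediction under 0-1 loss for a toy discrete problem.
# posterior[x] lists P(Y = y | X = x) for classes y = 0, 1, 2 (illustrative numbers).
posterior = {
    "x1": [0.7, 0.2, 0.1],
    "x2": [0.1, 0.5, 0.4],
}
p_x = {"x1": 0.6, "x2": 0.4}  # marginal P(X = x)

# Under 0-1 loss, the Bayes rule predicts the argmax class at each input,
# and the conditional risk at x is 1 - max_y P(Y = y | X = x).
bayes_rule = {x: max(range(3), key=lambda y: post[y]) for x, post in posterior.items()}
bayes_risk = sum(p_x[x] * (1 - max(posterior[x])) for x in p_x)

print(bayes_rule)  # {'x1': 0, 'x2': 1}
print(bayes_risk)  # 0.6*0.3 + 0.4*0.5 = 0.38 (up to float rounding)
```

No classifier can achieve expected 0–1 loss below this 0.38 on this distribution, which is what makes it the Bayes risk.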
  5. Step 5
    A scoring rule is proper if a forecaster minimizes expected score by reporting their true predictive distribution. Proper scoring rules matter because they reward honest, calibrated probabilities rather than merely getting the top-ranked class right.
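Properness can be checked numerically for the log score: over a grid of possible reports, the expected loss is smallest at the true probability. A sketch with an illustrative true probability of 0.7:

```python
from math import log

p_true = 0.7  # true probability of the event (illustrative)

def expected_log_loss(q):
    """Expected negative log score when reporting q for an event with prob p_true."""
    return -(p_true * log(q) + (1 - p_true) * log(1 - q))

# Search reported probabilities q in {0.01, ..., 0.99}.
reports = [i / 100 for i in range(1, 100)]
best = min(reports, key=expected_log_loss)
print(best)  # 0.7 — the honest report minimizes expected loss, so log loss is proper
```

A rule like "score = 1 if the argmax class is correct" would not have this property; it rewards confident reports of the most likely class rather than calibrated probabilities.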
  6. Step 6
    Surrogate losses replace hard-to-optimize 0–1 classification loss with tractable objectives such as logistic or hinge loss. A surrogate is classification-calibrated if optimizing it still drives the classifier toward the Bayes-optimal decision rule.
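Both claims in the final step can be checked numerically: the logistic loss (scaled to equal 1 at margin 0) upper-bounds the 0–1 loss, and minimizing its conditional risk recovers the Bayes decision sign. A sketch with an illustrative class probability eta = 0.8:

```python
from math import exp, log

def zero_one(margin):
    """0-1 loss as a function of the margin y * f(x)."""
    return 1.0 if margin <= 0 else 0.0

def logistic(margin):
    """Logistic surrogate, scaled so logistic(0) = 1 (an upper bound on 0-1 loss)."""
    return log(1 + exp(-margin)) / log(2)

# The surrogate dominates the 0-1 loss at every margin, so it is a valid
# convex relaxation to optimize instead.
for m in (-2.0, -0.5, 0.0, 0.5, 2.0):
    assert logistic(m) >= zero_one(m)

# Classification calibration: minimizing the conditional logistic risk at a
# point with eta = P(Y = +1 | x) = 0.8 gives a positive score, matching the
# Bayes-optimal decision sign (eta > 1/2 => predict +1).
eta = 0.8
def cond_risk(f):
    return eta * log(1 + exp(-f)) + (1 - eta) * log(1 + exp(f))

f_star = min((i / 100 for i in range(-500, 501)), key=cond_risk)
print(f_star)  # ~log(0.8 / 0.2) ~= 1.39 > 0, so the surrogate minimizer agrees with Bayes
```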