Statistical Inference & Calibration
Move from hypothesis testing and likelihood methods to uncertainty quantification and probabilistic scoring.
- Step 1: Multiple hypothesis testing asks how to control false positives when many tests are run at once. False discovery rate control, especially the Benjamini–Hochberg procedure, limits the expected fraction of rejected hypotheses that are actually null and is usually less conservative than family-wise error control.
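A minimal sketch of the Benjamini–Hochberg step-up procedure in plain Python (the p-values below are made up for illustration): sort the p-values, find the largest rank k with p_(k) ≤ (k/m)·q, and reject every hypothesis at or below that rank.

```python
def benjamini_hochberg(pvals, q=0.05):
    # Step-up FDR control: find the largest rank k (1-indexed) such that
    # p_(k) <= (k/m) * q, then reject all hypotheses with rank <= k.
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank / m * q:
            cutoff_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected

pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.5, 0.9]
print(benjamini_hochberg(pvals, q=0.05))  # rejects only the first two
```

Note the step-up character: a p-value can be rejected even if it fails its own threshold, as long as some larger p-value passes its threshold.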
- Step 2: A likelihood ratio test compares how well two nested statistical models explain the same data by taking the ratio of their maximized likelihoods. Large likelihood-ratio statistics indicate that the larger model fits substantially better than the restricted one, and under regularity conditions the test statistic is asymptotically chi-squared.
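As a toy worked case (assuming Gaussian data with known variance 1), testing H0: mu = 0 against a free mean: the statistic 2·(ll1 − ll0) reduces to n·x̄², which is compared to a chi-squared distribution with one degree of freedom.

```python
def lrt_statistic(xs):
    # Gaussian log-likelihoods with sigma = 1; constant terms cancel
    # in the difference, so only the quadratic parts are needed.
    n = len(xs)
    xbar = sum(xs) / n
    ll0 = -0.5 * sum(x * x for x in xs)            # mu fixed at 0 (restricted)
    ll1 = -0.5 * sum((x - xbar) ** 2 for x in xs)  # mu at its MLE, xbar
    return 2 * (ll1 - ll0)                         # equals n * xbar**2 here

xs = [0.9, 1.1, 0.7, 1.3, 1.0, 0.8]
stat = lrt_statistic(xs)
print(round(stat, 3), stat > 3.841)  # 3.841 ~ 95th percentile of chi^2(1)
```

Since the statistic exceeds the 5% critical value, the restricted model (mean zero) would be rejected for this sample.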
- Step 3: Importance sampling estimates an expectation under a target distribution by drawing samples from a different proposal distribution and reweighting them. It is powerful when the proposal places more mass in the important regions of the integrand, but unstable weights can make the variance explode.
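A small illustration under stated assumptions: estimating the tail probability P(X > 3) for X ~ N(0, 1), where naive Monte Carlo rarely lands in the tail. Sampling instead from a proposal N(3, 1) centered on the region of interest and reweighting by the density ratio (which simplifies to exp(4.5 − 3x) for these two normals) gives a far lower-variance estimate.

```python
import math
import random

random.seed(0)

def tail_prob_is(n=100_000):
    # Estimate P(X > 3) for X ~ N(0,1) via a proposal N(3,1) that puts
    # mass in the tail, reweighting each sample by phi(x) / q(x).
    total = 0.0
    for _ in range(n):
        x = random.gauss(3.0, 1.0)  # draw from the proposal
        if x > 3.0:
            # density ratio of N(0,1) over N(3,1): exp(4.5 - 3x)
            total += math.exp(4.5 - 3.0 * x)
    return total / n

print(tail_prob_is())  # true value is about 0.00135
```

The indicator-times-weight average converges to the target expectation; the hazard noted above appears when the proposal's tails are lighter than the target's, making a few weights enormous.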
- Step 4: Bootstrap confidence intervals estimate uncertainty by resampling the observed dataset with replacement and recomputing the statistic many times. They are useful when analytic standard errors are awkward, but they inherit the sample's biases and can fail when the original sample is too small or unrepresentative.
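A minimal percentile-bootstrap sketch (the data and the 5000-replicate count are illustrative choices, not prescribed values): resample with replacement, recompute the statistic each time, and read the interval off the empirical percentiles of the replicates.

```python
import random

random.seed(1)

def bootstrap_ci(data, stat, n_boot=5000, alpha=0.05):
    # Percentile bootstrap: resample with replacement, recompute the
    # statistic, and take the alpha/2 and 1 - alpha/2 percentiles.
    reps = sorted(
        stat([random.choice(data) for _ in data]) for _ in range(n_boot)
    )
    lo = reps[int(alpha / 2 * n_boot)]
    hi = reps[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

data = [4.2, 5.1, 3.8, 6.0, 4.9, 5.5, 4.4, 5.2, 4.8, 5.0]
mean = lambda xs: sum(xs) / len(xs)
lo, hi = bootstrap_ci(data, mean)
print(round(lo, 2), round(hi, 2))
```

Because every replicate is drawn from the same ten observations, any quirk of that sample is baked into the interval, which is the failure mode flashcard 4 warns about.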
- Step 5: The Brier score measures the mean squared error of probabilistic predictions, so it rewards both correctness and calibration. Lower is better, and unlike accuracy it penalizes a confidently wrong 0.99 prediction much more than a cautious 0.6 prediction.
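The definition fits in one line, and the 0.99-versus-0.6 comparison from the card can be checked directly: when the true outcome is 0, the confident prediction costs 0.9801 while the cautious one costs only 0.36.

```python
def brier_score(probs, outcomes):
    # Mean squared error between predicted probabilities and 0/1 outcomes.
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)

# A confidently wrong 0.99 is penalized far more than a cautious 0.6.
print(brier_score([0.99], [0]))  # 0.9801
print(brier_score([0.6], [0]))   # 0.36
```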