
Classical & Probabilistic Models

Study classical unsupervised, latent-variable, and sequential-state models that still shape modern machine learning.

Estimated time: ~120 min

  1. Step 1
    A one-class SVM learns a boundary around data assumed to be mostly normal by separating the mapped training points from the origin with maximum margin in feature space. It is a classic novelty-detection method because it needs examples of the inlier class but not labeled anomalies (see the Step 1 sketch after this list).
  2. Step 2
    Anomaly detection identifies observations that look unlikely under the pattern of normal data. The main families are density-based methods, reconstruction-based methods, and one-class classification methods, and the right choice depends on whether you have labels, strong feature engineering, or only normal examples (the Step 2 sketch after this list shows a simple density-based score).
  3. Step 3
    Latent Dirichlet Allocation models each document as a mixture of latent topics and each topic as a distribution over words. It is a generative model for uncovering coarse semantic structure in bag-of-words corpora, not a modern contextual language model (the Step 3 sketch after this list runs LDA on a toy corpus).
  4. Step 4
    Factor analysis models observed variables as linear combinations of a small number of latent factors plus variable-specific noise. It is useful when the goal is to explain covariance structure rather than merely reduce dimension, which is the key difference from PCA (the Step 4 sketch after this list shows the per-variable noise estimate that PCA lacks).
  5. Step 5
    The Kalman filter recursively estimates the hidden state of a linear Gaussian dynamical system by alternating a prediction step with a measurement update. It is optimal for that model class because the posterior remains Gaussian and is fully described by a mean and covariance (the Step 5 sketch after this list implements both steps).
  6. Step 6
    A particle filter approximates the posterior over a hidden state with weighted samples, or particles, instead of a single Gaussian. It is useful for nonlinear or non-Gaussian state-space models, but resampling and weight degeneracy are central practical issues (the Step 6 sketch after this list includes a resampling step).
  7. Step 7
    Canonical correlation analysis finds linear combinations of two random vectors that are maximally correlated with each other. It is the right tool when the question is about shared structure between two views of the same examples rather than variance within a single view (the Step 7 sketch after this list recovers a shared signal from two views).
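
Sketch for Step 1. A minimal one-class SVM example using scikit-learn's OneClassSVM; the synthetic data and the nu value are illustrative assumptions, not tuned choices.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 2))                     # inliers only
X_test = np.vstack([rng.normal(size=(10, 2)),           # more inliers
                    rng.normal(loc=6.0, size=(5, 2))])  # far-away outliers

# nu upper-bounds the fraction of training points allowed outside the boundary
clf = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(X_train)
print(clf.predict(X_test))  # +1 = inlier, -1 = novelty
```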
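
Sketch for Step 2. One density-based method in miniature: fit a Gaussian to the normal data and flag points with unusually low log-likelihood. The Gaussian assumption and the 1% threshold are illustrative choices.

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 3))                  # "normal" training data
mu, cov = X.mean(axis=0), np.cov(X, rowvar=False)

density = multivariate_normal(mean=mu, cov=cov)
scores = -density.logpdf(X)                     # high score = unlikely point
threshold = np.quantile(scores, 0.99)           # flag the rarest 1%
print(np.sum(scores > threshold), "points flagged as anomalous")
```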
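
Sketch for Step 3. A toy LDA run with scikit-learn; the four-document corpus and the choice of two topics are illustrative.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = ["cats and dogs are pets", "dogs chase cats",
        "stocks and bonds are assets", "bonds hedge stock market risk"]
vec = CountVectorizer()
X = vec.fit_transform(docs)                     # document-term counts

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
words = vec.get_feature_names_out()
# Each row of components_ is an unnormalized topic-word distribution
for k, topic in enumerate(lda.components_):
    print(f"topic {k}:", [words[i] for i in topic.argsort()[-3:]])
```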
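
Sketch for Step 4. Factor analysis on synthetic data with scikit-learn; the generative toy setup (two latent factors driving six observed variables) is an assumption for illustration. Note the per-variable noise estimate, which is the piece PCA does not model.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(2)
Z = rng.normal(size=(1000, 2))                  # latent factors
W = rng.normal(size=(2, 6))                     # loadings
psi = np.array([0.1, 0.5, 0.2, 0.8, 0.3, 0.4])  # per-variable noise scales
X = Z @ W + psi * rng.normal(size=(1000, 6))    # observed variables

fa = FactorAnalysis(n_components=2, random_state=0).fit(X)
print(fa.components_.shape)   # (2, 6): estimated loadings
print(fa.noise_variance_)     # per-variable noise, absent from plain PCA
```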
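
Sketch for Step 5. A plain-NumPy Kalman filter for a 1-D constant-velocity model; the transition, observation, and noise matrices and the measurement sequence are illustrative.

```python
import numpy as np

F = np.array([[1.0, 1.0], [0.0, 1.0]])   # state transition (position, velocity)
H = np.array([[1.0, 0.0]])               # we observe position only
Q = 0.01 * np.eye(2)                     # process noise covariance
R = np.array([[0.5]])                    # measurement noise covariance

x, P = np.zeros(2), np.eye(2)            # initial state mean and covariance

def kalman_step(x, P, z):
    # Predict: propagate mean and covariance through the dynamics
    x_pred, P_pred = F @ x, F @ P @ F.T + Q
    # Update: fold in the measurement z via the Kalman gain
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(2) - K @ H) @ P_pred
    return x_new, P_new

for meas in [1.1, 2.0, 2.9, 4.2]:        # noisy position readings
    x, P = kalman_step(x, P, np.array([meas]))
print("estimated position, velocity:", x)
```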
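
Sketch for Step 6. A bootstrap particle filter for a toy 1-D nonlinear model with dynamics x_t = sin(x_{t-1}) + noise; the dynamics, noise levels, and multinomial resampling scheme are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 1000                                    # number of particles
particles = rng.normal(size=N)              # initial samples from the prior

def step(particles, z, obs_std=0.5, proc_std=0.3):
    # Propagate each particle through the nonlinear dynamics
    particles = np.sin(particles) + rng.normal(scale=proc_std, size=N)
    # Weight by the Gaussian likelihood of the observation z
    w = np.exp(-0.5 * ((z - particles) / obs_std) ** 2)
    w /= w.sum()
    # Resample to combat weight degeneracy (multinomial resampling)
    return particles[rng.choice(N, size=N, p=w)]

for z in [0.2, 0.7, 0.5, 0.9]:              # observations
    particles = step(particles, z)
print("posterior mean estimate:", particles.mean())
```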
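
Sketch for Step 7. CCA with scikit-learn on two synthetic views that share one latent signal; the data construction is illustrative. The first pair of canonical variates should be strongly correlated because both views carry the same shared component.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(4)
shared = rng.normal(size=(500, 1))                     # common latent signal
X = np.hstack([shared + 0.2 * rng.normal(size=(500, 1)),
               rng.normal(size=(500, 2))])             # view 1
Y = np.hstack([shared + 0.2 * rng.normal(size=(500, 1)),
               rng.normal(size=(500, 2))])             # view 2

cca = CCA(n_components=1).fit(X, Y)
X_c, Y_c = cca.transform(X, Y)
print("first canonical correlation:",
      np.corrcoef(X_c[:, 0], Y_c[:, 0])[0, 1])
```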