Causal Inference & Historical Bridges
Bridge causal estimands, interventional reasoning, and the landmark sequence-model ideas that led into modern attention and deep learning.
6 cards
- Step 1: Instrumental variables identify causal effects when treatment is confounded, provided an instrument affects treatment, is as-good-as random with respect to unobserved confounders, and influences the outcome only through treatment. In simple linear settings, the IV estimand is the ratio of the instrument-outcome covariance to the instrument-treatment covariance (see the first sketch after this list).
- Step 2: A counterfactual asks what would have happened under a different action or treatment than the one that actually occurred. The central difficulty is that for any individual unit, only one potential outcome is observed, so causal inference always requires assumptions to recover the missing alternative (a simulation sketch appears after this list).
- Step 3: Do-calculus is Pearl's set of graphical transformation rules for turning interventional quantities into observational ones when the causal graph permits it. It matters because it separates what can be identified from data plus structure from what remains fundamentally unidentifiable (a backdoor-adjustment sketch follows the list).
- Step 4: Bahdanau attention is the original additive attention mechanism for sequence-to-sequence models, where the decoder scores each encoder state before producing the next token. It solved the fixed-context bottleneck of early seq2seq RNNs by letting the decoder look back over the whole source sequence at every step (a scoring sketch follows the list).
- Step 5: Seq2seq with attention augments the encoder-decoder architecture so the decoder conditions on a context vector built from all encoder states at each output step. That change made neural machine translation far more effective than fixed-context seq2seq and directly paved the way to modern cross-attention and Transformer models (a context-vector sketch follows the list).
- Step 6: The history of backpropagation is the story of an idea known in pieces before it became a practical neural-network training method. Werbos articulated reverse-mode differentiation for network training in the 1970s, and Rumelhart, Hinton, and Williams turned it into the landmark 1986 demonstration that made multilayer neural networks trainable in practice.
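For Step 1, a minimal numeric sketch of the linear IV (Wald) estimand on simulated data; the variable names, coefficients, and the true effect of 2.0 are illustrative assumptions, not taken from the cards above.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Simulated linear setting: u confounds treatment x and outcome y;
# instrument z shifts x but reaches y only through x (exclusion).
u = rng.normal(size=n)                  # unobserved confounder
z = rng.binomial(1, 0.5, size=n)        # instrument
x = 0.8 * z + u + rng.normal(size=n)    # treatment, confounded by u
y = 2.0 * x + u + rng.normal(size=n)    # true causal effect of x on y is 2.0

# Naive OLS slope is biased upward by the confounder u.
ols = np.cov(x, y)[0, 1] / np.var(x, ddof=1)

# IV estimand: cov(z, y) / cov(z, x) recovers the causal slope.
iv = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]

print(f"OLS: {ols:.3f}   IV: {iv:.3f}")  # OLS lands well above 2.0, IV near 2.0
```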
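For Step 2, a sketch that makes the missing-potential-outcome problem concrete: the simulation creates both outcomes for every unit, then hides one, as any real dataset would. The constant treatment effect of 2.0 is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# In the simulation both potential outcomes exist for every unit;
# a real dataset only ever reveals one of them.
y0 = rng.normal(5.0, 1.0, size=n)    # outcome if untreated
y1 = y0 + 2.0                        # outcome if treated (true effect 2.0)
t = rng.binomial(1, 0.5, size=n)     # randomized treatment assignment

y_obs = np.where(t == 1, y1, y0)     # the only outcome column we see
# The counterfactual column np.where(t == 1, y0, y1) is never observed,
# so individual effects y1 - y0 cannot be read off the data. Under
# randomization, though, a difference in means identifies the average.
ate_hat = y_obs[t == 1].mean() - y_obs[t == 0].mean()
print(f"estimated ATE: {ate_hat:.2f}  (true: 2.00)")
```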
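For Step 3, do-calculus itself is a set of rewrite rules on interventional expressions, but its best-known consequence is the backdoor adjustment P(y | do(x)) = sum_z P(y | x, z) P(z). A sketch on simulated binary data, with the graph structure and all parameters assumed for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500_000

# Binary confounder z opens a backdoor path x <- z -> y.
z = rng.binomial(1, 0.4, size=n)
x = rng.binomial(1, np.where(z == 1, 0.8, 0.2))
y = rng.binomial(1, 0.2 + 0.3 * x + 0.4 * z)   # true effect of x: +0.3

def p_y1_do(x_val):
    """Backdoor adjustment: P(y=1 | do(x)) = sum_z P(y=1 | x, z) P(z)."""
    total = 0.0
    for z_val in (0, 1):
        cell = (x == x_val) & (z == z_val)
        total += y[cell].mean() * (z == z_val).mean()
    return total

# Interventional contrast identified from purely observational draws.
print(f"{p_y1_do(1) - p_y1_do(0):.3f}  (true: 0.300)")
```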
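For Step 4, a sketch of the additive score used in Bahdanau-style attention, score(s, h_j) = v^T tanh(W_s s + W_h h_j), with randomly initialized weights standing in for trained ones and shapes chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
d = 8          # hidden size (illustrative)
T = 5          # source sequence length (illustrative)

# Parameters of the additive score: v^T tanh(W_s s + W_h h_j).
W_s = rng.normal(scale=0.1, size=(d, d))
W_h = rng.normal(scale=0.1, size=(d, d))
v = rng.normal(scale=0.1, size=d)

def additive_attention(s, H):
    """Bahdanau-style additive attention.

    s: decoder state, shape (d,)
    H: encoder states, shape (T, d)
    Returns attention weights over source positions, shape (T,).
    """
    scores = np.tanh(s @ W_s.T + H @ W_h.T) @ v   # one score per position
    e = np.exp(scores - scores.max())             # numerically stable softmax
    return e / e.sum()

s = rng.normal(size=d)        # current decoder hidden state
H = rng.normal(size=(T, d))   # all encoder hidden states
alpha = additive_attention(s, H)
print(alpha, alpha.sum())     # nonnegative weights summing to 1
```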
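For Step 5, a standalone sketch of the per-step context vector; dot-product scores stand in for the additive score above purely to keep it short, and all shapes are again illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)
d, T = 8, 5                      # hidden size and source length (illustrative)
H = rng.normal(size=(T, d))      # encoder states for the whole source
s = rng.normal(size=d)           # decoder state at the current output step

# Score every source position against the current decoder state.
scores = H @ s                   # shape (T,)
alpha = np.exp(scores - scores.max())
alpha /= alpha.sum()             # attention weights over source positions

# The context vector is rebuilt at every decoder step from ALL encoder
# states, replacing the single fixed-length summary of plain seq2seq.
context = alpha @ H              # c_t = sum_j alpha_tj * h_j, shape (d,)
print(context.shape)             # a real decoder would feed this, with s and
                                 # the previous token, into the next RNN step
```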