Causal Inference & Historical Bridges

Bridge causal estimands, interventional reasoning, and the landmark sequence-model ideas that led to modern attention and deep learning.

Estimated time: ~120 min

Step 1
Instrumental Variables
Instrumental variables identify causal effects when treatment is confounded, provided an instrument affects treatment (relevance), is as good as random with respect to unobserved confounders (independence), and influences the outcome only through treatment (exclusion restriction). In a simple linear setting with a single instrument, the IV estimand is the ratio of the instrument-outcome covariance to the instrument-treatment covariance.
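As a rough illustration of the covariance-ratio estimand, the sketch below simulates a confounded linear model and compares a naive regression slope with the IV ratio. The data-generating process, the coefficients, and the helper name `iv_estimate` are assumptions made up for this example.

```python
import numpy as np

def iv_estimate(z, d, y):
    """Ratio of instrument-outcome to instrument-treatment covariance (single instrument)."""
    cov_zy = np.cov(z, y)[0, 1]
    cov_zd = np.cov(z, d)[0, 1]
    return cov_zy / cov_zd

# Hypothetical simulated data: u confounds both treatment d and outcome y.
rng = np.random.default_rng(0)
n = 100_000
u = rng.normal(size=n)                       # unobserved confounder
z = rng.normal(size=n)                       # instrument, independent of u
d = 0.8 * z + u + rng.normal(size=n)         # treatment: driven by z and u
y = 2.0 * d + 3.0 * u + rng.normal(size=n)   # outcome: true effect of d is 2.0

ols_slope = np.cov(d, y)[0, 1] / np.var(d, ddof=1)  # naive slope, biased by u
iv_slope = iv_estimate(z, d, y)                     # uses only variation induced by z

print(f"naive slope: {ols_slope:.3f}, IV estimate: {iv_slope:.3f}")
```

Under these assumed coefficients the naive slope is biased upward (about 3.1 in expectation), while the IV ratio should land close to the true effect of 2.0. The estimate becomes unstable when the instrument-treatment covariance is near zero, which is the weak-instrument problem.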
Step 2
Counterfactuals
A counterfactual asks what would have happened under a different action or treatment than the one that actually occurred. The central difficulty is that for any individual unit, only one potential outcome is observed, so causal inference always requires assumptions to recover the missing alternative.
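The missing-outcome problem can be made concrete with a small simulation in which both potential outcomes are generated but only one is revealed per unit. This is a sketch under assumed distributions; the variable names and the choice of randomized assignment as the identifying assumption are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Both potential outcomes exist in the simulation, but never in real data.
y0 = rng.normal(loc=1.0, size=n)                    # outcome if untreated
y1 = y0 + 2.0 + rng.normal(scale=0.5, size=n)       # outcome if treated; average effect is 2.0

t = rng.integers(0, 2, size=n)                      # randomized treatment assignment (assumption)
y_obs = np.where(t == 1, y1, y0)                    # only one potential outcome is observed per unit

true_ate = np.mean(y1 - y0)                         # computable only because this is simulated
diff_in_means = y_obs[t == 1].mean() - y_obs[t == 0].mean()

print(f"true ATE: {true_ate:.3f}, difference in means: {diff_in_means:.3f}")
```

Individual effects `y1 - y0` are never recoverable from `y_obs` alone. The average effect is recovered here only because assignment is random, so the treated and untreated groups are comparable.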
Step 3
Do-Calculus
Do-calculus is Pearl's set of three graphical transformation rules for turning interventional quantities into observational quantities when the causal graph permits it. It matters because it distinguishes causal effects that can be identified from observational data plus graph structure from those that cannot be identified without further assumptions.
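One familiar result that do-calculus licenses is backdoor adjustment: in a graph where Z causes both X and Y and X causes Y, P(y | do(x)) equals the sum over z of P(y | x, z) P(z). The sketch below applies that formula to simulated binary data. The graph, the probabilities, and the function name `backdoor_adjusted` are assumptions for illustration; this is not an implementation of the three rules themselves.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# Assumed graph: Z -> X, Z -> Y, X -> Y (all binary).
z = rng.random(n) < 0.5                          # confounder
x = rng.random(n) < np.where(z, 0.8, 0.2)        # treatment is more likely when z is true
y = rng.random(n) < (0.1 + 0.3 * x + 0.4 * z)    # true effect of x on P(y) is 0.3

def backdoor_adjusted(x_val):
    """Estimate P(y=1 | do(x=x_val)) as sum_z P(y=1 | x, z) * P(z)."""
    total = 0.0
    for z_val in (False, True):
        p_z = np.mean(z == z_val)
        p_y_given_xz = y[(x == x_val) & (z == z_val)].mean()
        total += p_y_given_xz * p_z
    return total

naive = y[x].mean() - y[~x].mean()                        # conditioning: P(y|x=1) - P(y|x=0)
adjusted = backdoor_adjusted(True) - backdoor_adjusted(False)  # intervening: do(x=1) vs do(x=0)

print(f"naive difference: {naive:.3f}, adjusted difference: {adjusted:.3f}")
```

The naive contrast mixes the effect of X with the effect of Z (about 0.54 in expectation under these assumed probabilities), while the adjusted contrast should be close to 0.3.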
Step 4
Bahdanau Attention
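Bahdanau attention is the original additive attention mechanism for sequence-to-sequence models, introduced by Bahdanau, Cho, and Bengio (2014). Before producing each token, the decoder scores every encoder state against its current hidden state using a small feed-forward network, then normalizes the scores into attention weights. It solved the fixed-context bottleneck of early seq2seq RNNs by letting the decoder look back over the whole source sequence at every step.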
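A minimal NumPy sketch of one additive-attention step follows, assuming the common formulation score_j = v · tanh(W_s s + W_h h_j) followed by a softmax. The parameter names, shapes, and random values are assumptions for illustration, not taken from any particular implementation.

```python
import numpy as np

def additive_attention(s_prev, enc_states, W_s, W_h, v):
    """One additive (Bahdanau-style) attention step.

    s_prev:     (d_dec,)        current decoder hidden state
    enc_states: (T, d_enc)      one encoder state per source position
    W_s:        (d_att, d_dec)  projects the decoder state
    W_h:        (d_att, d_enc)  projects each encoder state
    v:          (d_att,)        maps the tanh features to a scalar score
    """
    features = np.tanh(enc_states @ W_h.T + W_s @ s_prev)  # (T, d_att)
    scores = features @ v                                  # (T,) one score per source position
    scores = scores - scores.max()                         # shift for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()        # softmax over source positions
    context = weights @ enc_states                         # (d_enc,) weighted sum of encoder states
    return context, weights

# Hypothetical sizes and random parameters, just to exercise the function.
rng = np.random.default_rng(3)
T, d_enc, d_dec, d_att = 5, 4, 6, 3
enc_states = rng.normal(size=(T, d_enc))
s_prev = rng.normal(size=d_dec)
W_s = rng.normal(size=(d_att, d_dec))
W_h = rng.normal(size=(d_att, d_enc))
v = rng.normal(size=d_att)

context, weights = additive_attention(s_prev, enc_states, W_s, W_h, v)
print(weights.round(3), context.shape)
```

The weights form a probability distribution over source positions, and the context vector is their weighted average of encoder states, so its size does not depend on the source length.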
Step 5
Seq2Seq with Attention
Seq2seq with attention augments the encoder-decoder architecture so the decoder conditions on a context vector built from all encoder states at each output step. That change made neural machine translation markedly more effective than fixed-context seq2seq, especially on long sentences, and paved the way for modern cross-attention and Transformer models.
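The per-step conditioning can be sketched as a toy decoder loop that rebuilds the context vector from all encoder states at every output step. This is a simplified illustration under stated assumptions: the state update is a plain tanh recurrence standing in for a GRU or LSTM cell, the previous output token and the output layer are omitted, and all names and shapes are hypothetical.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def decode_with_attention(enc_states, params, steps):
    """Toy decoder loop: a fresh context vector over all encoder states at each step."""
    W_s, W_h, v, W_c, W_r = params
    s = np.zeros(W_s.shape[1])                      # initial decoder state
    contexts, all_weights = [], []
    for _ in range(steps):
        scores = np.tanh(enc_states @ W_h.T + W_s @ s) @ v  # additive scores, shape (T,)
        weights = softmax(scores)                   # attention over source positions
        context = weights @ enc_states              # (d_enc,) context for this step
        s = np.tanh(W_c @ context + W_r @ s)        # simplified recurrent update (stand-in for a GRU)
        contexts.append(context)
        all_weights.append(weights)
    return np.stack(contexts), np.stack(all_weights)

# Hypothetical sizes and random parameters.
rng = np.random.default_rng(4)
T, d_enc, d_dec, d_att = 6, 4, 5, 3
enc_states = rng.normal(size=(T, d_enc))
params = (
    rng.normal(size=(d_att, d_dec)),   # W_s
    rng.normal(size=(d_att, d_enc)),   # W_h
    rng.normal(size=d_att),            # v
    rng.normal(size=(d_dec, d_enc)),   # W_c
    rng.normal(size=(d_dec, d_dec)),   # W_r
)

contexts, attn = decode_with_attention(enc_states, params, steps=3)
print(contexts.shape, attn.shape)
```

Each row of `attn` is a separate distribution over the source, which is the key contrast with fixed-context seq2seq, where a single vector summarizes the source for every output step.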
Step 6
Backpropagation — History (Werbos → Rumelhart/Hinton/Williams)
The history of backpropagation is the story of an idea known in pieces before it became a practical neural-network training method. Werbos described reverse-mode differentiation for network training in his 1974 PhD thesis, and Rumelhart, Hinton, and Williams turned it into the landmark 1986 demonstration that multilayer neural networks trained this way can learn useful internal representations.