RL, Causality & History Anchors
Anchor reinforcement learning, causal inference, and two landmark papers that shaped modern deep learning and transformers.
6 cards
- Step 1: A Markov decision process formalizes sequential decision-making with states, actions, transitions, rewards, and a discount factor. Its key assumption is that the next-state and reward distribution depends only on the current state and action, which makes Bellman-style planning possible.
- Step 2: Dynamic programming solves an MDP with a known model by repeatedly applying Bellman updates until values or policies become self-consistent. Policy evaluation, policy improvement, policy iteration, and value iteration are the core algorithms in that family.
- Step 3: The potential outcomes framework defines causal effects by comparing the outcomes a unit would have under different treatments. Because only one of those potential outcomes is observed for any given unit, causal inference is fundamentally about identifying missing counterfactuals under defensible assumptions.
- Step 4: Confounders create misleading associations because they affect both treatment and outcome, while colliders create bias when you condition on them. Simpson’s paradox is the visible symptom that aggregate and stratified associations can reverse direction when the underlying causal structure is ignored.
- Step 5: “Attention Is All You Need” introduced the Transformer: a sequence model built around self-attention instead of recurrence or convolution. The paper mattered because it showed that attention-based, highly parallel sequence modeling could outperform recurrent seq2seq systems, and it set the template for modern LLMs.
- Step 6: AlexNet was the deep convolutional network that won ILSVRC 2012 by a huge margin and triggered the modern deep-learning wave in vision. Its impact came from the full recipe—ImageNet-scale data, GPU training, ReLU, dropout, and augmentation—not from a single isolated trick.
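The MDP and dynamic-programming cards (Steps 1–2) can be made concrete with value iteration on a toy model. The two-state MDP below, its transition probabilities, and its rewards are invented purely for illustration:

```python
# Value iteration on a toy 2-state MDP (all numbers are illustrative).
# States: 0 and 1. Actions: 0 ("stay") and 1 ("move").
# P[s][a] = list of (probability, next_state, reward) transitions.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(0.8, 1, 1.0), (0.2, 0, 0.0)]},
    1: {0: [(1.0, 1, 2.0)], 1: [(1.0, 0, 0.0)]},
}
gamma = 0.9  # discount factor


def value_iteration(P, gamma, tol=1e-8):
    """Apply Bellman optimality backups until values are self-consistent."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            # One-step lookahead: Q(s, a) for every action.
            q = [sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                 for a in P[s]]
            best = max(q)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V


V = value_iteration(P, gamma)
# Greedy policy: the action that maximizes the one-step Bellman backup.
policy = {s: max(P[s], key=lambda a: sum(p * (r + gamma * V[s2])
                                         for p, s2, r in P[s][a]))
          for s in P}
```

Staying in state 1 earns reward 2 forever, so its value converges to 2/(1 − 0.9) = 20; the greedy policy then moves from state 0 toward state 1.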
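The confounding card (Step 4) is easy to demonstrate numerically. The recovery counts below are synthetic, chosen so that a "severity" confounder drives treatment assignment: within every stratum the treated group does better, yet the pooled comparison reverses:

```python
# Synthetic counts illustrating Simpson's paradox (numbers invented).
# Severity confounds: severe patients are mostly treated, mild mostly not.
# data[stratum][treated] = (recovered, total)
data = {
    "mild":   {1: (18, 20), 0: (70, 80)},   # treated 90% vs control 87.5%
    "severe": {1: (32, 80), 0: (4, 20)},    # treated 40% vs control 20%
}


def rate(pairs):
    """Recovery rate over a list of (recovered, total) counts."""
    recovered = sum(r for r, n in pairs)
    total = sum(n for r, n in pairs)
    return recovered / total


# Stratified comparison: treatment wins in each severity group.
stratified_wins = all(rate([data[s][1]]) > rate([data[s][0]]) for s in data)

# Aggregate comparison: pooling across strata reverses the direction.
treated_overall = rate([data[s][1] for s in data])  # 50/100 = 0.50
control_overall = rate([data[s][0] for s in data])  # 74/100 = 0.74
```

Conditioning on the confounder (stratifying by severity) recovers the correct within-group comparison; ignoring it produces the reversed aggregate association.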
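The self-attention mechanism named in Step 5 can be sketched in a few lines. This is a deliberately simplified single-head version in plain Python: a real Transformer uses learned query/key/value projections, multiple heads, and matrix libraries, none of which appear here:

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(X):
    """Scaled dot-product self-attention over token vectors X.

    Simplification for illustration: Q = K = V = X (no learned
    projections). Each output is a convex combination of the inputs,
    weighted by scaled dot-product similarity.
    """
    d = len(X[0])
    out = []
    for q in X:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in X]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, X))
                    for i in range(d)])
    return out


# Three toy 2-d token embeddings (values invented).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Y = self_attention(X)
```

Because every token attends to every other token in one step, the computation parallelizes across positions, which is the property that let Transformers displace recurrent seq2seq models.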