Tag: historical
7 topic(s)
- Restricted Boltzmann Machines (RBM): A bipartite EBM over visible and hidden binary units with energy \( E(v, h) = -v^\top W h - b^\top v - c^\top h \). Conditional independence within each layer gives closed-form conditionals \( p(h\mid v) \) and \( p(v\mid h) \); Hinton's contrastive divergence (CD) trains them, and stacking RBMs yields a deep belief network (a minimal conditional/CD-1 sketch follows the list).
- Highway Networks: A predecessor to ResNet built on a gated skip connection \( y = H(x) \cdot T(x) + x \cdot (1 - T(x)) \), where \( T(x) \in [0,1] \) is a learned transform gate. It enabled training of networks with 100+ layers before residual connections simplified the construction in ResNet (a gating sketch follows the list).
- Capsule Networks: Hinton's alternative to CNN pooling: neurons are grouped into 'capsules' whose vector output encodes both the existence and the pose of an entity. Dynamic routing by agreement replaces max-pooling: each lower-level capsule routes its output toward the higher-level capsules whose outputs agree with its predictions (a routing sketch follows the list). Historically significant; practically superseded by transformers.
- Neural Program Interpreters: A neural program interpreter (NPI) is a network that executes program-like computations: looking up arguments, calling sub-routines, manipulating an external memory or stack, and conditioning on intermediate state. Early work (NPI, NTM, DNC, Neural GPU) targeted symbolic algorithms; modern descendants are tool-using LLMs and chain-of-thought executors that lean on external interpreters and structured memory.
- Bahdanau Attention: Bahdanau attention is the original additive attention mechanism for sequence-to-sequence models, where the decoder scores each encoder hidden state before producing the next token. It solved the fixed-context bottleneck of early seq2seq RNNs by letting the decoder look back over the whole source sequence at every step (a scoring-and-context sketch follows the list).
- Seq2Seq with Attention: Seq2seq with attention augments the encoder-decoder architecture so that, at each output step, the decoder conditions on a context vector built from all encoder states. That change made neural machine translation far more effective than fixed-context seq2seq and directly paved the way to modern cross-attention and Transformer models.
- Backpropagation — History (Werbos → Rumelhart/Hinton/Williams): The history of backpropagation is the story of an idea known in pieces before it became a practical neural-network training method. Werbos articulated reverse-mode differentiation for network training in the 1970s, and Rumelhart, Hinton, and Williams turned it into the landmark 1986 demonstration that made multilayer neural networks trainable in practice.
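
To make the RBM entry concrete, here is a minimal NumPy sketch of the closed-form conditionals and a single contrastive-divergence (CD-1) update. The layer sizes, learning rate, and variable names are illustrative assumptions, not taken from any particular implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Illustrative sizes (assumed): 6 visible units, 4 hidden units.
n_vis, n_hid = 6, 4
W = 0.01 * rng.standard_normal((n_vis, n_hid))  # weights
b = np.zeros(n_vis)                              # visible bias
c = np.zeros(n_hid)                              # hidden bias

def p_h_given_v(v):
    # Factorial conditional implied by E(v,h) = -v^T W h - b^T v - c^T h
    return sigmoid(v @ W + c)

def p_v_given_h(h):
    return sigmoid(h @ W.T + b)

def cd1_update(v0, lr=0.1):
    """One CD-1 step: positive phase on data, negative phase after one Gibbs step."""
    ph0 = p_h_given_v(v0)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)   # sample hidden units
    pv1 = p_v_given_h(h0)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)    # reconstruct visible units
    ph1 = p_h_given_v(v1)
    # Gradient approximation: <v h>_data - <v h>_reconstruction
    dW = np.outer(v0, ph0) - np.outer(v1, ph1)
    return W + lr * dW, b + lr * (v0 - v1), c + lr * (ph0 - ph1)

v_data = (rng.random(n_vis) < 0.5).astype(float)  # toy binary "data" vector
W, b, c = cd1_update(v_data)
```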
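
A small sketch of the highway gating rule from the Highway Networks entry, with assumed toy dimensions. The tanh transform path and the negative gate-bias initialization (so the layer starts close to the identity map) are common choices for illustration, not requirements.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def highway_layer(x, W_h, b_h, W_t, b_t):
    """y = H(x) * T(x) + x * (1 - T(x)); T in [0,1] gates transform vs. carry."""
    H = np.tanh(x @ W_h + b_h)        # transform path (tanh assumed for the sketch)
    T = sigmoid(x @ W_t + b_t)        # learned transform gate
    return H * T + x * (1.0 - T)

d = 8                                  # layer width (input and output dims must match)
W_h = 0.1 * rng.standard_normal((d, d))
W_t = 0.1 * rng.standard_normal((d, d))
b_h = np.zeros(d)
b_t = -2.0 * np.ones(d)                # negative gate bias: mostly carry the input at first

x = rng.standard_normal(d)
y = highway_layer(x, W_h, b_h, W_t, b_t)
```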
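
A sketch of the dynamic routing by agreement mentioned in the Capsule Networks entry: prediction vectors from lower capsules are combined with coupling coefficients that grow wherever a prediction agrees with the higher capsule's output. The capsule counts, pose dimension, and the three routing iterations are assumptions for illustration.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    # Squash nonlinearity: keeps the direction, maps the length into [0, 1).
    norm2 = np.sum(s * s, axis=axis, keepdims=True)
    return (norm2 / (1.0 + norm2)) * s / np.sqrt(norm2 + eps)

def dynamic_routing(u_hat, n_iters=3):
    """Routing by agreement over prediction vectors u_hat[i, j, :]
    (lower capsule i's prediction for higher capsule j)."""
    n_lower, n_higher, _ = u_hat.shape
    b = np.zeros((n_lower, n_higher))                    # routing logits
    for _ in range(n_iters):
        c = np.exp(b - b.max(axis=1, keepdims=True))     # softmax over higher capsules
        c = c / c.sum(axis=1, keepdims=True)             # coupling coefficients
        s = np.einsum('ij,ijd->jd', c, u_hat)            # weighted sum of predictions
        v = squash(s)                                    # higher-level capsule outputs
        b = b + np.einsum('ijd,jd->ij', u_hat, v)        # raise logits where predictions agree
    return v

rng = np.random.default_rng(0)
u_hat = rng.standard_normal((32, 10, 16))                # 32 lower capsules, 10 higher, 16-D poses
v = dynamic_routing(u_hat)                               # shape (10, 16)
```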
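
Finally, a toy sketch of one decoder step of additive (Bahdanau-style) attention, which also shows the context vector that a seq2seq-with-attention decoder conditions on. All dimensions and weight names here are placeholders chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def additive_attention(s_prev, enc_states, W_s, W_h, v_a):
    """One decoder step of additive attention.
    s_prev:     previous decoder state, shape (d_dec,)
    enc_states: all encoder states, shape (T_src, d_enc)
    Returns the context vector and the attention weights."""
    # Additive score for each source position t: e_t = v_a^T tanh(W_s s_prev + W_h h_t)
    scores = np.tanh(s_prev @ W_s + enc_states @ W_h) @ v_a          # (T_src,)
    weights = np.exp(scores - scores.max())
    weights = weights / weights.sum()                                # softmax over source positions
    context = weights @ enc_states                                   # weighted sum of encoder states
    return context, weights

T_src, d_enc, d_dec, d_att = 5, 8, 6, 7            # toy dimensions (assumed)
enc_states = rng.standard_normal((T_src, d_enc))
s_prev = rng.standard_normal(d_dec)
W_s = 0.1 * rng.standard_normal((d_dec, d_att))
W_h = 0.1 * rng.standard_normal((d_enc, d_att))
v_a = 0.1 * rng.standard_normal(d_att)

context, weights = additive_attention(s_prev, enc_states, W_s, W_h, v_a)
# A real decoder would combine `context` with its previous state and the last output
# token's embedding to produce the next state and the next-token distribution.
```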