Tag: intro
2 topic(s)
- Hierarchical Clustering & Linkage: Builds a tree (dendrogram) of clusters either bottom-up (agglomerative: start with singletons, merge the closest pair at each step) or top-down (divisive). The linkage criterion (single, complete, average, or Ward) defines the distance between clusters and dictates the cluster shapes the algorithm prefers.
- Gradient Accumulation & Micro-Batching: Split a large effective batch into \( k \) micro-batches of size \( N \); accumulate their gradients in a buffer, then take a single optimizer step. This decouples the statistical batch size from the hardware batch size, giving an effective batch of \( N\cdot k \) without a \( k\times \) increase in memory. Essential for training large models on memory-constrained GPUs.
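
For the hierarchical clustering topic, the effect of the linkage criterion is easiest to see on a tiny dataset. The following is a minimal pure-Python sketch of naive agglomerative clustering, not an optimized or library implementation. The function names `cluster_distance` and `agglomerate` and the sample points are illustrative assumptions, and Ward linkage is left out for brevity.

```python
from itertools import combinations
from math import dist


def cluster_distance(a, b, linkage):
    """Distance between clusters a and b (lists of points) under a linkage rule."""
    pairwise = [dist(p, q) for p in a for q in b]
    if linkage == "single":      # closest pair of members
        return min(pairwise)
    if linkage == "complete":    # farthest pair of members
        return max(pairwise)
    if linkage == "average":     # mean over all member pairs
        return sum(pairwise) / len(pairwise)
    raise ValueError(f"unsupported linkage: {linkage}")


def agglomerate(points, n_clusters, linkage="single"):
    """Bottom-up clustering: merge the closest pair until n_clusters remain."""
    clusters = [[p] for p in points]  # start from singletons
    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]], linkage),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters


# Hypothetical data: points on a line with slowly growing gaps.
points = [(0.0, 0.0), (1.0, 0.0), (2.1, 0.0), (3.3, 0.0), (4.6, 0.0), (7.0, 0.0)]

single = agglomerate(points, 2, "single")
complete = agglomerate(points, 2, "complete")
print(sorted(len(c) for c in single))
print(sorted(len(c) for c in complete))
```

On this data, single linkage chains the first five points into one elongated cluster and leaves the last point alone, while complete linkage, which penalizes wide clusters, splits the points into groups of four and two.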
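
For the gradient accumulation topic, the key property is that accumulating scaled micro-batch gradients reproduces the gradient of the full effective batch. The sketch below checks this on a one-parameter linear model in plain Python. The helper names `grad_mse` and `accumulated_step`, the data, and the learning rate are assumptions made for illustration, and the micro-batches are assumed to be equally sized.

```python
def grad_mse(w, batch):
    """Gradient of mean squared error for the model y = w * x over one batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_step(w, data, micro_batch_size, lr):
    """One optimizer step built from equally sized micro-batches."""
    micro_batches = [
        data[i:i + micro_batch_size]
        for i in range(0, len(data), micro_batch_size)
    ]
    k = len(micro_batches)
    grad_buffer = 0.0
    for mb in micro_batches:
        # Scale each micro-batch gradient by 1/k so the buffer ends up
        # holding the mean gradient over the whole effective batch.
        grad_buffer += grad_mse(w, mb) / k
    return w - lr * grad_buffer  # single step after all k micro-batches


# Hypothetical data, roughly y = 2x; effective batch of 8 = 4 micro-batches of 2.
data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2),
        (5.0, 9.8), (6.0, 12.1), (7.0, 13.9), (8.0, 16.0)]
w0, lr = 0.0, 0.01

full = w0 - lr * grad_mse(w0, data)                       # one large batch
accum = accumulated_step(w0, data, micro_batch_size=2, lr=lr)
print(abs(full - accum) < 1e-9)
```

Only one micro-batch is processed at a time, so peak memory follows the micro-batch size \( N \) rather than the effective batch \( N\cdot k \). The equivalence relies on equal micro-batch sizes; with uneven sizes the per-micro-batch scaling would need to be weighted by sample count.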