Tag: representation-learning
10 topics
- Contrastive Learning Theory: The theoretical account of why contrastive self-supervised objectives like InfoNCE produce useful representations. Wang & Isola (2020) show the InfoNCE loss decomposes into two asymptotic terms — \( \mathcal{L}_{\text{align}} \), pulling positive pairs together, and \( \mathcal{L}_{\text{unif}} \), spreading the marginal feature distribution uniformly on the hypersphere. Downstream linear-probe accuracy tracks this alignment-uniformity trade-off closely, giving a geometric explanation for why contrastive learning works at all (see the alignment/uniformity sketch after this list).
- Non-Contrastive SSL (BYOL, SimSiam, Barlow Twins, VICReg): Non-contrastive self-supervised methods learn representations from multiple views of the same example without explicit negative pairs. BYOL, SimSiam, Barlow Twins, and VICReg avoid collapse using asymmetry, stop-gradient, redundancy reduction, or variance-preserving terms instead of contrastive negatives (a stop-gradient sketch follows the list).
- Disentangled Representation Learning: Disentangled representation learning seeks latent coordinates that each correspond to a separate underlying factor of variation in the data. It is attractive for control and interpretability, but in the unsupervised setting true disentanglement is usually not identifiable without extra inductive bias or supervision.
- Sparse Representations in Deep Nets: A sparse representation is one where only a small fraction of units are active for any given input. Deep nets often develop sparsity through ReLU-like nonlinearities or explicit penalties, which can improve efficiency, feature selectivity, and sometimes interpretability; a measurement sketch follows the list.
- Representation Collapse: Representation collapse is the failure mode where many inputs map to nearly the same embedding or hidden state, destroying useful information. It appears in several forms — constant-vector collapse in self-supervision, dimensional collapse where only a few directions survive, and cluster collapse in discrete latents — and each requires a different fix; a dimensional-collapse diagnostic is sketched after the list.
- Invariance vs Equivariance: A representation is invariant to a transformation if the output does not change when the input is transformed, and equivariant if the output changes in a predictably transformed way. CNN translation equivariance and classifier translation invariance are the canonical example pair, checked in a sketch after the list.
- Embedding Geometry: Embedding geometry studies what information is encoded in the distances, angles, and directions of an embedding space. Properties like similarity, analogy structure, anisotropy, and clustering determine how useful an embedding is for retrieval and downstream tasks; an anisotropy measurement is sketched below the list.
- Metric Learning at Scale: Metric learning at scale trains embeddings so similar items are close and dissimilar items are far apart even when the dataset is too large for naive pairwise or triplet mining. The main challenge is finding informative negatives and keeping computation manageable as the corpus grows; a batch-hard mining sketch follows the list.
- Representation Alignment Across Modalities: Representation alignment across modalities trains different encoders so paired inputs, such as an image and its caption, land near each other in a shared embedding space. This makes cross-modal retrieval and transfer possible by giving different modalities a common geometry; a symmetric contrastive-loss sketch appears after the list.
- Tokenization as Representation: Tokenization is not just preprocessing: it decides which units the model can represent directly and therefore shapes the statistics the model learns. The choice of characters, subwords, bytes, or domain-specific tokens changes sequence length, vocabulary size, inductive bias, and how cleanly concepts map into embeddings; a sequence-length comparison is sketched after the list.
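
For the contrastive-theory entry, a minimal sketch of the alignment and uniformity terms as Wang & Isola (2020) define them, assuming L2-normalized embeddings and PyTorch; the constants `alpha=2` and `t=2` are their defaults.

```python
import torch

def align_loss(x, y, alpha=2):
    # Alignment: mean (powered) distance between embeddings of positive pairs.
    # x, y: (N, D) L2-normalized embeddings of the two views.
    return (x - y).norm(p=2, dim=1).pow(alpha).mean()

def uniform_loss(x, t=2):
    # Uniformity: log of the mean Gaussian potential over all pairs,
    # minimized when embeddings spread uniformly over the hypersphere.
    return torch.pdist(x, p=2).pow(2).mul(-t).exp().mean().log()

# toy usage: noisy positive pairs on the unit sphere
z1 = torch.nn.functional.normalize(torch.randn(256, 128), dim=1)
z2 = torch.nn.functional.normalize(z1 + 0.1 * torch.randn(256, 128), dim=1)
print(align_loss(z1, z2).item(), uniform_loss(z1).item())
```

Tracking both terms during training makes the trade-off described in the entry directly observable.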
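For the non-contrastive entry, a sketch of the SimSiam-style symmetric loss, assuming projector outputs `z1, z2` and predictor outputs `p1, p2` are computed elsewhere by the two branches; the `detach()` call is the stop-gradient that prevents collapse.

```python
import torch.nn.functional as F

def simsiam_loss(p1, p2, z1, z2):
    # Negative cosine similarity; stop-gradient (detach) on the target branch
    # keeps the trivial constant solution from being reachable by gradient descent.
    def d(p, z):
        return -F.cosine_similarity(p, z.detach(), dim=-1).mean()
    return 0.5 * d(p1, z2) + 0.5 * d(p2, z1)
```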
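For the sparsity entry, a small illustration of measuring activation sparsity and of an explicit L1 penalty; the threshold `eps` and the penalty weight are illustrative choices, not canonical values.

```python
import torch

def sparsity(h, eps=1e-8):
    # Fraction of units that are (near-)zero, averaged over the batch.
    return (h.abs() <= eps).float().mean().item()

h = torch.relu(torch.randn(64, 512))  # ReLU zeroes ~half of N(0,1) pre-activations
print(sparsity(h))                    # ~0.5

# An explicit penalty, added to the task loss, pushes sparsity higher.
l1_penalty = 1e-4 * h.abs().sum(dim=1).mean()
```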
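For the collapse entry, one common diagnostic for dimensional collapse: the effective rank (exponential of the entropy of the normalized singular-value spectrum) of a batch of embeddings. Values far below the embedding dimension mean only a few directions survive. A sketch of one such measure, not the only one.

```python
import torch

def effective_rank(z):
    # z: (N, D) embeddings. Center, take singular values, exponentiate entropy.
    z = z - z.mean(dim=0)
    s = torch.linalg.svdvals(z)
    p = s / s.sum()
    return torch.exp(-(p * (p + 1e-12).log()).sum()).item()

healthy   = torch.randn(512, 128)                      # spread over all directions
collapsed = torch.randn(512, 2) @ torch.randn(2, 128)  # embeddings confined to a plane
print(effective_rank(healthy), effective_rank(collapsed))  # near 128 vs ~2
```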
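For the invariance/equivariance entry, a runnable check of the canonical pair: a convolution commutes with translation (equivariance), while global pooling on top of it discards position (invariance). Circular padding and a circular shift are used so the equalities hold exactly at the borders.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(1, 8, kernel_size=3, padding=1, padding_mode="circular", bias=False)
x = torch.randn(1, 1, 32, 32)
shift = lambda t: torch.roll(t, shifts=5, dims=-1)  # translate 5 px to the right

# Equivariance: shift-then-convolve equals convolve-then-shift.
print(torch.allclose(conv(shift(x)), shift(conv(x)), atol=1e-5))  # True

# Invariance: global average pooling removes position entirely.
gap = lambda t: t.mean(dim=(-2, -1))
print(torch.allclose(gap(conv(x)), gap(conv(shift(x))), atol=1e-5))  # True
```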
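For the embedding-geometry entry, a sketch of one widely used measurement, anisotropy: the mean cosine similarity between distinct embeddings. Values near 0 indicate directions spread through the space; values near 1 indicate a narrow cone, which tends to hurt retrieval.

```python
import numpy as np

def anisotropy(E):
    # E: (N, D) embedding matrix; mean off-diagonal cosine similarity.
    E = E / np.linalg.norm(E, axis=1, keepdims=True)
    sims = E @ E.T
    n = len(E)
    return (sims.sum() - n) / (n * (n - 1))

rng = np.random.default_rng(0)
isotropic = rng.standard_normal((1000, 256))
coned = isotropic + 5.0  # a shared offset pushes all vectors into a narrow cone
print(anisotropy(isotropic), anisotropy(coned))  # ~0.0 vs close to 1.0
```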
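For the metric-learning entry, a sketch of one standard answer to the informative-negatives problem: batch-hard mining, where each anchor uses its hardest in-batch positive and negative rather than enumerating all O(N³) triplets. The margin value here is illustrative.

```python
import torch
import torch.nn.functional as F

def batch_hard_triplet_loss(emb, labels, margin=0.2):
    # emb: (N, D) embeddings; labels: (N,) class ids.
    dist = torch.cdist(emb, emb)                      # (N, N) pairwise distances
    same = labels[:, None] == labels[None, :]
    self_mask = torch.eye(len(emb), dtype=torch.bool, device=emb.device)
    # Hardest positive: farthest same-class point; hardest negative: nearest other-class point.
    hardest_pos = dist.masked_fill(~same | self_mask, float("-inf")).amax(dim=1)
    hardest_neg = dist.masked_fill(same, float("inf")).amin(dim=1)
    return F.relu(hardest_pos - hardest_neg + margin).mean()
```

At corpus scale this is typically combined with approximate nearest-neighbor search to mine hard negatives beyond the current batch.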
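For the cross-modal entry, a sketch of a CLIP-style symmetric contrastive loss, assuming `img_emb` and `txt_emb` hold N aligned image-caption pairs from two separate encoders; the fixed temperature is an illustrative constant (CLIP learns it).

```python
import torch
import torch.nn.functional as F

def symmetric_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.T / temperature  # (N, N): pair i-i is the true match
    targets = torch.arange(len(img), device=logits.device)
    # Average the image-to-text and text-to-image retrieval directions.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.T, targets))
```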
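For the tokenization entry, a small illustration of how the choice of unit changes sequence length and vocabulary size for the same string; the string is arbitrary, and a real subword tokenizer would land between the character and word extremes.

```python
text = "naïve représentation learning"

units = {
    "bytes": list(text.encode("utf-8")),  # fixed 256-symbol vocabulary, longest sequences
    "chars": list(text),                  # vocabulary grows with the alphabet
    "words": text.split(),                # shortest sequences, open-ended vocabulary
}
for name, toks in units.items():
    print(f"{name}: {len(toks)} tokens")
# bytes: 31, chars: 29, words: 3 -- same text, very different model-visible statistics
```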