Tag: path-alignment-rlhf
1 topic
Alignment Techniques (RLHF, DPO, RLAIF, comparison)
Modern LLM alignment uses preference data to adjust a pretrained model so it follows instructions, refuses unsafe content, and ranks desired behaviours above undesired ones. The dominant recipes — RLHF with PPO, DPO and its variants, and RLAIF with AI-generated preferences — share the same Bradley–Terry preference model but differ in optimiser, reward-model dependence, and stability.
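Because all of these recipes reduce to fitting a Bradley–Terry model on preference pairs, the clearest way to see where DPO departs from reward-model-based RLHF is its loss, which needs only sequence log-probabilities under the policy and a frozen reference model rather than a separate reward model and PPO rollouts. The sketch below is a minimal PyTorch illustration under that framing; the function name, the beta value, and the dummy log-probabilities are illustrative assumptions, not material from the topic itself.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Illustrative DPO loss for a batch of preference pairs.

    Each argument is a tensor of per-sequence log-probabilities
    (summed token log-probs) under the policy or the frozen reference.
    beta controls how far the policy may drift from the reference.
    """
    # Implicit rewards: beta-scaled log-ratio of policy to reference.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)

    # Bradley-Terry probability that the chosen response beats the rejected
    # one is sigma(reward margin); the loss is its negative log-likelihood.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Dummy log-probabilities for a batch of two preference pairs.
policy_chosen = torch.tensor([-12.3, -8.7])
policy_rejected = torch.tensor([-14.1, -9.5])
ref_chosen = torch.tensor([-12.9, -9.0])
ref_rejected = torch.tensor([-13.8, -9.2])
print(dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected))
```

In an RLHF-with-PPO pipeline the same Bradley–Terry objective trains a separate reward model first, and the policy is then optimised against that reward with a KL penalty; DPO folds both steps into the single supervised-style loss above, which is the main source of its stability advantage noted in the topic.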