Tag: path-alignment-rlhf
1 topic
Alignment Techniques (RLHF, DPO, RLAIF, comparison)
Modern LLM alignment uses preference data to adjust a pretrained model so it follows instructions, refuses unsafe content, and ranks desired behaviours above undesired ones. The dominant recipes — RLHF with PPO, DPO and its variants, and RLAIF with AI-generated preferences — share the same Bradley–Terry preference model but differ in optimiser, reward-model dependence, and stability.
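Because all of these recipes reduce to fitting a Bradley–Terry model on preference pairs, the clearest way to see where DPO departs from reward-model-based RLHF is its loss, which needs only sequence log-probabilities under the policy and a frozen reference model rather than a separate reward model and PPO rollouts. The sketch below is a minimal PyTorch illustration under that framing; the function name, the beta value, and the dummy log-probabilities are illustrative assumptions, not material from the topic itself.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Illustrative DPO loss for a batch of preference pairs.

    Each argument is a tensor of per-sequence log-probabilities
    (summed token log-probs) under the policy or the frozen reference.
    beta controls how far the policy may drift from the reference.
    """
    # Implicit rewards: beta-scaled log-ratio of policy to reference.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)

    # Bradley-Terry probability that the chosen response beats the rejected
    # one is sigma(reward margin); the loss is its negative log-likelihood.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Dummy log-probabilities for a batch of two preference pairs.
policy_chosen = torch.tensor([-12.3, -8.7])
policy_rejected = torch.tensor([-14.1, -9.5])
ref_chosen = torch.tensor([-12.9, -9.0])
ref_rejected = torch.tensor([-13.8, -9.2])
print(dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected))
```

In an RLHF-with-PPO pipeline the same Bradley–Terry objective trains a separate reward model first, and the policy is then optimised against that reward with a KL penalty; DPO folds both steps into the single supervised-style loss above, which is the main source of its stability advantage noted in the topic.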