Fundamentals
Safety & Alignment
RLHF, DPO, and constitutional methods.
10 min read
Pretraining produces capable but unsteered models.
RLHF (reinforcement learning from human feedback), DPO (direct preference optimization), and constitutional AI shape model behavior using human or AI preference signals.
Red-teaming and jailbreak testing remain essential before deployment.
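To make the preference-based idea concrete, here is a minimal sketch of the per-example DPO objective in plain Python. It assumes you already have summed log-probabilities of a preferred ("chosen") and a dispreferred ("rejected") response under both the model being trained and a frozen reference model. The function name `dpo_loss`, its parameter names, and the default `beta=0.1` are illustrative assumptions, not taken from any particular library.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss (illustrative sketch; names are hypothetical).

    Each argument is the summed log-probability of a full response.
    beta controls how strongly the policy is held near the reference.
    """
    # How much more the policy likes each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp

    # Positive when the policy favors the chosen response more than the reference does.
    logits = beta * (chosen_margin - rejected_margin)

    # loss = -log(sigmoid(logits)), computed in a numerically stable way.
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))


# A policy identical to the reference gives a loss of log(2).
baseline = dpo_loss(-2.0, -2.0, -2.0, -2.0)

# Raising the chosen response and lowering the rejected one reduces the loss.
improved = dpo_loss(-1.0, -3.0, -2.0, -2.0)
```

The loss falls as the policy raises the chosen response's probability relative to the rejected one, measured against the reference model. In practice this would be computed over batches of tensors in a deep learning framework rather than on individual floats.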