Safety & Alignment

RLHF, DPO, and constitutional methods.

Pretraining alone produces capable but unsteered models: they learn to predict text, not to follow instructions or avoid harmful outputs.

RLHF, DPO, and constitutional AI shape model behavior by training on preference judgments that come from humans or from AI feedback.
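
To make the idea of learning from preferences concrete, here is a minimal sketch of the DPO loss for a single preference pair. It assumes the summed log-probabilities of the preferred ("chosen") and dispreferred ("rejected") responses are already available from both the model being trained and a frozen reference model. The function name, argument names, and the default beta of 0.1 are illustrative assumptions, not taken from any particular library.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair of responses.

    All four inputs are log-probabilities of a full response given the
    prompt. The names and the default beta are assumptions for this sketch.
    """
    # How far the policy has moved from the reference on each response.
    chosen_shift = policy_chosen_logp - ref_chosen_logp
    rejected_shift = policy_rejected_logp - ref_rejected_logp

    # Positive when the policy favors the chosen response more than
    # the reference model does.
    z = beta * (chosen_shift - rejected_shift)

    # -log(sigmoid(z)) == softplus(-z), computed in a numerically stable way.
    x = -z
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


# A policy identical to the reference gives a loss of log(2).
baseline = dpo_loss(-5.0, -7.0, -5.0, -7.0)

# Shifting probability toward the chosen response lowers the loss.
improved = dpo_loss(-4.0, -8.0, -5.0, -7.0)
```

In practice the loss would be averaged over a batch and minimized with an automatic differentiation framework. RLHF differs in that it first fits a separate reward model to the same kind of preference pairs and then optimizes the policy against that reward.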

Red-teaming and jailbreak testing remain essential before deployment.