Constitution or Collapse? Exploring Constitutional AI with Llama 3-8B

Zhang, Xue

arXiv:2504.04918 (cs)

[Submitted on 7 Apr 2025]

Abstract: As language models continue to grow, the cost of acquiring high-quality training data has risen significantly. Collecting human feedback is expensive and time-consuming, and manual labels can be noisy, leading to an imbalance between helpfulness and harmlessness. Constitutional AI, introduced by Anthropic in December 2022, uses feedback from one AI to train another, greatly reducing the need for human labeling. However, the original implementation was designed for a model with around 52 billion parameters, and little is known about how well Constitutional AI performs with smaller models such as Llama 3-8B. In this paper, we replicate the Constitutional AI workflow using Llama 3-8B. Our results show that Constitutional AI effectively increases the harmlessness of the model, reducing the Attack Success Rate in MT-Bench by 40.8%. However, as in the original study, the gain in harmlessness comes at the cost of helpfulness: the helpfulness metric, which is the average of the Turn 1 and Turn 2 scores, dropped by 9.8% compared to the baseline. We also observed clear signs of model collapse in the final DPO-CAI model, which indicates that smaller models may struggle with self-improvement because their outputs are of insufficient quality, making effective fine-tuning more challenging. Our study suggests that, like reasoning and math ability, self-improvement is an emergent property.
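
The two headline metrics in the abstract can be illustrated with a minimal Python sketch. It assumes the Attack Success Rate is the fraction of adversarial prompts whose response is judged harmful, and that the reported percentages are relative changes against the baseline; the abstract does not state either point explicitly, so both are assumptions. The function names and every number used in the example are hypothetical and are not taken from the paper.

```python
def attack_success_rate(judged_harmful):
    """Fraction of adversarial prompts whose response was judged harmful.

    `judged_harmful` is a list of booleans, one per prompt (assumed format).
    """
    if not judged_harmful:
        raise ValueError("need at least one judgement")
    return sum(judged_harmful) / len(judged_harmful)


def helpfulness(turn1_score, turn2_score):
    """Helpfulness as the average of the Turn 1 and Turn 2 scores."""
    return (turn1_score + turn2_score) / 2


def relative_change(baseline, new):
    """Signed relative change of `new` with respect to `baseline`."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (new - baseline) / baseline


# Illustrative values only, NOT results from the paper.
baseline_asr = attack_success_rate([True, True, False, True])
tuned_asr = attack_success_rate([True, False, False, False])
asr_change = relative_change(baseline_asr, tuned_asr)

baseline_help = helpfulness(8.0, 6.0)
tuned_help = helpfulness(6.0, 4.5)
help_change = relative_change(baseline_help, tuned_help)

print(f"ASR change: {asr_change:+.1%}")
print(f"Helpfulness change: {help_change:+.1%}")
```

If the reported figures are instead absolute differences in percentage points, `relative_change` would be replaced by a plain subtraction.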

Comments:	6 pages, 2 figures. Conducted as part of research on alignment techniques for language models
Subjects:	Artificial Intelligence (cs.AI)
MSC classes:	68T05, 68T50
ACM classes:	I.2.6; I.2.7; I.2.1
Cite as:	arXiv:2504.04918 [cs.AI]
	(or arXiv:2504.04918v1 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2504.04918

Submission history

From: Xue Zhang
[v1] Mon, 7 Apr 2025 11:01:25 UTC (524 KB)
