Confidence v.s. Critique: A Decomposition of Self-Correction Capability for LLMs

Yang, Zhe; Zhang, Yichang; Wang, Yudong; Xu, Ziyao; Lin, Junyang; Sui, Zhifang

Abstract: Large Language Models (LLMs) can correct their self-generated responses, but accuracy can also decline after self-correction. To better understand self-correction, we decompose, evaluate, and analyze the self-correction behaviors of LLMs. By enumerating and analyzing answer correctness before and after self-correction, we decompose the self-correction capability into a confidence capability (staying confident in answers that are already correct) and a critique capability (turning wrong answers into correct ones). We propose two metrics from a probabilistic perspective to measure these two capabilities, along with a third metric for evaluating overall self-correction capability. Based on our decomposition and evaluation metrics, we conduct extensive experiments and draw several empirical conclusions. For example, we find that different models can exhibit distinct behaviors: some models are confident while others are more critical. We also find a trade-off between the two capabilities (i.e., improving one can lead to a decline in the other) when manipulating model self-correction behavior through prompts or in-context learning. Further, we find a simple yet efficient strategy to improve self-correction capability by transforming the Supervised Fine-Tuning (SFT) data format; our strategy outperforms vanilla SFT in both capabilities and achieves much higher accuracy after self-correction. Our code will be publicly available on GitHub.
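The decomposition described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact metric definitions, so the code below assumes the most direct probabilistic reading: confidence as P(correct after | correct before) and critique as P(correct after | wrong before). The names `decompose` and `SelfCorrectionStats` are hypothetical, and the paper's third, overall metric is not reproduced because its form is not stated here.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class SelfCorrectionStats:
    acc_before: float   # accuracy of the initial answers
    acc_after: float    # accuracy after self-correction
    confidence: float   # ASSUMED definition: P(correct after | correct before)
    critique: float     # ASSUMED definition: P(correct after | wrong before)


def decompose(before: Sequence[bool], after: Sequence[bool]) -> SelfCorrectionStats:
    """Estimate the two conditional probabilities from per-question correctness."""
    if len(before) != len(after) or len(before) == 0:
        raise ValueError("before and after must be non-empty and equally long")

    n = len(before)
    n_correct = sum(before)          # questions answered correctly at first
    n_wrong = n - n_correct          # questions answered wrongly at first

    # Count the two outcomes that end in a correct answer.
    stay_correct = sum(b and a for b, a in zip(before, after))
    wrong_to_correct = sum((not b) and a for b, a in zip(before, after))

    # A conditional probability is undefined when its conditioning set is empty.
    confidence = stay_correct / n_correct if n_correct else float("nan")
    critique = wrong_to_correct / n_wrong if n_wrong else float("nan")

    return SelfCorrectionStats(
        acc_before=n_correct / n,
        acc_after=(stay_correct + wrong_to_correct) / n,
        confidence=confidence,
        critique=critique,
    )


# Toy data: five questions, invented purely for illustration.
stats = decompose(
    before=[True, True, True, False, False],
    after=[True, True, False, True, False],
)
print(stats)
```

Under these assumed definitions, the law of total probability gives acc_after = acc_before × confidence + (1 − acc_before) × critique. This makes the abstract's opening observation concrete: accuracy after self-correction falls whenever the correct answers lost to low confidence outnumber the wrong answers repaired by critique.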

Comments:	16 pages, 10 figures
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2412.19513 [cs.CL]
	(or arXiv:2412.19513v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2412.19513
