Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning Capability

Lin, Zicheng; Liang, Tian; Xu, Jiahao; Wang, Xing; Luo, Ruilin; Shi, Chufan; Li, Siheng; Yang, Yujiu; Tu, Zhaopeng

arXiv:2411.19943 (cs)

[Submitted on 29 Nov 2024 (v1), last revised 2 Dec 2024 (this version, v2)]

Abstract: Large Language Models (LLMs) have exhibited remarkable performance on reasoning tasks. They use autoregressive token generation to construct reasoning trajectories, enabling the development of a coherent chain of thought. In this work, we explore the impact of individual tokens on the final outcomes of reasoning tasks. We identify the existence of "critical tokens" that lead to incorrect reasoning trajectories in LLMs. Specifically, we find that LLMs tend to produce positive outcomes when forced to decode other tokens instead of critical tokens. Motivated by this observation, we propose a novel approach, cDPO, designed to automatically recognize critical tokens and apply token-level rewards to them during the alignment process. First, we develop a contrastive estimation approach to automatically identify critical tokens by comparing the generation likelihoods of a positive model and a negative model. To this end, we separately fine-tune the positive and negative models on various reasoning trajectories; consequently, they are capable of identifying critical tokens within incorrect trajectories that contribute to erroneous outcomes. Moreover, to further align the model with the critical token information during the alignment process, we extend the conventional DPO algorithm to token-level DPO and use the differential likelihood from the aforementioned positive and negative models as an importance weight for token-level DPO. Experimental results on the GSM8K and MATH500 benchmarks with two widely used models, Llama-3 (8B and 70B) and deepseek-math (7B), demonstrate the effectiveness of the proposed approach, cDPO.
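
The contrastive estimation step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact scoring formula, so the snippet below assumes the simplest reading: score each token of an incorrect trajectory by the difference between its log-likelihood under the positive model and under the negative model, and treat the lowest-scoring token as the most likely critical token. The function names, the example tokens, and the log-probability values are all hypothetical and are not taken from the paper.

```python
def contrastive_scores(logp_pos, logp_neg):
    """Per-token score: log p_pos(token) - log p_neg(token).

    Assumption: a plain log-likelihood difference; the paper's exact
    scoring or normalisation may differ.
    """
    if len(logp_pos) != len(logp_neg):
        raise ValueError("log-prob lists must have the same length")
    return [p - n for p, n in zip(logp_pos, logp_neg)]


def most_critical_index(scores):
    """Index of the lowest score, i.e. the token that the negative model
    favours most strongly relative to the positive model."""
    return min(range(len(scores)), key=scores.__getitem__)


# Made-up per-token log-probabilities for one incorrect trajectory.
tokens = ["so", "he", "owed", "five"]
logp_pos = [-0.2, -0.5, -3.0, -0.3]  # hypothetical positive-model values
logp_neg = [-0.3, -0.6, -0.4, -0.3]  # hypothetical negative-model values

scores = contrastive_scores(logp_pos, logp_neg)
idx = most_critical_index(scores)
print(tokens[idx])  # prints "owed"
```

Under the same assumption, these per-token scores are the kind of quantity that could serve as the importance weights for token-level DPO mentioned in the abstract; how they are scaled and applied in the loss is not specified there.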

Comments:	Work in progress
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2411.19943 [cs.CL]
	(or arXiv:2411.19943v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2411.19943

Submission history

From: Zicheng Lin
[v1] Fri, 29 Nov 2024 18:58:22 UTC (363 KB)
[v2] Mon, 2 Dec 2024 06:26:38 UTC (363 KB)