ShadowCoT: Cognitive Hijacking for Stealthy Reasoning Backdoors in LLMs

Zhao, Gejian; Wu, Hanzhou; Zhang, Xinpeng; Vasilakos, Athanasios V.

arXiv:2504.05605 (cs)

[Submitted on 8 Apr 2025]

Abstract: Chain-of-Thought (CoT) enhances an LLM's ability to perform complex reasoning tasks, but it also introduces new security issues. In this work, we present ShadowCoT, a novel backdoor attack framework that targets the internal reasoning mechanism of LLMs. Unlike prior token-level or prompt-based attacks, ShadowCoT directly manipulates the model's cognitive reasoning path, enabling it to hijack multi-step reasoning chains and produce logically coherent but adversarial outcomes. By conditioning on internal reasoning states, ShadowCoT learns to recognize and selectively disrupt key reasoning steps, effectively mounting a self-reflective cognitive attack within the target model. Our approach introduces a lightweight yet effective multi-stage injection pipeline, which selectively rewires attention pathways and perturbs intermediate representations with minimal parameter overhead (only 0.15% updated). ShadowCoT further leverages reinforcement learning and reasoning chain pollution (RCP) to autonomously synthesize stealthy adversarial CoTs that remain undetectable to advanced defenses. Extensive experiments across diverse reasoning benchmarks and LLMs show that ShadowCoT consistently achieves a high Attack Success Rate (94.4%) and Hijacking Success Rate (88.4%) while preserving benign performance. These results reveal an emergent class of cognition-level threats and highlight the urgent need for defenses beyond shallow surface-level consistency.

Comments:	Zhao et al., 16 pages, 2025, uploaded by Hanzhou Wu, Shanghai University
Subjects:	Cryptography and Security (cs.CR); Computation and Language (cs.CL)
Cite as:	arXiv:2504.05605 [cs.CR]
	(or arXiv:2504.05605v1 [cs.CR] for this version)
	https://doi.org/10.48550/arXiv.2504.05605

Submission history

From: Hanzhou Wu
[v1] Tue, 8 Apr 2025 01:36:16 UTC (989 KB)
