CipherBank: Exploring the Boundary of LLM Reasoning Capabilities through Cryptography Challenges

Li, Yu; Pei, Qizhi; Sun, Mengyuan; Lin, Honglin; Ming, Chenlin; Gao, Xin; Wu, Jiang; He, Conghui; Wu, Lijun

Abstract: Large language models (LLMs) have demonstrated remarkable capabilities, and recent reasoning-focused models such as o1 and o3 continue to push the boundaries of AI. Despite these impressive achievements in mathematics and coding, the reasoning abilities of LLMs in domains requiring cryptographic expertise remain underexplored. In this paper, we introduce CipherBank, a comprehensive benchmark designed to evaluate the reasoning capabilities of LLMs in cryptographic decryption tasks. CipherBank comprises 2,358 meticulously crafted problems covering 262 unique plaintexts across 5 domains and 14 subdomains, with a focus on privacy-sensitive, real-world scenarios that necessitate encryption. From a cryptographic perspective, CipherBank incorporates 3 major categories of encryption methods spanning 9 distinct algorithms, ranging from classical ciphers to custom cryptographic techniques. We evaluate state-of-the-art LLMs on CipherBank, including GPT-4o, DeepSeek-V3, and cutting-edge reasoning-focused models such as o1 and DeepSeek-R1. Our results reveal significant gaps in reasoning ability, both between general-purpose chat LLMs and reasoning-focused LLMs and in the performance of current reasoning-focused models on classical cryptographic decryption tasks, highlighting the challenges these models face in understanding and manipulating encrypted data. Through detailed analysis and error investigation, we provide several key observations that shed light on the limitations of LLMs in cryptographic reasoning and on potential areas for improvement. These findings underscore the need for continued advances in LLM reasoning capabilities.
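To make the task format concrete, the following is a minimal sketch of what a classical-cipher decryption problem can look like. It uses a Caesar shift purely as a familiar example of a classical cipher; the abstract does not name the 9 algorithms, so the choice of cipher, the function name `caesar_shift`, and the sample plaintext are all illustrative assumptions, not items from the benchmark. The final check only observes that 262 × 9 equals the reported problem count, which would be consistent with each plaintext being paired with each algorithm, although the abstract does not state this.

```python
import string


def caesar_shift(text: str, shift: int) -> str:
    """Shift each ASCII letter by `shift` positions; leave other characters unchanged."""
    out = []
    for ch in text:
        if ch in string.ascii_lowercase:
            out.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        elif ch in string.ascii_uppercase:
            out.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
        else:
            out.append(ch)
    return "".join(out)


# Hypothetical privacy-sensitive plaintext (not taken from the dataset).
plaintext = "Account PIN: 4821"
ciphertext = caesar_shift(plaintext, 3)    # encryption with an assumed key of 3
recovered = caesar_shift(ciphertext, -3)   # decryption is the inverse shift

# Arithmetic check on the figures reported in the abstract.
problems_if_fully_crossed = 262 * 9

print(ciphertext)
print(recovered)
print(problems_if_fully_crossed)
```

In a decryption benchmark of this kind, a model would be given something like `ciphertext` and be expected to produce `recovered` exactly, which is why small character-level slips count as failures.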

Comments:	Work in progress
Subjects:	Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Performance (cs.PF)
Cite as:	arXiv:2504.19093 [cs.CR]
	(or arXiv:2504.19093v1 [cs.CR] for this version)
	https://doi.org/10.48550/arXiv.2504.19093


