Exploring Information Processing in Large Language Models: Insights from Information Bottleneck Theory

Yang, Zhou; Qi, Zhengyu; Ren, Zhaochun; Jia, Zhikai; Sun, Haizhou; Zhu, Xiaofei; Liao, Xiangwen

arXiv:2501.00999 (cs)

[Submitted on 2 Jan 2025 (v1), last revised 6 Jan 2025 (this version, v2)]

Abstract: Large Language Models (LLMs) have demonstrated remarkable performance across a wide range of tasks by understanding input information and predicting corresponding outputs. However, the internal mechanisms by which LLMs comprehend input and make effective predictions remain poorly understood. In this paper, we explore the working mechanism of LLMs in information processing from the perspective of Information Bottleneck Theory. We propose a non-training construction strategy to define a task space and identify the following key findings: (1) LLMs compress input information into specific task spaces (e.g., sentiment space, topic space) to facilitate task understanding; (2) they then extract and utilize relevant information from the task space at critical moments to generate accurate predictions. Based on these insights, we introduce two novel approaches: an Information Compression-based Context Learning (IC-ICL) and a Task-Space-guided Fine-Tuning (TS-FT). IC-ICL enhances reasoning performance and inference efficiency by compressing retrieved example information into the task space. TS-FT employs a space-guided loss to fine-tune LLMs, encouraging the learning of more effective compression and selection mechanisms. Experiments across multiple datasets validate the effectiveness of task space construction. Additionally, IC-ICL not only improves performance but also accelerates inference speed by over 40%, while TS-FT achieves superior results with a minimal strategy adjustment.

Comments:	9 pages, 9 figures, 3 tables
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2501.00999 [cs.CL]
	(or arXiv:2501.00999v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2501.00999

Submission history

From: Zhou Yang
[v1] Thu, 2 Jan 2025 01:33:58 UTC (1,456 KB)
[v2] Mon, 6 Jan 2025 01:49:09 UTC (1,456 KB)
