Better Prompt Compression Without Multi-Layer Perceptrons

Honig, Edouardo; Lizarraga, Andrew; Zhang, Zijun Frank; Wu, Ying Nian


arXiv:2501.06730 (cs)

[Submitted on 12 Jan 2025]


Abstract: Prompt compression is a promising approach to speeding up language model inference without altering the generative model. Prior works compress prompts into smaller sequences of learned tokens using an encoder that is trained as a Low-Rank Adaptation (LoRA) of the inference language model. However, we show that the encoder does not need to keep the original language model's architecture to achieve useful compression. We introduce the Attention-Only Compressor (AOC), which learns a prompt compression encoder after removing the multilayer perceptron (MLP) layers in the Transformer blocks of a language model, resulting in an encoder with roughly 67% fewer parameters than the original model. Intriguingly, we find that, across a range of compression ratios up to 480x, AOC can better regenerate prompts and outperform a baseline compression encoder that is a LoRA of the inference language model without removing MLP layers. These results demonstrate that the architecture of prompt compression encoders does not need to be identical to that of the original decoder language model, paving the way for further research into architectures and approaches for prompt compression.
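
As a rough sanity check on the "roughly 67%" figure, the sketch below counts weight-matrix parameters in one idealized Transformer block. It rests on assumptions the abstract does not state: full multi-head attention with four d_model x d_model projections, a two-matrix MLP with a 4x hidden expansion, and no embeddings, biases, or normalization weights. The function names and the `d_model` value are hypothetical, chosen only for illustration; the paper's actual model and accounting may differ.

```python
def block_param_counts(d_model: int, mlp_ratio: int = 4) -> dict:
    """Weight-matrix parameter counts for one idealized Transformer block.

    Assumption: attention uses four d_model x d_model projections (Q, K, V, O)
    and the MLP uses one up-projection and one down-projection.
    """
    attention = 4 * d_model * d_model
    mlp = 2 * d_model * (mlp_ratio * d_model)
    return {"attention": attention, "mlp": mlp}


def fraction_removed_without_mlp(d_model: int, mlp_ratio: int = 4) -> float:
    """Share of block parameters that disappears when the MLP is dropped."""
    counts = block_param_counts(d_model, mlp_ratio)
    return counts["mlp"] / (counts["attention"] + counts["mlp"])


def compression_ratio(prompt_tokens: int, compressed_tokens: int) -> float:
    """Ratio of original prompt length to compressed sequence length."""
    return prompt_tokens / compressed_tokens


if __name__ == "__main__":
    # d_model = 2048 is an illustrative value, not taken from the paper.
    print(f"{fraction_removed_without_mlp(2048):.1%} of block weights are in the MLP")
    # Illustrative lengths: 480 prompt tokens squeezed into 1 learned token.
    print(f"{compression_ratio(480, 1):.0f}x compression")
```

Under these assumptions the MLP holds 8 of every 12 weight units in a block, or two thirds, independent of `d_model`, which is consistent with the reduction reported in the abstract. Architectures with grouped-query attention or a different MLP width would shift this fraction.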

Comments:	7 pages, 0 figures
Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2501.06730 [cs.CL]
	(or arXiv:2501.06730v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2501.06730

Submission history

From: Edouardo Honig
[v1] Sun, 12 Jan 2025 06:57:06 UTC (13 KB)
