Bottleneck-Minimal Indexing for Generative Document Retrieval

Du, Xin; Xiu, Lixin; Tanaka-Ishii, Kumiko

arXiv:2405.10974 (cs)

[Submitted on 12 May 2024 (v1), last revised 21 May 2024 (this version, v2)]

Abstract: We apply an information-theoretic perspective to reconsider generative document retrieval (GDR), in which a document $x \in X$ is indexed by $t \in T$, and a neural autoregressive model is trained to map queries $Q$ to $T$. GDR can be considered to involve information transmission from documents $X$ to queries $Q$, with the requirement to transmit more bits via the indexes $T$. By applying Shannon's rate-distortion theory, the optimality of indexing can be analyzed in terms of the mutual information, and the design of the indexes $T$ can then be regarded as a bottleneck in GDR. After reformulating GDR from this perspective, we empirically quantify the bottleneck underlying GDR. Finally, using the NQ320K and MARCO datasets, we evaluate our proposed bottleneck-minimal indexing method in comparison with various previous indexing methods, and we show that it outperforms those methods.
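
For orientation, the trade-off described in the abstract can be sketched with the generic information bottleneck objective. This is the standard textbook form, given here as an assumption and not necessarily the exact formulation used in the paper; the trade-off weight $\beta > 0$ and the index assignment $p(t \mid x)$ are symbols introduced only for this illustration.

$$
\min_{p(t \mid x)} \; I(X;T) - \beta \, I(T;Q),
\qquad
I(X;T) = \sum_{x,t} p(x,t) \log \frac{p(x,t)}{p(x)\,p(t)}
$$

Under this reading, a smaller $I(X;T)$ means the indexes compress the documents more, while a larger $I(T;Q)$ means the indexes retain more information about the queries.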

Comments:	Accepted for ICML 2024
Subjects:	Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2405.10974 [cs.IR]
	(or arXiv:2405.10974v2 [cs.IR] for this version)
	https://doi.org/10.48550/arXiv.2405.10974

Submission history

From: Xin Du PhD
[v1] Sun, 12 May 2024 11:41:26 UTC (1,378 KB)
[v2] Tue, 21 May 2024 01:29:13 UTC (1,378 KB)
