Optimizing Multi-Stage Language Models for Effective Text Retrieval

Trung, Quang Hoang; Hoang, Le Trung; Phuc, Nguyen Van Hoang

arXiv:2412.19265 (cs)

[Submitted on 26 Dec 2024]

Abstract: Efficient text retrieval is critical for applications such as legal document analysis, particularly in specialized contexts like Japanese legal systems. Existing retrieval methods often underperform in such domain-specific scenarios, necessitating tailored approaches. In this paper, we introduce a novel two-phase text retrieval pipeline optimized for Japanese legal datasets. Our method leverages advanced language models to achieve state-of-the-art performance, significantly improving retrieval efficiency and accuracy. To further enhance robustness and adaptability, we incorporate an ensemble model that integrates multiple retrieval strategies, resulting in superior outcomes across diverse tasks. Extensive experiments validate the effectiveness of our approach, demonstrating strong performance on both Japanese legal datasets and widely recognized benchmarks like MS-MARCO. Our work establishes new standards for text retrieval in domain-specific and general contexts, providing a comprehensive solution for addressing complex queries in legal and multilingual environments.

Subjects:	Information Retrieval (cs.IR); Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2412.19265 [cs.IR]
	(or arXiv:2412.19265v1 [cs.IR] for this version)
	https://doi.org/10.48550/arXiv.2412.19265

Submission history

From: Quang Hoang Trung
[v1] Thu, 26 Dec 2024 16:05:19 UTC (1,318 KB)
