Revisiting Robust RAG: Do We Still Need Complex Robust Training in the Era of Powerful LLMs?

Ding, Hanxing; Tao, Shuchang; Pang, Liang; Wei, Zihao; Chen, Liwei; Xu, Kun; Shen, Huawei; Cheng, Xueqi

Abstract: Retrieval-augmented generation (RAG) systems often suffer from performance degradation when encountering noisy or irrelevant documents, driving researchers to develop sophisticated training strategies to enhance their robustness against such retrieval noise. However, as large language models (LLMs) continue to advance, the necessity of these complex training methods is increasingly questioned. In this paper, we systematically investigate whether complex robust training strategies remain necessary as model capacity grows. Through comprehensive experiments spanning multiple model architectures and parameter scales, we evaluate various document selection methods and adversarial training techniques across diverse datasets. Our extensive experiments consistently demonstrate that as models become more powerful, the performance gains brought by complex robust training methods drop off dramatically. We delve into the rationale and find that more powerful models inherently exhibit superior confidence calibration, better generalization across datasets (even when trained with randomly selected documents), and optimal attention mechanisms learned with simpler strategies. Our findings suggest that RAG systems can benefit from simpler architectures and training strategies as models become more powerful, enabling more scalable applications with minimal complexity.

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2502.11400 [cs.CL]
	(or arXiv:2502.11400v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2502.11400
