Conversational Text Extraction with Large Language Models Using Retrieval-Augmented Systems

Roy, Soham; Goswami, Mitul; Nargund, Nisharg; Mohanty, Suneeta; Pattnaik, Prasant Kumar

arXiv:2501.09801 (cs)

[Submitted on 16 Jan 2025]

Abstract: This study introduces a system that uses Large Language Models (LLMs) to extract text from PDF documents and to let users interact with them through a conversational interface. Using Retrieval-Augmented Generation (RAG), the system answers user queries and highlights the relevant passages within the PDF. When a user uploads a PDF, the system processes it and uses sentence embeddings to build a document-specific vector store, which enables efficient retrieval of the sections pertinent to each query. The LLM then engages in a conversational exchange, using the retrieved information to extract text and generate comprehensive, contextually aware answers. The proposed system achieves ROUGE values competitive with existing state-of-the-art techniques for text extraction and summarization, although further qualitative evaluation is necessary to fully assess its effectiveness in real-world applications. It thus offers a valuable tool for researchers, students, and anyone seeking to efficiently extract knowledge and gain insights from documents through an intuitive question-answering interface.
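
The retrieval flow outlined in the abstract (embed the document, store the vectors, fetch the passages closest to a query, pass them to the LLM) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name an embedding model, vector store, or LLM, so the sketch substitutes a toy bag-of-words vector for a real sentence embedding and a plain Python list for the vector store, and it stops at building the prompt rather than calling a model. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Split extracted text into sentences on '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def embed(sentence):
    """ASSUMPTION: toy stand-in for a sentence embedding (bag-of-words counts).

    A real system would call a learned sentence-embedding model here.
    """
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_vector_store(text):
    """Build a document-specific store: one (sentence, vector) pair per sentence."""
    return [(s, embed(s)) for s in split_sentences(text)]


def retrieve(store, query, k=2):
    """Return the k stored sentences most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [sentence for sentence, _ in ranked[:k]]


def build_prompt(passages, question):
    """Assemble the text handed to an LLM; the model call itself is omitted."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    document = (
        "The system processes uploaded PDF files. "
        "Sentence embeddings are stored in a vector store. "
        "ROUGE scores measure summary overlap."
    )
    store = build_vector_store(document)
    question = "Where are sentence embeddings stored?"
    print(build_prompt(retrieve(store, question, k=1), question))
```

Because the retrieved sentences are returned verbatim, the same list could also drive the passage highlighting the abstract mentions; how the actual system maps passages back to PDF coordinates is not described in the abstract.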

Subjects:	Information Retrieval (cs.IR); Computation and Language (cs.CL)
Cite as:	arXiv:2501.09801 [cs.IR]
	(or arXiv:2501.09801v1 [cs.IR] for this version)
	https://doi.org/10.48550/arXiv.2501.09801

Submission history

From: Mitul Goswami
[v1] Thu, 16 Jan 2025 19:12:25 UTC (448 KB)
