EnterpriseEM: Fine-tuned Embeddings for Enterprise Semantic Search

Rathinasamy, Kamalkumar; Nettar, Jayarama; Kumar, Amit; Manchanda, Vishal; Vijayakumar, Arun; Kataria, Ayush; Manjunath, Venkateshprasanna; GS, Chidambaram; Sodhi, Jaskirat Singh; Shaikh, Shoeb; Khan, Wasim Akhtar; Singh, Prashant; Ige, Tanishq Dattatray; Tiwari, Vipin; Mondal, Rajab Ali; K, Harshini; Reka, S; Amancharla, Chetana; Rahman, Faiz ur; A, Harikrishnan P; Saha, Indraneel; Tiwary, Bhavya; Patel, Navin Shankar; S, Pradeep T; J, Balaji A; Priyapravas; Tarafdar, Mohammed Rafee

Abstract: Enterprises grapple with the significant challenge of managing proprietary unstructured data, which hinders efficient information retrieval. This has led to the emergence of AI-driven information retrieval solutions designed to extract relevant insights in response to employee inquiries. These solutions often leverage pre-trained embedding models and generative models as foundational components. Because the proximity or disparity of pre-trained embeddings reflects their original training objectives, they might not fully capture the unique characteristics of enterprise-specific data, leading to suboptimal alignment with the retrieval goals of enterprise environments. In this paper, we propose a comprehensive methodology for contextualizing pre-trained embedding models to enterprise environments, covering the entire process from data preparation to model fine-tuning and evaluation. By adapting the embeddings to better suit the retrieval tasks prevalent in enterprises, we aim to enhance the performance of information retrieval solutions. We discuss the process of fine-tuning, its effect on retrieval accuracy, and the potential benefits for enterprise information management. Our findings demonstrate the efficacy of fine-tuned embedding models in improving the precision and relevance of search results in enterprise settings.

Subjects:	Information Retrieval (cs.IR); Computation and Language (cs.CL)
ACM classes:	I.2.7
Cite as:	arXiv:2406.00010 [cs.IR]
	(or arXiv:2406.00010v2 [cs.IR] for this version)
	https://doi.org/10.48550/arXiv.2406.00010

