Domain-specific ChatBots for Science using Embeddings

Yager, Kevin G.

doi:10.1039/D3DD00112A

Computer Science > Computation and Language

arXiv:2306.10067 (cs)

[Submitted on 15 Jun 2023 (v1), last revised 24 Aug 2023 (this version, v2)]



Abstract: Large language models (LLMs) have emerged as powerful machine-learning systems capable of handling a myriad of tasks. Tuned versions of these systems have been turned into chatbots that can respond to user queries on a vast diversity of topics, providing informative and creative replies. However, their application to physical science research remains limited owing to their incomplete knowledge in these areas, contrasted with the needs of rigor and sourcing in science domains. Here, we demonstrate how existing methods and software tools can be easily combined to yield a domain-specific chatbot. The system ingests scientific documents in existing formats, and uses text embedding lookup to provide the LLM with domain-specific contextual information when composing its reply. We similarly demonstrate that existing image embedding methods can be used for search and retrieval across publication figures. These results confirm that LLMs are already suitable for use by physical scientists in accelerating their research efforts.
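
The embedding-lookup step mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the `embed` function below is a toy bag-of-words stand-in for a learned text embedding model, and the names `retrieve`, `build_prompt`, and `CHUNKS`, along with the example passages, are hypothetical and exist only for illustration. The sketch shows the general pattern only: embed the document chunks and the query, rank chunks by cosine similarity, and prepend the best matches to the prompt sent to the LLM.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a learned embedding: a sparse bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query, chunks, k=2):
    """Assemble an LLM prompt that carries the retrieved excerpts as context."""
    context = "\n".join(f"- {c}" for c in retrieve(query, chunks, k))
    return f"Use the following excerpts to answer.\n{context}\n\nQuestion: {query}"


# Hypothetical document chunks, invented for this example.
CHUNKS = [
    "Block copolymers self-assemble into ordered nanoscale morphologies.",
    "X-ray scattering measures the structure of thin films.",
    "The cafeteria is open on weekdays.",
]

if __name__ == "__main__":
    print(build_prompt("How is thin film structure measured with x-ray scattering?", CHUNKS))
```

In a real system the toy `embed` would presumably be replaced by a trained embedding model and the linear scan by a vector index; the retrieval-then-prompt structure would stay the same.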

Comments:	14 pages, 6 figures
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
ACM classes:	I.2.7
Cite as:	arXiv:2306.10067 [cs.CL]
	(or arXiv:2306.10067v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2306.10067
Related DOI:	https://doi.org/10.1039/D3DD00112A

Submission history

From: Kevin Yager
[v1] Thu, 15 Jun 2023 15:26:20 UTC (36,361 KB)
[v2] Thu, 24 Aug 2023 20:24:13 UTC (44,408 KB)
