STAYKATE: Hybrid In-Context Example Selection Combining Representativeness Sampling and Retrieval-based Approach -- A Case Study on Science Domains

Zhu, Chencheng; Shimada, Kazutaka; Taniguchi, Tomoki; Ohkuma, Tomoko

Computer Science > Computation and Language

arXiv:2412.20043 (cs)

[Submitted on 28 Dec 2024]


Abstract: Large language models (LLMs) demonstrate the ability to learn in-context, offering a potential solution for scientific information extraction, which often contends with challenges such as insufficient training data and the high cost of annotation. Because the choice of in-context examples can significantly affect performance, it is crucial to design a sound method for selecting effective ones. In this paper, we propose STAYKATE, a static-dynamic hybrid selection method that combines the principles of representativeness sampling from active learning with the prevalent retrieval-based approach. Results on three domain-specific datasets indicate that STAYKATE outperforms both traditional supervised methods and existing selection methods. The performance gain is particularly pronounced for entity types that pose challenges for other methods.
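The abstract does not spell out the selection procedure, so the following is only an illustrative sketch of what a static-dynamic hybrid selection could look like, not the authors' implementation. It assumes each candidate example is already embedded as a plain vector, approximates "representativeness" as the highest mean cosine similarity to the rest of the pool, and approximates the retrieval-based part as top-k cosine similarity to the test input. All function names and the toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_static(pool, n_static):
    """Assumed representativeness criterion: indices of the pool items with the
    highest mean similarity to all other pool items. Independent of the query."""
    def mean_sim(i):
        others = [cosine(pool[i], pool[j]) for j in range(len(pool)) if j != i]
        return sum(others) / len(others) if others else 0.0

    ranked = sorted(range(len(pool)), key=mean_sim, reverse=True)
    return ranked[:n_static]


def select_dynamic(pool, query, n_dynamic, exclude=()):
    """Assumed retrieval criterion: indices of the pool items most similar to
    the query, skipping any index listed in `exclude`."""
    candidates = [i for i in range(len(pool)) if i not in exclude]
    ranked = sorted(candidates, key=lambda i: cosine(pool[i], query), reverse=True)
    return ranked[:n_dynamic]


def hybrid_select(pool, query, n_static, n_dynamic):
    """Static examples first (same for every query), then query-specific ones."""
    static = select_static(pool, n_static)
    dynamic = select_dynamic(pool, query, n_dynamic, exclude=set(static))
    return static + dynamic


# Toy pool of 2-d "embeddings" (hypothetical data, for illustration only).
toy_pool = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.0, 1.0]]
chosen = hybrid_select(toy_pool, query=[0.0, 1.0], n_static=1, n_dynamic=1)
print(chosen)
```

In this sketch the static part can be computed once and reused for every test input, while the dynamic part is recomputed per query. That split is one plausible reading of "static-dynamic", offered here as an assumption.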

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2412.20043 [cs.CL]
	(or arXiv:2412.20043v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2412.20043

Submission history

From: Chencheng Zhu
[v1] Sat, 28 Dec 2024 06:13:50 UTC (2,869 KB)
