Fostering Natural Conversation in Large Language Models with NICO: a Natural Interactive COnversation dataset

Sun, Renliang; Liu, Mengyuan; Yang, Shiping; Wang, Rui; He, Junqing; Zhang, Jiaxing

arXiv:2408.09330 (cs)

[Submitted on 18 Aug 2024 (v1), last revised 15 Oct 2024 (this version, v2)]

Abstract: Benefiting from diverse instruction datasets, contemporary Large Language Models (LLMs) perform effectively as AI assistants in collaborating with humans. However, LLMs still struggle to generate natural and colloquial responses in real-world applications such as chatbots and psychological counseling, which require more human-like interactions. To address these limitations, we introduce NICO, a Natural Interactive COnversation dataset in Chinese. We first use GPT-4-turbo to generate dialogue drafts that cover 20 daily-life topics and 5 types of social interactions. Then, we hire workers to revise these dialogues to ensure that they are free of grammatical errors and unnatural utterances. We define two dialogue-level natural conversation tasks and two sentence-level tasks for identifying and rewriting unnatural sentences. Multiple open-source and closed-source LLMs are tested and analyzed in detail. The experimental results highlight the challenge of the tasks and demonstrate how NICO can help foster the natural dialogue capabilities of LLMs. The dataset will be released.

Comments:	16 pages, 3 figures, 10 tables
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2408.09330 [cs.CL]
	(or arXiv:2408.09330v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2408.09330

Submission history

From: Renliang Sun
[v1] Sun, 18 Aug 2024 02:06:25 UTC (643 KB)
[v2] Tue, 15 Oct 2024 05:55:30 UTC (625 KB)
