Enhancing Answer Reliability Through Inter-Model Consensus of Large Language Models

Amiri-Margavi, Alireza; Jebellat, Iman; Jebellat, Ehsan; Davoudi, Seyed Pouyan Mousavi

arXiv:2411.16797 (cs)

[Submitted on 25 Nov 2024]

Abstract: We explore the collaborative dynamics of an innovative language model interaction system involving advanced models such as GPT-4-0125-preview, Meta-LLaMA-3-70B-Instruct, Claude-3-Opus, and Gemini-1.5-Flash. These models generate and answer complex, PhD-level statistical questions without exact ground-truth answers. Our study investigates how inter-model consensus enhances the reliability and precision of responses. By employing statistical methods such as chi-square tests, Fleiss' Kappa, and confidence interval analysis, we evaluate consensus rates and inter-rater agreement to quantify the reliability of collaborative outputs. Key results reveal that Claude and GPT-4 exhibit the highest reliability and consistency, as evidenced by their narrower confidence intervals and higher alignment with question-generating models. Conversely, Gemini and LLaMA show greater variability in their consensus rates, as reflected in wider confidence intervals and lower reliability percentages. These findings demonstrate that collaborative interactions among large language models (LLMs) significantly improve response reliability, offering novel insights into autonomous, cooperative reasoning and validation in AI systems.
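
As an illustration of two of the measures named in the abstract, the following is a minimal sketch of Fleiss' Kappa and a confidence interval for a consensus rate. It is not the authors' code: the function names, the Wilson score interval (chosen here as one common interval for a proportion; the abstract does not say which interval was used), and the toy vote table are all assumptions made for illustration.

```python
from math import sqrt


def fleiss_kappa(counts):
    """Fleiss' kappa for a table where counts[i][j] is the number of raters
    (here: answering models) that assigned item i to category j.
    Every row must sum to the same number of raters n >= 2."""
    num_items = len(counts)
    n = sum(counts[0])
    if n < 2 or any(sum(row) != n for row in counts):
        raise ValueError("each item needs the same number of raters (>= 2)")
    num_cats = len(counts[0])

    # Share of all ratings that fell into each category.
    p = [sum(row[j] for row in counts) / (num_items * n) for j in range(num_cats)]
    # Observed agreement per item, then averaged over items.
    per_item = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts]
    p_bar = sum(per_item) / num_items
    # Agreement expected by chance.
    p_e = sum(pj * pj for pj in p)

    if p_e == 1.0:
        return float("nan")  # only one category ever used: kappa is undefined
    return (p_bar - p_e) / (1 - p_e)


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    p_hat = successes / trials
    denom = 1 + z * z / trials
    centre = (p_hat + z * z / (2 * trials)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / trials + z * z / (4 * trials * trials)) / denom
    return centre - half, centre + half


# Hypothetical data: 4 questions, 3 answering models, 2 answer options.
votes = [[3, 0], [2, 1], [0, 3], [1, 2]]
kappa = fleiss_kappa(votes)
low, high = wilson_interval(successes=50, trials=100)
```

A narrower interval from `wilson_interval` corresponds to the "narrower confidence intervals" the abstract associates with more consistent models, and a higher `fleiss_kappa` indicates agreement beyond what chance alone would produce.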

Comments:	15 pages, 2 figures
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2411.16797 [cs.CL]
	(or arXiv:2411.16797v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2411.16797

Submission history

From: Alireza Amiri-Margavi Mr
[v1] Mon, 25 Nov 2024 10:18:17 UTC (317 KB)
