Diff-eRank: A Novel Rank-Based Metric for Evaluating Large Language Models

Wei, Lai; Tan, Zhiquan; Li, Chenghai; Wang, Jindong; Huang, Weiran

arXiv:2401.17139 (cs)

[Submitted on 30 Jan 2024 (v1), last revised 14 Oct 2024 (this version, v2)]

Abstract: Large Language Models (LLMs) have transformed natural language processing and extended their powerful capabilities to multi-modal domains. As LLMs continue to advance, it is crucial to develop diverse and appropriate metrics for their evaluation. In this paper, we introduce a novel rank-based metric, Diff-eRank, grounded in information theory and geometry principles. Diff-eRank assesses LLMs by analyzing their hidden representations, providing a quantitative measure of how efficiently they eliminate redundant information during training. We demonstrate the applicability of Diff-eRank in both single-modal (e.g., language) and multi-modal settings. For language models, our results show that Diff-eRank increases with model size and correlates well with conventional metrics such as loss and accuracy. In the multi-modal context, we propose an alignment evaluation method based on the eRank, and verify that contemporary multi-modal LLMs exhibit strong alignment performance based on our method. Our code is publicly available at this https URL.
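The abstract does not spell out the formula, so the following is only a minimal sketch under stated assumptions. It assumes that eRank is the standard effective rank (the exponential of the Shannon entropy of a normalized, non-negative spectrum) and that Diff-eRank is the effective rank for an untrained model minus that for the trained model. The function names `effective_rank` and `diff_erank` are hypothetical, and the spectra are assumed to be eigenvalues of a covariance matrix of hidden representations computed elsewhere; this is not the authors' released code.

```python
import math


def effective_rank(spectrum):
    """Effective rank: exp of the Shannon entropy of the normalized spectrum.

    `spectrum` is assumed to hold non-negative eigenvalues (for example, of a
    covariance matrix of hidden representations). Zero entries contribute
    nothing to the entropy and are skipped.
    """
    positive = [s for s in spectrum if s > 0]
    total = sum(positive)
    if total == 0:
        raise ValueError("spectrum must contain at least one positive value")
    probs = [s / total for s in positive]
    entropy = -sum(p * math.log(p) for p in probs)
    return math.exp(entropy)


def diff_erank(untrained_spectrum, trained_spectrum):
    """Assumed definition: eRank(untrained) minus eRank(trained)."""
    return effective_rank(untrained_spectrum) - effective_rank(trained_spectrum)
```

Under these assumptions, a flat spectrum such as `[1, 1, 1, 1]` has an effective rank close to 4, while a spectrum concentrated on a single direction has an effective rank of 1, so a larger positive difference would indicate that training concentrated the representations into fewer directions.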

Comments:	Accepted by NeurIPS 2024
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Theory (cs.IT)
Cite as:	arXiv:2401.17139 [cs.LG]
	(or arXiv:2401.17139v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2401.17139

Submission history

From: Lai Wei
[v1] Tue, 30 Jan 2024 16:19:55 UTC (197 KB)
[v2] Mon, 14 Oct 2024 04:36:09 UTC (166 KB)
