Grounding and Evaluation for Large Language Models: Practical Challenges and Lessons Learned (Survey)

Kenthapadi, Krishnaram; Sameki, Mehrnoosh; Taly, Ankur

doi:10.1145/3637528.3671467

arXiv:2407.12858 (cs)

[Submitted on 10 Jul 2024]

Authors: Krishnaram Kenthapadi, Mehrnoosh Sameki, Ankur Taly

Abstract: With the ongoing rapid adoption of Artificial Intelligence (AI)-based systems in high-stakes domains, ensuring the trustworthiness, safety, and observability of these systems has become crucial. It is essential to evaluate and monitor AI systems not only for accuracy and quality-related metrics but also for robustness, bias, security, interpretability, and other responsible AI dimensions. We focus on large language models (LLMs) and other generative AI models, which present additional challenges such as hallucinations, harmful and manipulative content, and copyright infringement. In this survey article accompanying our KDD 2024 tutorial, we highlight a wide range of harms associated with generative AI systems and survey state-of-the-art approaches (along with open challenges) to address these harms.

Comments:	Survey Article for the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2024) Tutorial
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as:	arXiv:2407.12858 [cs.CL]
	(or arXiv:2407.12858v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2407.12858
Journal reference:	Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2024)
Related DOI:	https://doi.org/10.1145/3637528.3671467

Submission history

From: Krishnaram Kenthapadi
[v1] Wed, 10 Jul 2024 01:23:10 UTC (135 KB)
