GSCLIP : A Framework for Explaining Distribution Shifts in Natural Language

Zhu, Zhiying; Liang, Weixin; Zou, James

arXiv:2206.15007 (cs)

[Submitted on 30 Jun 2022]

Abstract: Helping end users comprehend abstract distribution shifts can greatly facilitate AI deployment. Motivated by this, we propose a novel task, dataset explanation. Given two image datasets, dataset explanation aims to automatically point out their dataset-level distribution shifts in natural language. Current techniques for monitoring distribution shifts provide inadequate information for understanding datasets with the goal of improving data quality. Therefore, we introduce GSCLIP, a training-free framework to solve the dataset explanation task. In GSCLIP, we propose the selector as the first quantitative evaluation method for identifying explanations that properly summarize dataset shifts. Furthermore, we leverage this selector to demonstrate the superiority of a generator based on language model generation. Systematic evaluation on natural data shift verifies that GSCLIP, a combined system of a hybrid generator group and an efficient selector, is not only easy to use but also powerful for dataset explanation at scale.
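
The abstract describes a selector that ranks candidate explanations of the shift between two image datasets, but gives no scoring rule. The sketch below is one plausible reading and not the authors' implementation. It assumes that images and candidate sentences have already been embedded into a shared vector space (for example by a CLIP-style model), and it scores each candidate by the gap in mean cosine similarity between the two datasets. The function names, the scoring rule, the toy 2-D embeddings, and the candidate sentences are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mean_similarity(image_embs, text_emb):
    """Average similarity between one text embedding and a set of image embeddings."""
    return sum(cosine(e, text_emb) for e in image_embs) / len(image_embs)


def select_explanation(set_a, set_b, candidates):
    """Pick the candidate whose similarity differs most between the two datasets.

    Assumed scoring rule (not taken from the paper):
    score = mean similarity to dataset A minus mean similarity to dataset B.
    `candidates` maps explanation text to its precomputed text embedding.
    """
    scores = {
        text: mean_similarity(set_a, emb) - mean_similarity(set_b, emb)
        for text, emb in candidates.items()
    }
    best = max(scores, key=lambda text: abs(scores[text]))
    return best, scores


# Toy, hand-made embeddings standing in for real model outputs.
set_a = [[1.0, 0.0], [0.9, 0.1]]
set_b = [[0.0, 1.0], [0.1, 0.9]]
candidates = {
    "photos taken in snow": [1.0, 0.0],       # aligned with dataset A only
    "photos containing animals": [1.0, 1.0],  # equally close to both datasets
}

best, scores = select_explanation(set_a, set_b, candidates)
print(best)
```

In this toy setup the second candidate is equally similar to both datasets, so it carries no information about the shift and the first candidate is selected. A generator, in the abstract's terms, would be whatever produces the `candidates` dictionary.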

Comments:	Accepted by ICML 2022 DataPerf
Subjects:	Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as:	arXiv:2206.15007 [cs.CL]
	(or arXiv:2206.15007v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2206.15007

Submission history

From: Zhiying Zhu
[v1] Thu, 30 Jun 2022 04:06:26 UTC (634 KB)
