Modeling Information Change in Science Communication with Semantically Matched Paraphrases

Wright, Dustin; Pei, Jiaxin; Jurgens, David; Augenstein, Isabelle

arXiv:2210.13001 (cs)

[Submitted on 24 Oct 2022]

Abstract: Whether the media faithfully communicate scientific information has long been a core issue for the scientific community. Automatically identifying paraphrased scientific findings could enable large-scale tracking and analysis of information changes in the science communication process, but this requires systems to understand the similarity between scientific information across multiple domains. To this end, we present the SCIENTIFIC PARAPHRASE AND INFORMATION CHANGE DATASET (SPICED), the first paraphrase dataset of scientific findings annotated for degree of information change. SPICED contains 6,000 scientific finding pairs extracted from news stories, social media discussions, and full texts of original papers. We demonstrate that SPICED poses a challenging task and that models trained on SPICED improve downstream performance on evidence retrieval for fact checking of real-world scientific claims. Finally, we show that models trained on SPICED can reveal large-scale trends in the degrees to which people and organizations faithfully communicate new scientific findings. Data, code, and pre-trained models are available at this http URL.

Comments:	In EMNLP 2022; 25 pages; 11 figures; 6 tables
Subjects:	Computation and Language (cs.CL); Computers and Society (cs.CY); Machine Learning (cs.LG)
Cite as:	arXiv:2210.13001 [cs.CL]
	(or arXiv:2210.13001v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2210.13001

Submission history

From: Dustin Wright
[v1] Mon, 24 Oct 2022 07:44:38 UTC (758 KB)
