TAP-DLND 1.0 : A Corpus for Document Level Novelty Detection

Ghosal, Tirthankar; Salam, Amitra; Tiwari, Swati; Ekbal, Asif; Bhattacharyya, Pushpak

arXiv:1802.06950 (cs)

[Submitted on 20 Feb 2018]

Abstract: Detecting the novelty of an entire document is an Artificial Intelligence (AI) frontier problem with widespread NLP applications, such as extractive document summarization, tracking the development of news events, and predicting the impact of scholarly articles. Important though the problem is, we are unaware of any document-level benchmark data that correctly addresses the evaluation of automatic novelty detection techniques in a classification framework. To bridge this gap, we present a resource for benchmarking document-level novelty detection techniques. We create the resource via periodic, event-specific crawling of news documents across several domains. We release the annotated corpus with the necessary statistics and demonstrate its use with a system developed for this problem.

Comments:	Accepted for publication in Language Resources and Evaluation Conference (LREC) 2018
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:1802.06950 [cs.CL]
	(or arXiv:1802.06950v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.1802.06950

Submission history

From: Tirthankar Ghosal
[v1] Tue, 20 Feb 2018 03:42:11 UTC (192 KB)
