Two-Level Transformer and Auxiliary Coherence Modeling for Improved Text Segmentation

Glavaš, Goran; Somasundaran, Swapna

arXiv:2001.00891 (cs)

[Submitted on 3 Jan 2020]

Abstract: Breaking down the structure of long texts into semantically coherent segments makes the texts more readable and supports downstream applications like summarization and retrieval. Starting from an apparent link between text coherence and segmentation, we introduce a novel supervised model for text segmentation with simple but explicit coherence modeling. Our model -- a neural architecture consisting of two hierarchically connected Transformer networks -- is a multi-task learning model that couples the sentence-level segmentation objective with the coherence objective that differentiates correct sequences of sentences from corrupt ones. The proposed model, dubbed Coherence-Aware Text Segmentation (CATS), yields state-of-the-art segmentation performance on a collection of benchmark datasets. Furthermore, by coupling CATS with cross-lingual word embeddings, we demonstrate its effectiveness in zero-shot language transfer: it can successfully segment texts in languages unseen in training.
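
The multi-task coupling described in the abstract can be illustrated with a minimal sketch. The abstract does not give the loss formulas, so everything below is an assumption made for illustration: a binary cross-entropy for the sentence-level segmentation objective, a margin-based hinge loss for the coherence objective, and a hypothetical `weight` parameter that balances the two. The function names are likewise hypothetical and are not taken from the paper or its code.

```python
import math


def segmentation_loss(boundary_probs, gold_labels):
    """Mean binary cross-entropy over sentences.

    boundary_probs[i] is the predicted probability that sentence i starts
    a new segment; gold_labels[i] is 1 if it does, else 0.
    (Assumed form of the sentence-level segmentation objective.)
    """
    eps = 1e-12
    total = 0.0
    for p, y in zip(boundary_probs, gold_labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(gold_labels)


def coherence_loss(score_correct, score_corrupt, margin=1.0):
    """Hinge loss that is zero once the correct sentence sequence
    outscores the corrupt one by at least `margin`.
    (Assumed form of the coherence objective.)
    """
    return max(0.0, margin - (score_correct - score_corrupt))


def joint_loss(boundary_probs, gold_labels, score_correct, score_corrupt,
               weight=0.5):
    """Segmentation loss plus a weighted coherence loss.
    `weight` is a hypothetical balancing hyperparameter.
    """
    return (segmentation_loss(boundary_probs, gold_labels)
            + weight * coherence_loss(score_correct, score_corrupt))
```

In this sketch the coherence term only contributes when the corrupt sequence scores too close to (or above) the correct one, so it acts as an auxiliary signal alongside the segmentation term rather than replacing it.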

Comments:	AAAI 2020
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2001.00891 [cs.CL]
	(or arXiv:2001.00891v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2001.00891

Submission history

From: Goran Glavaš
[v1] Fri, 3 Jan 2020 17:06:41 UTC (583 KB)
