TURL: Table Understanding through Representation Learning

Deng, Xiang; Sun, Huan; Lees, Alyssa; Wu, You; Yu, Cong

arXiv:2006.14806 (cs)

[Submitted on 26 Jun 2020 (v1), last revised 3 Dec 2020 (this version, v2)]

Abstract: Relational tables on the Web store a vast amount of knowledge. Owing to the wealth of such tables, there has been tremendous progress on a variety of tasks in the area of table understanding. However, existing work generally relies on heavily-engineered task-specific features and model architectures. In this paper, we present TURL, a novel framework that introduces the pre-training/fine-tuning paradigm to relational Web tables. During pre-training, our framework learns deep contextualized representations on relational tables in an unsupervised manner. Its universal model design with pre-trained representations can be applied to a wide range of tasks with minimal task-specific fine-tuning. Specifically, we propose a structure-aware Transformer encoder to model the row-column structure of relational tables, and present a new Masked Entity Recovery (MER) objective for pre-training to capture the semantics and knowledge in large-scale unlabeled data. We systematically evaluate TURL with a benchmark consisting of 6 different tasks for table understanding (e.g., relation extraction, cell filling). We show that TURL generalizes well to all tasks and substantially outperforms existing methods in almost all instances.
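
As a rough illustration of the masking idea behind the Masked Entity Recovery objective named in the abstract, the following Python sketch hides a random subset of entity cells in a toy table and records the originals as recovery targets. It is not the authors' implementation: the function name `mask_entity_cells`, the `[MASK]` placeholder, the `mask_prob` parameter, and the example table are all assumptions made for illustration, and the sketch omits the encoder and the training loss entirely.

```python
import random

MASK = "[MASK]"  # hypothetical placeholder token, not taken from the paper


def mask_entity_cells(table, mask_prob=0.3, seed=0):
    """Return (masked_table, targets) for a toy masked-entity setup.

    table: list of rows, each a list of entity strings.
    targets: dict mapping (row, col) to the original entity of each
    masked cell, i.e. what a model would be asked to recover.
    """
    rng = random.Random(seed)
    masked = [list(row) for row in table]  # copy so the input is untouched
    targets = {}
    for r, row in enumerate(table):
        for c, entity in enumerate(row):
            if rng.random() < mask_prob:
                masked[r][c] = MASK
                targets[(r, c)] = entity
    return masked, targets


# Toy relational table (illustrative data only): director, film.
table = [
    ["Satyajit Ray", "Pather Panchali"],
    ["Akira Kurosawa", "Rashomon"],
]
masked, targets = mask_entity_cells(table, mask_prob=0.5, seed=1)
```

Writing each entry of `targets` back into `masked` reproduces the original table, which is the property a recovery objective of this kind would train a model to approximate from the surrounding row and column context.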

Comments:	Accepted to VLDB 2021. Extended version with experiments added during revision. Our source code, benchmark, as well as pre-trained models will be available on this https URL
Subjects:	Information Retrieval (cs.IR); Computation and Language (cs.CL)
Cite as:	arXiv:2006.14806 [cs.IR]
	(or arXiv:2006.14806v2 [cs.IR] for this version)
	https://doi.org/10.48550/arXiv.2006.14806

Submission history

From: Xiang Deng
[v1] Fri, 26 Jun 2020 05:44:54 UTC (1,872 KB)
[v2] Thu, 3 Dec 2020 02:47:41 UTC (1,988 KB)
