On the Learning of Non-Autoregressive Transformers

Huang, Fei; Tao, Tianhua; Zhou, Hao; Li, Lei; Huang, Minlie

arXiv:2206.05975 (cs)

[Submitted on 13 Jun 2022]

Abstract: Non-autoregressive Transformers (NATs) are a family of text generation models that aim to reduce decoding latency by predicting whole sentences in parallel. However, this latency reduction sacrifices the ability to capture left-to-right dependencies, which makes NAT learning very challenging. In this paper, we present theoretical and empirical analyses to reveal the challenges of NAT learning and propose a unified perspective to understand existing successes. First, we show that simply training a NAT by maximizing the likelihood can lead to an approximation of the marginal distributions but drops all dependencies between tokens, where the dropped information can be measured by the dataset's conditional total correlation. Second, we formalize many previous objectives in a unified framework and show that their success can be summarized as maximizing the likelihood on a proxy distribution, leading to reduced information loss. Empirical studies show that our perspective can explain the phenomena in NAT learning and guide the design of new training methods.
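The first claim of the abstract can be illustrated on a toy case. The sketch below is not taken from the paper: it relies only on the standard identity that the KL divergence from a joint distribution to the product of its per-position marginals equals the total correlation. It assumes a single fixed source sentence, so the conditional total correlation reduces to the plain total correlation, and the two-token target distribution `toy` and all function names are hypothetical.

```python
import math
from collections import defaultdict


def marginals(joint):
    """Per-position marginals of a distribution over equal-length token tuples."""
    length = len(next(iter(joint)))
    margs = [defaultdict(float) for _ in range(length)]
    for seq, p in joint.items():
        for i, tok in enumerate(seq):
            margs[i][tok] += p
    return margs


def entropy(dist):
    """Shannon entropy in bits of a dict mapping outcomes to probabilities."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def total_correlation(joint):
    """Sum of per-position entropies minus the joint entropy."""
    return sum(entropy(m) for m in marginals(joint)) - entropy(joint)


def kl_to_factorized(joint):
    """KL(joint || product of marginals): the best a fully factorized model can do."""
    margs = marginals(joint)
    kl = 0.0
    for seq, p in joint.items():
        if p > 0:
            q = math.prod(margs[i][tok] for i, tok in enumerate(seq))
            kl += p * math.log2(p / q)
    return kl


# Hypothetical target distribution with two equally likely outputs (assumption).
toy = {("thank", "you"): 0.5, ("many", "thanks"): 0.5}
tc = total_correlation(toy)
kl = kl_to_factorized(toy)
print(f"total correlation = {tc:.3f} bits, KL to factorized model = {kl:.3f} bits")
```

In this toy case the factorized model spreads probability over all four token combinations, including "thank thanks" and "many you", and the information it loses matches the total correlation of the target distribution.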

Comments: Accepted at ICML 2022
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2206.05975 [cs.CL]
(or arXiv:2206.05975v1 [cs.CL] for this version)
https://doi.org/10.48550/arXiv.2206.05975

Submission history

From: Fei Huang
[v1] Mon, 13 Jun 2022 08:42:09 UTC (609 KB)
