Tree-constrained Pointer Generator with Graph Neural Network Encodings for Contextual Speech Recognition

Sun, Guangzhi; Zhang, Chao; Woodland, Philip C.

arXiv:2207.00857 (cs)

[Submitted on 2 Jul 2022]

Abstract: Incorporating biasing words obtained as contextual knowledge is critical for many automatic speech recognition (ASR) applications. This paper proposes the use of graph neural network (GNN) encodings in a tree-constrained pointer generator (TCPGen) component for end-to-end contextual ASR. By encoding the biasing words in the prefix tree with a tree-based GNN, lookahead for future wordpieces in end-to-end ASR decoding is achieved at each tree node by incorporating information about all wordpieces on the tree branches rooted at that node, which allows a more accurate prediction of the generation probability of the biasing words. Systems were evaluated on the LibriSpeech corpus using simulated biasing tasks, and on the AMI corpus by proposing a novel visually grounded contextual ASR pipeline that extracts biasing words from the slides accompanying each meeting. Results showed that TCPGen with GNN encodings achieved about a further 15% relative WER reduction on the biasing words compared to the original TCPGen, with a negligible increase in the computation cost for decoding.
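
The lookahead idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the wordpiece splits, the `toy_embedding` function, and the sum-and-tanh aggregation are all hypothetical stand-ins for the learned embeddings and the tree-based GNN, whose details the abstract does not give. The sketch only shows how a bottom-up pass over a prefix tree lets each node's encoding depend on every wordpiece in the branches below it.

```python
import math


class Node:
    """One prefix-tree node; `piece` is the wordpiece leading into it."""

    def __init__(self, piece=None):
        self.piece = piece        # None for the root
        self.children = {}        # wordpiece -> Node
        self.is_word_end = False  # True if a biasing word ends here
        self.encoding = None      # filled in by encode()


def build_prefix_tree(biasing_words):
    """Build a prefix tree from biasing words given as wordpiece lists."""
    root = Node()
    for pieces in biasing_words:
        node = root
        for piece in pieces:
            if piece not in node.children:
                node.children[piece] = Node(piece)
            node = node.children[piece]
        node.is_word_end = True
    return root


def toy_embedding(piece, dim=4):
    """Deterministic stand-in for a learned wordpiece embedding (assumption)."""
    if piece is None:
        return [0.0] * dim
    code = sum(ord(c) for c in piece)
    return [math.sin(code * (i + 1)) for i in range(dim)]


def encode(node, dim=4):
    """Post-order pass: encode the children first, then combine them with
    the node's own embedding, so the result summarises the whole subtree."""
    own = toy_embedding(node.piece, dim)
    child_sum = [0.0] * dim
    for child in node.children.values():
        child_enc = encode(child, dim)
        child_sum = [a + b for a, b in zip(child_sum, child_enc)]
    node.encoding = [math.tanh(o + c) for o, c in zip(own, child_sum)]
    return node.encoding


def valid_next_pieces(root, prefix):
    """Wordpieces that can extend `prefix` while staying on the tree."""
    node = root
    for piece in prefix:
        node = node.children.get(piece)
        if node is None:
            return set()
    return set(node.children)


# Hypothetical wordpiece splits of three biasing words.
biasing_words = [["tur", "n", "er"], ["tur", "key"], ["vi", "enna"]]
root = build_prefix_tree(biasing_words)
encode(root)
print(sorted(valid_next_pieces(root, ["tur"])))
```

In this sketch, adding or removing a word under the `tur` branch changes the encoding stored at the `tur` node, which is the sense in which a node "looks ahead" to future wordpieces. How TCPGen uses such encodings to compute the generation probability is described in the paper itself and is not reproduced here.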

Comments:	To appear in Interspeech 2022. arXiv admin note: text overlap with arXiv:2205.09058
Subjects:	Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2207.00857 [cs.SD]
	(or arXiv:2207.00857v1 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.2207.00857

Submission history

From: Guangzhi Sun
[v1] Sat, 2 Jul 2022 15:12:18 UTC (364 KB)
