Joint vs Sequential Speaker-Role Detection and Automatic Speech Recognition for Air-traffic Control

Blatt, Alexander; Krishnan, Aravind; Klakow, Dietrich

arXiv:2406.13842 (cs)

[Submitted on 19 Jun 2024]

Abstract: Using air-traffic control (ATC) data for downstream natural-language processing tasks requires preprocessing. Key steps are transcribing the data via automatic speech recognition (ASR) and applying speaker diarization or speaker-role detection (SRD) to divide the transcripts into pilot and air-traffic controller (ATCO) transcripts. While traditional approaches handle these tasks separately, we propose a transformer-based joint ASR-SRD system that solves both at once while relying on a standard ASR architecture. We compare this joint system against two cascaded approaches for ASR and SRD on multiple ATC datasets. Our study shows in which cases our joint system can outperform the two traditional approaches and in which cases the other architectures are preferable. We additionally evaluate how acoustic and lexical differences influence all architectures and show how to overcome them for our joint architecture.

Comments:	Accepted at Interspeech 2024
Subjects:	Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2406.13842 [cs.CL]
	(or arXiv:2406.13842v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2406.13842

Submission history

From: Aravind Krishnan
[v1] Wed, 19 Jun 2024 21:11:01 UTC (2,313 KB)
