Enhancing Aviation Communication Transcription: Fine-Tuning Distil-Whisper with LoRA

Mirzaei, Shokoufeh; Arzate, Jesse; Vijay, Yukti

Electrical Engineering and Systems Science > Audio and Speech Processing

arXiv:2503.22692 (eess)

[Submitted on 13 Mar 2025]


Abstract: Transcription of aviation communications has several applications, ranging from helping air traffic controllers identify read-back errors to supporting search and rescue operations. Recent advances in artificial intelligence have created new opportunities for improving aviation communication transcription. OpenAI's Whisper is one of the leading automatic speech recognition models; however, fine-tuning Whisper for aviation communication transcription is computationally expensive. This paper therefore uses a parameter-efficient fine-tuning method, Low-Rank Adaptation (LoRA), to fine-tune distil-Whisper, a more computationally efficient version of Whisper. For fine-tuning, we used the Air Traffic Control Corpus from the Linguistic Data Consortium, which contains approximately 70 hours of controller and pilot transmissions near three major US airports. The objective was to reduce the word error rate and thereby improve the accuracy of aviation communication transcription. First, with the LoRA hyperparameters fixed at an initial setting (alpha = 64, rank = 32), we performed a grid search with 5-fold cross-validation to find the best combination of distil-Whisper hyperparameters. We then tuned the LoRA hyperparameters, achieving an average word error rate of 3.86% across the five folds. This result highlights the model's potential for use in the cockpit.
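The headline metric above, word error rate (WER), is the word-level edit distance between a reference transcript and the model output (substitutions + deletions + insertions) divided by the number of reference words. The following is a minimal pure-Python sketch of that computation, not the authors' evaluation code. It assumes that transcripts are already normalized and whitespace-tokenized, and that "average across five folds" means the mean of the per-fold WERs. The function names and the sample ATC phrase are hypothetical illustrations.

```python
def word_errors(reference: str, hypothesis: str) -> tuple:
    """Return (word-level edit distance, number of reference words)."""
    ref = reference.split()  # assumes text is already normalized
    hyp = hypothesis.split()
    # prev[j] holds the edit distance between the first i-1 reference
    # words and the first j hypothesis words (rolling DP row).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or exact match)
            )
        prev = curr
    return prev[len(hyp)], len(ref)


def corpus_wer(references, hypotheses) -> float:
    """Total word errors divided by total reference words."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must align")
    errors = words = 0
    for ref, hyp in zip(references, hypotheses):
        e, n = word_errors(ref, hyp)
        errors += e
        words += n
    if words == 0:
        raise ValueError("no reference words")
    return errors / words


def mean_fold_wer(folds) -> float:
    """Mean of per-fold WERs; folds is a list of (references, hypotheses)."""
    scores = [corpus_wer(refs, hyps) for refs, hyps in folds]
    return sum(scores) / len(scores)


# Hypothetical utterance: the model drops "and", which is 1 error over 12 reference words.
ref = "delta four five two climb and maintain flight level three one zero"
hyp = "delta four five two climb maintain flight level three one zero"
print(corpus_wer([ref], [hyp]))
```

Pooling errors over all words within a fold, rather than averaging per-utterance rates, keeps short transmissions such as "roger" from dominating the score. Whether the paper pools within folds in this way is not stated, so treat that choice as an assumption.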

Comments:	14 pages, 4 Figures, 4 Tables, Under review by Journal of Aerospace Information Systems
Subjects:	Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2503.22692 [eess.AS]
	(or arXiv:2503.22692v1 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2503.22692

Submission history

From: Shokoufeh Mirzaei [view email]
[v1] Thu, 13 Mar 2025 22:12:45 UTC (342 KB)
