Multichannel Voice Trigger Detection Based on Transform-average-concatenate

Higuchi, Takuya; Brueggeman, Avamarie; Delfarah, Masood; Shum, Stephen

arXiv:2309.16036 (eess)

[Submitted on 27 Sep 2023 (v1), last revised 14 Feb 2024 (this version, v2)]

Abstract: Voice triggering (VT) enables users to activate their devices simply by speaking a trigger phrase. A front-end system is typically used to perform speech enhancement and/or separation, and it produces multiple enhanced and/or separated signals. Since conventional VT systems take only single-channel audio as input, channel selection is performed. A drawback of this approach is that the unselected channels are discarded, even though they could contain useful information for VT. In this work, we propose multichannel acoustic models for VT, where the multichannel output from the front-end is fed directly into a VT model. We adopt a transform-average-concatenate (TAC) block and modify it by incorporating the channel from the conventional channel selection, so that the model can attend to a target speaker when multiple speakers are present. The proposed approach achieves up to a 30% reduction in the false rejection rate compared to the baseline channel selection approach.
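To make the transform-average-concatenate idea concrete, the following is a minimal NumPy sketch of a single TAC-style block. It is not the authors' implementation: the abstract does not specify layer sizes, nonlinearities, or how the selected channel is incorporated. The function names (`init_params`, `tac_block`), the ReLU layers, the residual connection, and in particular the choice to append the selected channel's transformed features to every channel are all assumptions made for illustration.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def init_params(feat_dim, hidden_dim, rng, use_selected=False):
    """Random weights for one block (hypothetical layout, illustration only)."""
    n_parts = 3 if use_selected else 2  # [own, average] or [own, average, selected]
    scale = 0.1
    return {
        "W_t": scale * rng.standard_normal((feat_dim, hidden_dim)),
        "b_t": np.zeros(hidden_dim),
        "W_a": scale * rng.standard_normal((hidden_dim, hidden_dim)),
        "b_a": np.zeros(hidden_dim),
        "W_c": scale * rng.standard_normal((n_parts * hidden_dim, feat_dim)),
        "b_c": np.zeros(feat_dim),
    }


def tac_block(x, params, selected=None):
    """x has shape (channels, frames, features); the output has the same shape.

    `selected` is the index of the channel picked by a conventional channel
    selector. How it enters the block here is an assumption, not the paper's
    documented design.
    """
    # Transform: one shared layer applied to every channel independently.
    h = relu(x @ params["W_t"] + params["b_t"])                # (C, T, H)
    # Average: pool over the channel axis, then a second shared layer.
    m = relu(h.mean(axis=0) @ params["W_a"] + params["b_a"])   # (T, H)
    parts = [h, np.broadcast_to(m, h.shape)]
    if selected is not None:
        # Assumed modification: expose the selected channel to all channels.
        parts.append(np.broadcast_to(h[selected], h.shape))
    # Concatenate: per-channel features joined with the shared summaries.
    z = np.concatenate(parts, axis=-1)                         # (C, T, 2H or 3H)
    # Residual connection back to the input (an assumption in this sketch).
    return x + relu(z @ params["W_c"] + params["b_c"])


rng = np.random.default_rng(0)
x = rng.standard_normal((3, 5, 8))          # 3 channels, 5 frames, 8 features
y = tac_block(x, init_params(8, 16, rng))
print(y.shape)                              # (3, 5, 8)
```

Without `selected`, the block treats channels symmetrically: reordering the input channels only reorders the output. Passing `selected` breaks that symmetry on purpose, which is one plausible way to let a model favor the target speaker's channel while still using information from the others.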

Comments:	Accepted at HSCMA 2024
Subjects:	Audio and Speech Processing (eess.AS); Sound (cs.SD)
Cite as:	arXiv:2309.16036 [eess.AS]
	(or arXiv:2309.16036v2 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2309.16036

Submission history

From: Takuya Higuchi
[v1] Wed, 27 Sep 2023 21:28:50 UTC (303 KB)
[v2] Wed, 14 Feb 2024 00:28:50 UTC (303 KB)
