Multiple Sound Sources Localization from Coarse to Fine

Qian, Rui; Hu, Di; Dinkel, Heinrich; Wu, Mengyue; Xu, Ning; Lin, Weiyao

arXiv:2007.06355 (cs)

[Submitted on 13 Jul 2020 (v1), last revised 14 Jul 2020 (this version, v2)]

Abstract: Visually localizing multiple sound sources in unconstrained videos is a formidable problem, especially when pairwise sound-object annotations are lacking. To solve this problem, we develop a two-stage audiovisual learning framework that disentangles audio and visual representations of different categories from complex scenes, then performs cross-modal feature alignment in a coarse-to-fine manner. Our model achieves state-of-the-art results on a public localization dataset, as well as considerable performance on multi-source sound localization in complex scenes. We then employ the localization results for sound separation and obtain performance comparable to existing methods. These outcomes demonstrate our model's ability to effectively align sounds with specific visual sources. Code is available at this https URL

Comments:	to appear in ECCV 2020
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2007.06355 [cs.CV]
	(or arXiv:2007.06355v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2007.06355

Submission history

From: Rui Qian
[v1] Mon, 13 Jul 2020 12:59:40 UTC (5,135 KB)
[v2] Tue, 14 Jul 2020 13:38:52 UTC (5,135 KB)
