AudioViewer: Learning to Visualize Sounds

Song, Chunjin; Zhang, Yuchi; Peng, Willis; Mohaghegh, Parmis; Wandt, Bastian; Rhodin, Helge

arXiv:2012.13341 (cs)

[Submitted on 22 Dec 2020 (v1), last revised 10 Nov 2022 (this version, v5)]

Abstract: A long-standing goal in the field of sensory substitution is to enable sound perception for deaf and hard of hearing (DHH) people by visualizing audio content. Different from existing models that translate to hand sign language, between speech and text, or text and images, we target immediate and low-level audio to video translation that applies to generic environment sounds as well as human speech. Since such a substitution is artificial, without labels for supervised learning, our core contribution is to build a mapping from audio to video that learns from unpaired examples via high-level constraints. For speech, we additionally disentangle content from style, such as gender and dialect. Qualitative and quantitative results, including a human study, demonstrate that our unpaired translation approach maintains important audio features in the generated video and that videos of faces and numbers are well suited for visualizing high-dimensional audio features that can be parsed by humans to match and distinguish between sounds and words. Code and models are available at this https URL

Subjects:	Human-Computer Interaction (cs.HC); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2012.13341 [cs.HC]
	(or arXiv:2012.13341v5 [cs.HC] for this version)
	https://doi.org/10.48550/arXiv.2012.13341
Journal reference:	Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2023, pp. 2206-2216

Submission history

From: Yuchi Zhang
[v1] Tue, 22 Dec 2020 21:52:45 UTC (6,884 KB)
[v2] Mon, 28 Dec 2020 21:35:09 UTC (5,206 KB)
[v3] Thu, 11 Mar 2021 19:51:23 UTC (110,990 KB)
[v4] Fri, 3 Dec 2021 08:31:19 UTC (19,929 KB)
[v5] Thu, 10 Nov 2022 06:33:29 UTC (14,747 KB)
