AudioViewer: Learning to Visualize Sound

Zhang, Yuchi; Peng, Willis; Wandt, Bastian; Rhodin, Helge

Computer Science > Human-Computer Interaction

arXiv:2012.13341v3 (cs)

[Submitted on 22 Dec 2020 (v1), revised 11 Mar 2021 (this version, v3), latest version 10 Nov 2022 (v5)]

Abstract: Sensory substitution can help people with perceptual deficits. In this work, we attempt to visualize audio with video. Our long-term goal is to create sound perception for hearing-impaired people, for instance, to provide feedback for speech training of deaf individuals. Unlike existing models that translate between speech and text or between text and images, we target an immediate, low-level translation that applies to generic environmental sounds and human speech without delay. No canonical mapping is known for this artificial translation task. Our approach translates audio to video by compressing both into a common latent space with shared structure. Our core contribution is the development and evaluation of learned mappings that respect the limits of human perception and maximize user comfort by enforcing priors and combining strategies from unpaired image translation and disentanglement. We demonstrate qualitatively and quantitatively that our AudioViewer model preserves important audio features in the generated video, and that generated videos of faces and numbers are well suited for visualizing high-dimensional audio features because humans can easily parse them to match and distinguish between sounds, words, and speakers.
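As a rough illustration of the shared-latent-space idea, the sketch below swaps the learned models for PCA-based linear autoencoders, one fitted to audio frames and one to images. It encodes each audio frame into the audio latent space, rescales each latent dimension so its spread matches the image latent space (a crude stand-in for "shared structure"), and decodes with the image decoder, one frame at a time with no look-ahead. This is not the AudioViewer architecture: the data are random, and the feature sizes and the names `fit_linear_autoencoder` and `audio_to_image` are hypothetical.

```python
import numpy as np


def fit_linear_autoencoder(data: np.ndarray, latent_dim: int):
    """Fit a PCA-based linear autoencoder (hypothetical stand-in for a learned model).

    Returns (mean, basis, latent_std); basis has shape (latent_dim, n_features).
    """
    mean = data.mean(axis=0)
    # Rows of vt are orthonormal principal directions, sorted by explained variance.
    _, _, vt = np.linalg.svd(data - mean, full_matrices=False)
    basis = vt[:latent_dim]
    latent_std = ((data - mean) @ basis.T).std(axis=0)
    return mean, basis, latent_std


def encode(x, model):
    mean, basis, _ = model
    return (x - mean) @ basis.T


def decode(z, model):
    mean, basis, _ = model
    return z @ basis + mean


def audio_to_image(audio_frames, audio_model, image_model):
    """Translate each audio frame independently (no look-ahead) into an image vector."""
    z = encode(audio_frames, audio_model)
    # Align the two latent spaces: rescale each audio latent dimension so its
    # spread matches the corresponding image latent dimension.
    z_aligned = z / audio_model[2] * image_model[2]
    return decode(z_aligned, image_model)


rng = np.random.default_rng(0)
audio_train = rng.normal(size=(200, 40))  # hypothetical: 200 frames x 40 spectral bins
image_train = rng.normal(size=(300, 64))  # hypothetical: 300 images of 8x8 pixels
latent_dim = 4

audio_model = fit_linear_autoencoder(audio_train, latent_dim)
image_model = fit_linear_autoencoder(image_train, latent_dim)

frames = audio_to_image(audio_train, audio_model, image_model)
print(frames.shape)  # (200, 64)
```

Because each frame is mapped on its own, the visualization can update as soon as a new audio frame arrives; a real system would replace the linear models with learned nonlinear encoders and decoders and enforce stronger priors on the latent space.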

Subjects:	Human-Computer Interaction (cs.HC); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2012.13341 [cs.HC]
	(or arXiv:2012.13341v3 [cs.HC] for this version)
	https://doi.org/10.48550/arXiv.2012.13341

Submission history

From: Yuchi Zhang
[v1] Tue, 22 Dec 2020 21:52:45 UTC (6,884 KB)
[v2] Mon, 28 Dec 2020 21:35:09 UTC (5,206 KB)
[v3] Thu, 11 Mar 2021 19:51:23 UTC (110,990 KB)
[v4] Fri, 3 Dec 2021 08:31:19 UTC (19,929 KB)
[v5] Thu, 10 Nov 2022 06:33:29 UTC (14,747 KB)