SPICA: Interactive Video Content Exploration through Augmented Audio Descriptions for Blind or Low-Vision Viewers

Ning, Zheng; Wimer, Brianna; Jiang, Kaiwen; Chen, Keyi; Ban, Jerrick; Tian, Yapeng; Zhao, Yuhang; Li, Toby

arXiv:2402.07300v1 (cs)

[Submitted on 11 Feb 2024 (this version), latest version 27 Feb 2024 (v2)]

Abstract: Blind or Low-Vision (BLV) users often rely on audio descriptions (AD) to access video content. However, conventional static ADs can leave out detailed information in videos, impose a high mental load, neglect the diverse needs and preferences of BLV users, and lack immersion. To tackle these challenges, we introduce SPICA, an AI-powered system that enables BLV users to interactively explore video content. Informed by prior empirical studies on BLV video consumption, SPICA offers novel interactive mechanisms for supporting temporal navigation of frame captions and spatial exploration of objects within key frames. Leveraging an audio-visual machine learning pipeline, SPICA augments existing ADs by adding interactivity, spatial sound effects, and individual object descriptions without requiring additional human annotation. Through a user study with 14 BLV participants, we evaluated the usability and usefulness of SPICA and explored user behaviors, preferences, and mental models when interacting with augmented ADs.

Subjects:	Human-Computer Interaction (cs.HC); Multimedia (cs.MM)
Cite as:	arXiv:2402.07300 [cs.HC]
	(or arXiv:2402.07300v1 [cs.HC] for this version)
	https://doi.org/10.48550/arXiv.2402.07300

Submission history

From: Zheng Ning
[v1] Sun, 11 Feb 2024 20:42:01 UTC (3,854 KB)
[v2] Tue, 27 Feb 2024 03:36:01 UTC (18,832 KB)
