Leveraging Video Descriptions to Learn Video Question Answering

Zeng, Kuo-Hao; Chen, Tseng-Hung; Chuang, Ching-Yao; Liao, Yuan-Hong; Niebles, Juan Carlos; Sun, Min

arXiv:1611.04021 (cs)

[Submitted on 12 Nov 2016 (v1), last revised 19 Dec 2016 (this version, v2)]

Abstract: We propose a scalable approach to learning video-based question answering (QA): answering a free-form natural language question about video content. Our approach automatically harvests a large number of videos and descriptions freely available online. A large number of candidate QA pairs are then generated automatically from the descriptions rather than annotated manually. Next, we use these candidate QA pairs to train a number of video-based QA methods extended from MN (Sukhbaatar et al. 2015), VQA (Antol et al. 2015), SA (Yao et al. 2015), and SS (Venugopalan et al. 2015). To handle imperfect candidate QA pairs, we propose a self-paced learning procedure that iteratively identifies them and mitigates their effects in training. Finally, we evaluate performance on manually generated video-based QA pairs. The results show that our self-paced learning procedure is effective, and that the extended SS model outperforms various baselines.
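
The abstract does not spell out the self-paced learning procedure, so the following is only a minimal sketch of generic hard-threshold self-paced learning on a toy one-parameter regression, not the authors' method. The names `fit_weighted` and `self_paced_fit`, the `threshold` and `growth` parameters, and the toy data are all assumptions made for illustration. The sketch alternates between fitting a model and keeping only the samples the current model explains well, which is one common way to down-weight noisy training pairs.

```python
def fit_weighted(xs, ys, weights):
    """Closed-form weighted least squares for the one-parameter model y = w * x."""
    num = sum(v * x * y for x, y, v in zip(xs, ys, weights))
    den = sum(v * x * x for x, v in zip(xs, weights))
    # Guard against an empty selection (all weights zero).
    return num / den if den > 0 else 0.0


def self_paced_fit(xs, ys, threshold, growth=1.5, rounds=3):
    """Alternate between fitting the model and re-selecting low-loss samples.

    Hypothetical illustration: the hard 0/1 selection and the geometric
    threshold schedule are assumptions, not taken from the paper.
    """
    weights = [1.0] * len(xs)  # start by trusting every sample
    w = fit_weighted(xs, ys, weights)
    for _ in range(rounds):
        losses = [(y - w * x) ** 2 for x, y in zip(xs, ys)]
        # Hard selection: keep only samples the current model explains well.
        weights = [1.0 if loss < threshold else 0.0 for loss in losses]
        w = fit_weighted(xs, ys, weights)
        threshold *= growth  # gradually admit harder samples
    return w, weights


# Toy data: y = 2x, except the last pair, which stands in for a noisy candidate.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 1.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0, -5.0]
w, weights = self_paced_fit(xs, ys, threshold=1.0)
print(w, weights)
```

On this toy data the noisy pair is excluded after the first round, and the fit settles on the slope of the clean pairs. If the threshold were allowed to keep growing for many more rounds, the noisy pair would eventually be admitted again, so the schedule is a design choice that would need tuning in practice.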

Comments:	7 pages, 5 figures. Accepted to AAAI 2017. Camera-ready version
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
Cite as:	arXiv:1611.04021 [cs.CV]
	(or arXiv:1611.04021v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.1611.04021

Submission history

From: Tseng-Hung Chen
[v1] Sat, 12 Nov 2016 17:15:57 UTC (1,315 KB)
[v2] Mon, 19 Dec 2016 16:07:33 UTC (2,585 KB)
