DeepStory: Video Story QA by Deep Embedded Memory Networks

Kim, Kyung-Min; Heo, Min-Oh; Choi, Seong-Ho; Zhang, Byoung-Tak

arXiv:1707.00836 (cs)

[Submitted on 4 Jul 2017]

Abstract: Question answering (QA) on video content is a significant challenge for achieving human-level intelligence, as it involves both vision and language in real-world settings. Here we demonstrate the possibility of an AI agent performing video story QA by learning from a large number of cartoon videos. We develop a video-story learning model, Deep Embedded Memory Networks (DEMN), to reconstruct stories from a joint scene-dialogue video stream using a latent embedding space of observed data. The video stories are stored in a long-term memory component. For a given question, an LSTM-based attention model uses the long-term memory to recall the best question-story-answer triplet by focusing on specific words containing key information. We trained the DEMN on a novel QA dataset of a children's cartoon video series, Pororo. The dataset contains 16,066 scene-dialogue pairs from 20.5 hours of video, 27,328 fine-grained sentences for scene description, and 8,913 story-related QA pairs. Our experimental results show that the DEMN outperforms other QA models. This is mainly due to 1) the reconstruction of video stories in a combined scene-dialogue form that utilizes the latent embedding and 2) attention. DEMN also achieved state-of-the-art results on the MovieQA benchmark.
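The recall step described in the abstract (store stories in a memory, attend over them for a given question, then pick an answer) can be illustrated with a toy sketch. This is not the DEMN: the bag-of-words `embed` function stands in for the learned latent embedding, the softmax over cosine similarities stands in for the LSTM-based attention, and the function names, story sentences, question, and answer candidates are all invented for illustration rather than taken from the paper or the Pororo dataset.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a learned embedding: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def softmax(scores):
    """Numerically stable softmax, used here as simple attention weights."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def recall_and_answer(question, memory, candidates):
    """Pick the most relevant stored story, then the best-matching answer."""
    q = embed(question)
    # Attention over the memory: one weight per stored story.
    weights = softmax([cosine(q, embed(story)) for story in memory])
    best_story = memory[max(range(len(memory)), key=weights.__getitem__)]
    # Score each candidate answer against the question plus the recalled story.
    context = q + embed(best_story)
    scores = [cosine(context, embed(c)) for c in candidates]
    best_answer = candidates[max(range(len(candidates)), key=scores.__getitem__)]
    return best_story, best_answer, weights


# Hypothetical memory contents, question, and candidates (made up for this sketch).
memory = [
    "pororo plays with a red ball in the snow",
    "the beaver bakes cookies for her friends",
    "the fox builds a robot at home",
]
question = "what does the beaver bake"
candidates = ["a robot", "cookies", "a red ball"]

best_story, best_answer, weights = recall_and_answer(question, memory, candidates)
print(best_story)
print(best_answer)
```

Word overlap is used only to keep the sketch dependency-free; a learned joint scene-dialogue embedding, as the abstract describes, would replace `embed` and would not rely on exact word matches.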

Comments:	7 pages, accepted for IJCAI 2017
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as:	arXiv:1707.00836 [cs.CV]
	(or arXiv:1707.00836v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.1707.00836

Submission history

From: Kyungmin Kim
[v1] Tue, 4 Jul 2017 07:42:05 UTC (433 KB)
