Towards Accurate Generative Models of Video: A New Metric & Challenges

Unterthiner, Thomas; van Steenkiste, Sjoerd; Kurach, Karol; Marinier, Raphael; Michalski, Marcin; Gelly, Sylvain

arXiv:1812.01717 (cs)

[Submitted on 3 Dec 2018 (v1), last revised 27 Mar 2019 (this version, v2)]

Abstract: Recent advances in deep generative models have led to remarkable progress in synthesizing high-quality images. Following their successful application in image processing and representation learning, an important next step is to consider videos. Learning generative models of video is a much harder task, requiring a model to capture the temporal dynamics of a scene in addition to the visual presentation of objects. While recent attempts at formulating generative models of video have had some success, current progress is hampered by (1) the lack of qualitative metrics that consider visual quality, temporal coherence, and diversity of samples, and (2) the wide gap between purely synthetic video data sets and challenging real-world data sets in terms of complexity. To this end we propose Fréchet Video Distance (FVD), a new metric for generative models of video, and StarCraft 2 Videos (SCV), a benchmark of gameplay from custom StarCraft 2 scenarios that challenge the current capabilities of generative models of video. We contribute a large-scale human study, which confirms that FVD correlates well with qualitative human judgment of generated videos, and provide initial benchmark results on SCV.
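
The abstract does not spell out how FVD is computed. As a rough illustration only, the sketch below assumes the metric follows the familiar Fréchet-distance recipe: fit a Gaussian to feature embeddings of real videos and another to embeddings of generated videos, then compare the two. The function name `frechet_distance` and the random arrays standing in for video features are hypothetical; a real evaluation would obtain the features from a pretrained video network, which is not shown here.

```python
import numpy as np

def frechet_distance(feats_real, feats_gen):
    """Frechet distance between Gaussians fit to two (n_samples, dim) arrays, dim >= 2."""
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_g = np.cov(feats_gen, rowvar=False)
    # Tr((cov_r cov_g)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_r @ cov_g, which are real and non-negative in theory.
    eigvals = np.linalg.eigvals(cov_r @ cov_g)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_g) - 2.0 * tr_sqrt)

# Placeholder features (assumption): 500 "videos", 4-dimensional embeddings.
rng = np.random.default_rng(0)
real = rng.normal(size=(500, 4))
shifted = real + 3.0  # same covariance, mean moved by 3 in every dimension

d_same = frechet_distance(real, real)
d_shift = frechet_distance(real, shifted)
```

In this toy setup identical feature sets give a distance of roughly zero, and a pure mean shift contributes only the squared norm of the shift, so lower values indicate that the generated distribution sits closer to the real one.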

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
Cite as:	arXiv:1812.01717 [cs.CV]
	(or arXiv:1812.01717v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.1812.01717

Submission history

From: Sjoerd van Steenkiste
[v1] Mon, 3 Dec 2018 03:57:42 UTC (7,133 KB)
[v2] Wed, 27 Mar 2019 16:43:17 UTC (7,123 KB)
