Object Relational Graph with Teacher-Recommended Learning for Video Captioning

Zhang, Ziqi; Shi, Yaya; Yuan, Chunfeng; Li, Bing; Wang, Peijin; Hu, Weiming; Zha, Zhengjun

arXiv:2002.11566 (cs)

[Submitted on 26 Feb 2020]

Abstract: Taking full advantage of the information from both vision and language is critical for the video captioning task. Existing models lack adequate visual representation because they neglect the interactions between objects, and they lack sufficient training for content-related words because of the long-tailed problem. In this paper, we propose a complete video captioning system that includes both a novel model and an effective training strategy. Specifically, we propose an object relational graph (ORG) based encoder, which captures more detailed interaction features to enrich the visual representation. Meanwhile, we design a teacher-recommended learning (TRL) method that makes full use of a successful external language model (ELM) to integrate abundant linguistic knowledge into the caption model. The ELM generates more semantically similar word proposals, which extend the ground-truth words used for training and thereby address the long-tailed problem. Experimental evaluations on three benchmarks (MSVD, MSR-VTT, and VATEX) show that the proposed ORG-TRL system achieves state-of-the-art performance. Extensive ablation studies and visualizations illustrate the effectiveness of our system.
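
The two ideas named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the graph construction or the loss, so everything below is an assumption made for illustration. `relational_update` assumes the relational graph is a row-normalized matrix of learned pairwise affinities between object features, followed by one graph-convolution-style aggregation. `soft_target_loss` assumes the teacher's word proposals are its top-k next-word probabilities, used as soft targets alongside the ground-truth word. The function names, the projection matrices `w_q`, `w_k`, `w_v`, and the parameters `top_k` and `weight` are all hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def relational_update(objects, w_q, w_k, w_v):
    """Enhance object features with information from related objects.

    objects: (n, d) array, one feature vector per detected object.
    w_q, w_k: (d, h) projections used to score pairwise affinities (assumed).
    w_v: (d, d_out) projection applied before aggregation (assumed).
    """
    scores = (objects @ w_q) @ (objects @ w_k).T  # (n, n) pairwise affinities
    adjacency = softmax(scores, axis=-1)          # each row sums to 1
    return adjacency @ (objects @ w_v)            # aggregate over related objects


def soft_target_loss(student_logits, teacher_logits, gt_index, top_k=5, weight=0.5):
    """Ground-truth cross-entropy plus a soft-target term from a teacher.

    student_logits, teacher_logits: (vocab,) scores for the next word.
    gt_index: index of the ground-truth word.
    top_k, weight: hypothetical hyperparameters chosen for this sketch.
    """
    log_p = np.log(softmax(student_logits))
    hard = -log_p[gt_index]                       # usual hard-label term

    # Keep only the teacher's top-k words as proposals and renormalize them.
    q = softmax(teacher_logits)
    keep = np.argsort(q)[-top_k:]
    soft_targets = np.zeros_like(q)
    soft_targets[keep] = q[keep] / q[keep].sum()

    soft = -(soft_targets * log_p).sum()          # cross-entropy against soft targets
    return hard + weight * soft


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    objs = rng.normal(size=(4, 8))                # 4 objects, 8-dim features
    w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
    print(relational_update(objs, w_q, w_k, w_v).shape)
    print(soft_target_loss(rng.normal(size=10), rng.normal(size=10), gt_index=3))
```

In this sketch the soft term gives rare but plausible words a nonzero training signal even when they are not the ground-truth word at a given step, which is the intuition the abstract offers for how word proposals help with the long-tailed problem.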

Comments:	Accepted by CVPR 2020
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
Cite as:	arXiv:2002.11566 [cs.CV]
	(or arXiv:2002.11566v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2002.11566

Submission history

From: Ziqi Zhang
[v1] Wed, 26 Feb 2020 15:34:52 UTC (592 KB)
