Explaining Vision and Language through Graphs of Events in Space and Time

Masala, Mihai; Cudlenco, Nicolae; Rebedea, Traian; Leordeanu, Marius

arXiv:2309.08612 (cs)

[Submitted on 29 Aug 2023]

Abstract: Artificial Intelligence is advancing rapidly and is starting to bridge the gap between vision and language. However, we are still far from understanding, explaining and explicitly controlling visual content from a linguistic perspective, because we still lack a common explainable representation between the two domains. In this work we address this limitation and propose the Graph of Events in Space and Time (GEST), with which we can represent, create and explain both visual and linguistic stories. We provide a theoretical justification of our model and an experimental validation, which shows that GEST can add solid complementary value alongside powerful deep learning models. In particular, GEST can help improve the generation of videos from text at the content level, as it is easily incorporated into our novel video generation engine. Additionally, by using efficient graph matching techniques, GEST graphs can also improve comparisons between texts at the semantic level.
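To make the idea of comparing texts through event graphs more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a simplified graph in which each node is an event (an actor, an action and an illustrative time interval) and each edge is a labeled relation between two events. The names `Event`, `EventGraph`, `build_story` and `jaccard_similarity` are hypothetical, and the Jaccard overlap used here is only a stand-in for the graph matching techniques mentioned in the abstract.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    """One graph node: who does what, and when (hypothetical fields)."""
    actor: str
    action: str
    start: int  # illustrative time step, not a unit used by the paper
    end: int


@dataclass
class EventGraph:
    """A toy event graph: a list of events plus labeled directed edges."""
    events: list = field(default_factory=list)
    edges: set = field(default_factory=set)  # (src_index, dst_index, relation)

    def add_event(self, event):
        # Store the event and return its index so edges can refer to it.
        self.events.append(event)
        return len(self.events) - 1

    def relate(self, src, dst, relation):
        self.edges.add((src, dst, relation))

    def signature(self):
        # Flatten the graph into a set of node and edge descriptors.
        nodes = {(e.actor, e.action) for e in self.events}
        links = {
            (self.events[s].action, relation, self.events[d].action)
            for s, d, relation in self.edges
        }
        return nodes | links


def build_story(actions):
    """Chain (actor, action) pairs into a graph linked by 'next' edges."""
    graph = EventGraph()
    previous = None
    for step, (actor, action) in enumerate(actions):
        node = graph.add_event(Event(actor, action, step, step + 1))
        if previous is not None:
            graph.relate(previous, node, "next")
        previous = node
    return graph


def jaccard_similarity(a, b):
    """Overlap of two graph signatures; a simple stand-in for graph matching."""
    sa, sb = a.signature(), b.signature()
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


story_a = build_story([("man", "enter room"), ("man", "sit down"), ("man", "read book")])
story_b = build_story([("man", "enter room"), ("man", "sit down"), ("man", "drink coffee")])
print(round(jaccard_similarity(story_a, story_b), 3))
```

The two toy stories share their first two events and the edge between them, so the score falls between 0 and 1. A set overlap like this ignores graph structure beyond single edges, which is the kind of limitation that a proper graph matching method is meant to address.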

Comments:	Accepted at IEEE International Conference on Computer Vision (ICCV) 2023 Workshops: 5th Workshop On Closing The Loop Between Vision And Language
Subjects:	Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2309.08612 [cs.AI]
	(or arXiv:2309.08612v1 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2309.08612

Submission history

From: Mihai Masala
[v1] Tue, 29 Aug 2023 07:25:06 UTC (33,954 KB)
