GenMM: Geometrically and Temporally Consistent Multimodal Data Generation for Video and LiDAR

Singh, Bharat; Kulharia, Viveka; Yang, Luyu; Ravichandran, Avinash; Tyagi, Ambrish; Shrivastava, Ashish

Computer Science > Computer Vision and Pattern Recognition

arXiv:2406.10722 (cs)

[Submitted on 15 Jun 2024]

Abstract: Multimodal synthetic data generation is crucial in domains such as autonomous driving, robotics, augmented/virtual reality, and retail. We propose a novel approach, GenMM, for jointly editing RGB videos and LiDAR scans by inserting temporally and geometrically consistent 3D objects. Our method uses a reference image and 3D bounding boxes to seamlessly insert and blend new objects into target videos. We inpaint the 2D Regions of Interest (consistent with 3D boxes) using a diffusion-based video inpainting model. We then compute semantic boundaries of the object and estimate its surface depth using state-of-the-art semantic segmentation and monocular depth estimation techniques. Subsequently, we employ a geometry-based optimization algorithm to recover the 3D shape of the object's surface, ensuring it fits precisely within the 3D bounding box. Finally, LiDAR rays intersecting with the new object surface are updated to reflect consistent depths with its geometry. Our experiments demonstrate the effectiveness of GenMM in inserting various 3D objects across video and LiDAR modalities.
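The final step of the abstract, updating LiDAR rays that intersect the inserted object, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes, purely for illustration, a pinhole camera with hypothetical intrinsics `fx`, `fy`, `cx`, `cy`, LiDAR points already expressed in the camera frame, a sensor origin that coincides with the camera center, and a per-pixel object mask and depth map produced by earlier stages. The function name `update_lidar_points` is likewise hypothetical.

```python
from typing import List, Tuple

# (x, y, z) in the camera frame, with z pointing forward (assumed convention).
Point = Tuple[float, float, float]


def update_lidar_points(
    points: List[Point],
    mask: List[List[bool]],
    depth: List[List[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point]:
    """Pull points onto the inserted object's surface where their ray hits it.

    mask[v][u] is True where the inserted object covers pixel (u, v);
    depth[v][u] is the object's surface depth (along z) at that pixel.
    """
    height, width = len(mask), len(mask[0])
    updated: List[Point] = []
    for x, y, z in points:
        if z <= 0:
            # Point is behind the camera: it cannot be seen, leave it alone.
            updated.append((x, y, z))
            continue
        # Pinhole projection of the 3D point to integer pixel coordinates.
        u = int(round(fx * x / z + cx))
        v = int(round(fy * y / z + cy))
        inside = 0 <= u < width and 0 <= v < height
        if inside and mask[v][u] and depth[v][u] < z:
            # The object is in front of the original return: shorten the ray
            # so the point lands on the object surface.
            scale = depth[v][u] / z
            updated.append((x * scale, y * scale, z * scale))
        else:
            # Ray misses the object, or an existing surface occludes it.
            updated.append((x, y, z))
    return updated
```

Scaling all three coordinates by the same factor keeps each point on its original ray through the origin, so only the range changes. Points whose original return is closer than the object surface are left untouched, which models occlusion by existing scene geometry.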

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2406.10722 [cs.CV]
	(or arXiv:2406.10722v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2406.10722

Submission history

From: Ashish Shrivastava
[v1] Sat, 15 Jun 2024 19:29:01 UTC (42,038 KB)
