Valeo4Cast: A Modular Approach to End-to-End Forecasting

Xu, Yihong; Zablocki, Éloi; Boulch, Alexandre; Puy, Gilles; Chen, Mickael; Bartoccioni, Florent; Samet, Nermin; Siméoni, Oriane; Gidaris, Spyros; Vu, Tuan-Hung; Bursuc, Andrei; Valle, Eduardo; Marlet, Renaud; Cord, Matthieu

arXiv:2406.08113 (cs)

[Submitted on 12 Jun 2024 (v1), last revised 26 Sep 2024 (this version, v3)]

Abstract: Motion forecasting is crucial in autonomous driving systems to anticipate the future trajectories of surrounding agents such as pedestrians, vehicles, and traffic signals. In end-to-end forecasting, the model must jointly detect and track, from sensor data (cameras or LiDARs), the past trajectories of the different elements of the scene and predict their future locations. We depart from the current trend of tackling this task via end-to-end training from perception to forecasting, and instead use a modular approach. We individually build and train detection, tracking and forecasting modules. We then use only consecutive finetuning steps to better integrate the modules and alleviate compounding errors. Our in-depth study of the finetuning strategies reveals that this simple yet effective approach significantly improves performance on the end-to-end forecasting benchmark. Consequently, our solution ranks first in the Argoverse 2 End-to-end Forecasting Challenge, with 63.82 mAPf. Our forecasting results surpass last year's winner by +17.1 points and this year's runner-up by +13.3 points. This remarkable performance in forecasting can be explained by our modular paradigm, which integrates finetuning strategies and significantly outperforms the end-to-end-trained counterparts. The code, model weights and results are made available at this https URL.
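To make the modular structure described in the abstract concrete, the following is a minimal toy sketch of three separate stages (detection, tracking, forecasting) chained through plain data interfaces. It is not the authors' implementation: the names `detect`, `Tracker`, `forecast` and `run_pipeline`, the nearest-neighbour association, and the constant-velocity forecaster are all illustrative assumptions, and the finetuning steps that the paper studies are not represented here.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (x, y) object center, hypothetical 2D simplification


def detect(frame: List[Point]) -> List[Point]:
    """Stand-in detector (assumption): the frame already lists object centers."""
    return list(frame)


@dataclass
class Tracker:
    """Toy greedy nearest-neighbour tracker (assumption, not the paper's tracker)."""
    max_dist: float = 2.0
    tracks: Dict[int, List[Point]] = field(default_factory=dict)
    next_id: int = 0

    def update(self, detections: List[Point]) -> None:
        unmatched = list(detections)
        # Extend each existing track with its closest unmatched detection.
        for history in self.tracks.values():
            if not unmatched:
                break
            last = history[-1]
            best = min(unmatched, key=lambda p: math.hypot(p[0] - last[0], p[1] - last[1]))
            if math.hypot(best[0] - last[0], best[1] - last[1]) <= self.max_dist:
                history.append(best)
                unmatched.remove(best)
        # Start a new track for every detection left unmatched.
        for point in unmatched:
            self.tracks[self.next_id] = [point]
            self.next_id += 1


def forecast(history: List[Point], horizon: int) -> List[Point]:
    """Constant-velocity extrapolation (assumption) from the last two positions."""
    if len(history) < 2:
        return [history[-1]] * horizon
    (x0, y0), (x1, y1) = history[-2], history[-1]
    vx, vy = x1 - x0, y1 - y0
    return [(x1 + vx * k, y1 + vy * k) for k in range(1, horizon + 1)]


def run_pipeline(frames: List[List[Point]], horizon: int = 3) -> Dict[int, List[Point]]:
    """Chain the three independent modules: detect, then track, then forecast."""
    tracker = Tracker()
    for frame in frames:
        tracker.update(detect(frame))
    return {tid: forecast(hist, horizon) for tid, hist in tracker.tracks.items()}
```

Because each stage only consumes the previous stage's output, any one of them could be swapped or retrained on its own. That is the property a modular design relies on, and it is also why errors from early stages can compound downstream.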

Comments:	Winning solution of the Argoverse 2 "Unified Detection, Tracking, and Forecasting" challenge; work accepted at Road++ ECCVW 2024
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Cite as:	arXiv:2406.08113 [cs.CV]
	(or arXiv:2406.08113v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2406.08113

Submission history

From: Yihong Xu
[v1] Wed, 12 Jun 2024 11:50:51 UTC (2,185 KB)
[v2] Tue, 10 Sep 2024 13:41:00 UTC (4,184 KB)
[v3] Thu, 26 Sep 2024 16:14:54 UTC (4,184 KB)
