How to Merge Your Multimodal Models Over Time?

Dziadzio, Sebastian; Udandarao, Vishaal; Roth, Karsten; Prabhu, Ameya; Akata, Zeynep; Albanie, Samuel; Bethge, Matthias

Computer Science > Machine Learning

arXiv:2412.06712 (cs)

[Submitted on 9 Dec 2024]


Authors: Sebastian Dziadzio, Vishaal Udandarao, Karsten Roth, Ameya Prabhu, Zeynep Akata, Samuel Albanie, Matthias Bethge


Abstract: Model merging combines multiple expert models (each fine-tuned from a base foundation model on diverse tasks and domains) into a single, more capable model. However, most existing model merging approaches assume that all experts are available simultaneously. In reality, new tasks and domains emerge progressively over time, requiring strategies that integrate the knowledge of expert models as they become available: a process we call temporal model merging. The temporal dimension introduces unique challenges not addressed in prior work, raising new questions such as: when training for a new task, should the expert model start from the merged past experts or from the original base model? Should we merge all models at each time step? Which merging techniques are best suited for temporal merging? Should different strategies be used to initialize the training and to deploy the model? To answer these questions, we propose a unified framework called TIME (Temporal Integration of Model Expertise), which defines temporal model merging along three axes: (1) Initialization Phase, (2) Deployment Phase, and (3) Merging Technique. Using TIME, we study temporal model merging across model sizes, compute budgets, and learning horizons on the FoMo-in-Flux benchmark. Our comprehensive suite of experiments across TIME allows us to uncover key insights for temporal model merging, offering a better understanding of current challenges and best practices for effective temporal model merging.
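To make the three axes concrete, here is a toy sketch. It is an illustration under stated assumptions, not the authors' TIME implementation. A "model" is just a flat list of floats, and training is replaced by a hypothetical `finetune` stand-in that adds a fixed per-task update. The axes appear as parameters of a hypothetical `temporal_merge` function: `init_from` chooses whether each new expert starts from the base model or from the current merged model; `merge_all` chooses whether the deployed model merges every expert seen so far or only the previous merged model with the newest expert; and `merge_fn` selects the merging technique, here either uniform weight averaging or task arithmetic (adding scaled task vectors, expert minus base, onto the base). All names and the choice of these two techniques are assumptions made for illustration.

```python
# Illustrative toy sketch of temporal model merging; NOT the TIME codebase.
from functools import partial
from typing import Callable, List, Sequence

# Toy stand-in for a model: a flat list of parameters.
Params = List[float]


def finetune(init: Params, task_update: Params) -> Params:
    """Hypothetical placeholder for training: adds a fixed per-task update."""
    return [w + u for w, u in zip(init, task_update)]


def weight_average(models: Sequence[Params]) -> Params:
    """Uniform parameter averaging across the given models."""
    n = len(models)
    return [sum(ws) / n for ws in zip(*models)]


def task_arithmetic(base: Params, models: Sequence[Params], lam: float = 1.0) -> Params:
    """Add the scaled task vector (model - base) of every model onto the base."""
    merged = list(base)
    for m in models:
        for i, (w, b) in enumerate(zip(m, base)):
            merged[i] += lam * (w - b)
    return merged


def temporal_merge(
    base: Params,
    task_updates: Sequence[Params],
    init_from: str = "merged",  # initialization axis: "base" or "merged"
    merge_all: bool = True,  # merge all experts so far, or previous merge + newest
    merge_fn: Callable[[Sequence[Params]], Params] = weight_average,  # technique axis
) -> List[Params]:
    """Return the deployed (merged) model after each time step."""
    if init_from not in ("base", "merged"):
        raise ValueError("init_from must be 'base' or 'merged'")
    experts: List[Params] = []
    merged = list(base)
    deployed: List[Params] = []
    for update in task_updates:
        # Initialization: start the new expert from the base or the current merge.
        start = base if init_from == "base" else merged
        expert = finetune(start, update)
        experts.append(expert)
        # Merging/deployment: combine experts into the model that gets deployed.
        if merge_all or len(experts) == 1:
            merged = merge_fn(experts)
        else:
            merged = merge_fn([merged, expert])
        deployed.append(merged)
    return deployed


if __name__ == "__main__":
    base = [0.0, 0.0]
    updates = [[1.0, 0.0], [0.0, 1.0]]
    print(temporal_merge(base, updates, init_from="base")[-1])    # [0.5, 0.5]
    print(temporal_merge(base, updates, init_from="merged")[-1])  # [1.0, 0.5]
    ta = partial(task_arithmetic, base)
    print(temporal_merge(base, updates, init_from="base", merge_fn=ta)[-1])  # [1.0, 1.0]
```

Even in this toy setting the initialization choice changes the deployed model: starting the second expert from the merged model carries the first task's update forward into it, while starting from the base keeps the experts independent until they are merged.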

Comments:	Technical Report. Code at this https URL
Subjects:	Machine Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2412.06712 [cs.LG]
	(or arXiv:2412.06712v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2412.06712

Submission history

From: Karsten Roth
[v1] Mon, 9 Dec 2024 18:01:13 UTC (3,132 KB)