Adaptive Orchestration for Inference of Large Foundation Models at the Edge

Koch, Fernando; Djuhera, Aladin; Binotto, Alecio

arXiv:2504.03668 (cs)

[Submitted on 19 Mar 2025]

Abstract: Large Foundation Models (LFMs), including multi-modal and generative AI models, promise to unlock new capabilities for next-generation Edge AI applications. However, performing inference with LFMs in resource-constrained and heterogeneous edge environments presents significant challenges for workload orchestration. We propose a novel adaptive orchestration method and system tailored specifically for managing distributed inference workloads across multi-access edge computing (MEC) infrastructures. Our approach enhances traditional workload orchestration by introducing dynamic methods including: (1) adaptive workload distribution that selects optimal, inter-connected edge nodes based on runtime capacity profiling; (2) dynamic redistribution of LFM partitions as operational conditions evolve; and (3) real-time reconfiguration (e.g., re-splitting) of LFM layers to balance performance and privacy requirements. Our proposed framework introduces an architecture for adaptive split inference, enabling real-time, QoS-aware management of inference workloads. We present a reference architecture, detail operational mechanisms, and demonstrate its application through various use cases in real-world scenarios.
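
To make mechanisms (1) to (3) concrete, here is a minimal Python sketch of capacity-aware layer splitting. It is an illustration under stated assumptions, not the authors' system: the names `Node`, `split_layers`, `free_mem_mb`, and `private_layers` are hypothetical, free memory stands in for the full runtime capacity profile, and nodes are assumed to form an ordered, inter-connected chain. Calling the function again with updated capacities models re-splitting as conditions change.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    free_mem_mb: float     # hypothetical stand-in for a runtime capacity profile
    trusted: bool = False  # whether privacy-sensitive layers may run here


def split_layers(layer_mem_mb, nodes, private_layers=0):
    """Greedily assign consecutive layers to nodes, in chain order.

    The first `private_layers` layers are only placed on trusted nodes.
    Returns a dict mapping node name to the list of layer indices it hosts.
    """
    plan = {node.name: [] for node in nodes}
    i = 0  # index of the next layer to place
    for node in nodes:
        budget = node.free_mem_mb
        while i < len(layer_mem_mb) and layer_mem_mb[i] <= budget:
            if i < private_layers and not node.trusted:
                break  # privacy-sensitive layers may not leave trusted nodes
            plan[node.name].append(i)
            budget -= layer_mem_mb[i]
            i += 1
    if i < len(layer_mem_mb):
        raise ValueError(f"layer {i} could not be placed on any node")
    return plan


layers = [100.0] * 6  # assumed per-layer memory footprint in MB
nodes = [Node("device", 250, trusted=True), Node("edge-a", 300), Node("edge-b", 300)]
initial_plan = split_layers(layers, nodes, private_layers=2)

# Capacity on edge-a drops at runtime; re-run the split with the new profile.
nodes[1].free_mem_mb = 100
updated_plan = split_layers(layers, nodes, private_layers=2)
```

In this toy setup the two privacy-sensitive layers stay on the trusted device in both plans, while the remaining layers shift from `edge-a` to `edge-b` once `edge-a` reports less free memory. A real orchestrator would also weigh latency, bandwidth, and QoS targets, which this sketch leaves out.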

Comments:	9 pages
Subjects:	Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
Cite as:	arXiv:2504.03668 [cs.DC]
	(or arXiv:2504.03668v1 [cs.DC] for this version)
	https://doi.org/10.48550/arXiv.2504.03668

Submission history

From: Fernando Koch
[v1] Wed, 19 Mar 2025 15:35:56 UTC (639 KB)
