Filtered not Mixed: Stochastic Filtering-Based Online Gating for Mixture of Large Language Models

Saqur, Raeid; Kratsios, Anastasis; Krach, Florian; Limmer, Yannick; Tian, Jacob-Junqi; Willes, John; Horvath, Blanka; Rudzicz, Frank

arXiv:2406.02969 (cs)

[Submitted on 5 Jun 2024]

Abstract: We propose MoE-F, a formalised mechanism for combining $N$ pre-trained expert Large Language Models (LLMs) in online time-series prediction tasks by adaptively forecasting the best weighting of LLM predictions at every time step. Our mechanism leverages the conditional information in each expert's running performance to forecast the best combination of LLMs for predicting the next step of the time series. Diverging from static (learned) Mixture of Experts (MoE) methods, MoE-F employs time-adaptive stochastic filtering techniques to combine experts. By framing the expert selection problem as a finite state-space, continuous-time Hidden Markov Model (HMM), we can leverage the Wonham-Shiryaev filter. Our approach first constructs $N$ parallel filters, one for each of the $N$ individual LLMs. Each filter proposes its best combination of LLMs, given the information it has access to. Subsequently, the $N$ filter outputs are aggregated to optimize a lower bound on the loss of the aggregated LLMs; this bound can be optimized in closed form, thus generating our ensemble predictor. Our contributions are: (I) the MoE-F algorithm, deployable as a plug-and-play filtering harness; (II) theoretical optimality guarantees for the proposed filtering-based gating algorithm; and (III) empirical evaluation and ablative results using state-of-the-art foundational and MoE LLMs on a real-world Financial Market Movement task, where MoE-F attains a remarkable 17% absolute and 48.5% relative F1 measure improvement over the next-best-performing individual LLM expert.
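
To make the idea of filtering-based gating concrete, here is a minimal Python sketch. It is not the MoE-F algorithm: it swaps in a single discrete-time HMM forward filter for the continuous-time Wonham-Shiryaev filter, and it omits the $N$ parallel filters and the closed-form aggregation step. The function names, the uniform switching model, the Gibbs-style likelihood, and the parameters `switch_prob` and `temperature` are all assumptions made for illustration.

```python
import math


def filter_step(weights, losses, switch_prob=0.05, temperature=1.0):
    """One predict/update step of a discrete-time HMM filter over N experts.

    weights: current posterior that each expert is the best one (sums to 1).
    losses: each expert's observed loss on the latest time step.
    switch_prob, temperature: hypothetical tuning parameters, not from the paper.
    """
    n = len(weights)
    # Predict: with probability switch_prob the identity of the best expert
    # is resampled uniformly over all N experts (an assumed transition model).
    predicted = [(1.0 - switch_prob) * w + switch_prob / n for w in weights]
    # Update: reweight by an assumed Gibbs likelihood exp(-loss / temperature).
    # Subtracting the minimum loss only improves numerical stability.
    min_loss = min(losses)
    unnormalized = [
        p * math.exp(-(loss - min_loss) / temperature)
        for p, loss in zip(predicted, losses)
    ]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def combine(weights, predictions):
    """Weighted average of the experts' scalar predictions."""
    return sum(w * p for w, p in zip(weights, predictions))
```

In use, the weights would start uniform, for example `[1/3, 1/3, 1/3]` for three experts; each time step would then call `combine` to form the ensemble prediction and `filter_step` once the experts' losses on that step are known. Experts with consistently lower loss gain weight, while `switch_prob` keeps every weight positive so the gate can recover if another expert becomes the best later on.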

Comments:	29 pages, 5 Appendix sections
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computational Finance (q-fin.CP); Mathematical Finance (q-fin.MF)
MSC classes:	60J05, 60G35, 68T20, 68T42, 68T50
ACM classes:	I.2.6; I.2.7; G.3
Cite as:	arXiv:2406.02969 [cs.LG]
	(or arXiv:2406.02969v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2406.02969

Submission history

From: Raeid Saqur
[v1] Wed, 5 Jun 2024 05:53:50 UTC (916 KB)
