Massively Scaling Explicit Policy-conditioned Value Functions

Bohlinger, Nico; Peters, Jan


arXiv:2502.11949 (cs)

[Submitted on 17 Feb 2025]


Abstract: We introduce a scaling strategy for Explicit Policy-Conditioned Value Functions (EPVFs) that significantly improves performance on challenging continuous-control tasks. EPVFs learn a value function V(θ) that is explicitly conditioned on the policy parameters θ, enabling direct gradient-based updates to the parameters of any policy. At scale, however, EPVFs struggle with unrestricted parameter growth and with efficient exploration in the policy parameter space. To address these issues, we use massive parallelization with GPU-based simulators, large batch sizes, weight clipping, and scaled perturbations. Our results show that EPVFs can be scaled to solve complex tasks, such as a custom Ant environment, and can compete with state-of-the-art Deep Reinforcement Learning (DRL) baselines like Proximal Policy Optimization (PPO) and Soft Actor-Critic (SAC). We further explore action-based policy parameter representations from previous work, as well as specialized neural network architectures for efficiently handling weight-space features, which have not been used in the context of DRL before.
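
The loop the abstract describes (learn V(θ) from parameter/return pairs, ascend its gradient with respect to θ, clip the weights, explore with scaled perturbations) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and not the authors' implementation: the quadratic `toy_return` stands in for simulator rollouts, a quadratic feature model stands in for the neural value function, the perturbation rule (noise scaled by each parameter's magnitude with a floor of 1) is only one plausible reading of "scaled perturbations", and all names and constants are hypothetical.

```python
import random

# Hypothetical toy setup: every name and constant here is an illustrative assumption.
TARGET = (1.0, -0.5)   # parameters that maximise the toy return
CLIP_LIMIT = 3.0       # bound applied to every policy parameter


def toy_return(theta):
    """Stand-in for an episode return; a real setup would roll out the policy."""
    return -sum((t - g) ** 2 for t, g in zip(theta, TARGET))


def features(theta):
    """Quadratic features of the policy parameters (a network would replace this)."""
    return [1.0] + list(theta) + [t * t for t in theta]


def value(w, theta):
    """V_w(theta): predicted return given the policy parameters themselves."""
    return sum(wi * fi for wi, fi in zip(w, features(theta)))


def value_grad_theta(w, theta):
    """Gradient of V_w(theta) with respect to theta (not with respect to w)."""
    n = len(theta)
    return [w[1 + i] + 2.0 * w[1 + n + i] * theta[i] for i in range(n)]


def clip_weights(theta, limit=CLIP_LIMIT):
    """Weight clipping: keep every parameter inside [-limit, limit]."""
    return [max(-limit, min(limit, t)) for t in theta]


def perturb(theta, sigma, rng):
    """Gaussian noise scaled by each parameter's magnitude, with a floor of 1."""
    return [t + sigma * max(abs(t), 1.0) * rng.gauss(0.0, 1.0) for t in theta]


def train(steps=200, batch=64, sigma=0.3, value_lr=0.005, policy_lr=0.05, seed=0):
    rng = random.Random(seed)
    theta = [-2.0, 2.0]
    w = [0.0] * (1 + 2 * len(theta))
    for _ in range(steps):
        # 1) Explore in parameter space around the current policy.
        candidates = [clip_weights(perturb(theta, sigma, rng)) for _ in range(batch)]
        # 2) Regress V_w(theta) onto the observed returns with plain SGD.
        for cand in candidates:
            err = value(w, cand) - toy_return(cand)
            w = [wi - value_lr * err * fi for wi, fi in zip(w, features(cand))]
        # 3) Ascend the value function's gradient in theta, then clip the weights.
        grad = value_grad_theta(w, theta)
        theta = clip_weights([t + policy_lr * g for t, g in zip(theta, grad)])
    return theta, w
```

Calling `train()` returns the final policy parameters and value-model weights. In this sketch the batch of perturbed candidates plays the role that massively parallel simulation plays in the paper, and clipping keeps both the policy and the value model's inputs in a bounded range.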

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2502.11949 [cs.LG]
	(or arXiv:2502.11949v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2502.11949

Submission history

From: Nico Bohlinger
[v1] Mon, 17 Feb 2025 16:02:54 UTC (4,854 KB)
