Simpler near-optimal controllers through direct supervision

Tweed, Douglas

Abstract: The method of generalized Hamilton-Jacobi-Bellman equations (GHJB) is a powerful way of creating near-optimal controllers by learning. It is based on the fact that if we have a feedback controller, and we learn to compute the gradient grad-J of its cost-to-go function J, then we can use that gradient to define a better controller. We can then use the new controller's grad-J to define a still-better controller, and so on. Here I point out that GHJB works indirectly, in the sense that it doesn't learn the best approximation to grad-J but instead learns the time derivative dJ/dt and infers grad-J from that. I show that we can get simpler and lower-cost controllers by learning grad-J directly. To do this, we need teaching signals that report grad-J(x) for a varied set of states x. I show how to obtain these signals, using the GHJB equation to calculate one component of grad-J(x) (the one parallel to dx/dt) and computing all the other components by backward-in-time integration, using a formula similar to the Euler-Lagrange equation. I then compare this direct algorithm with GHJB on two test problems.
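The improvement loop described in the abstract (compute grad-J for the current controller, use it to define a better controller, repeat) can be illustrated on a toy problem. The sketch below is not the paper's algorithm: it assumes a scalar linear system dx/dt = a*x + b*u with cost rate q*x^2 + r*u^2 and a linear controller u = -k*x, so that grad-J is available in closed form from the GHJB equation grad-J(x) * dx/dt + cost rate = 0 instead of being learned. In one dimension there are no components of grad-J orthogonal to dx/dt, so the backward-in-time integration step does not arise. All function names and parameter values are hypothetical.

```python
import math


def cost_gradient_coeff(k, a, b, q, r):
    """Return p such that J(x) = p*x**2, i.e. grad-J(x) = 2*p*x, for u = -k*x.

    Derived from the GHJB equation for the closed loop:
        grad-J(x) * (a - b*k) * x + (q + r*k**2) * x**2 = 0.
    """
    closed_loop = a - b * k
    if closed_loop >= 0:
        raise ValueError("controller u = -k*x does not stabilize the system")
    return -(q + r * k**2) / (2.0 * closed_loop)


def improve_gain(p, b, r):
    """Return the gain of the controller that minimizes, over u,
    q*x**2 + r*u**2 + grad-J(x) * (a*x + b*u) with grad-J(x) = 2*p*x.

    Setting the u-derivative to zero gives u = -(b*p/r) * x.
    """
    return b * p / r


def iterate_controllers(k0, a, b, q, r, steps=10):
    """Alternate between evaluating grad-J and improving the controller."""
    gains = [k0]
    for _ in range(steps):
        p = cost_gradient_coeff(gains[-1], a, b, q, r)
        gains.append(improve_gain(p, b, r))
    return gains


# Assumed example values: an unstable plant (a > 0) and a stabilizing start k0.
gains = iterate_controllers(k0=2.0, a=1.0, b=1.0, q=1.0, r=1.0)
for i, k in enumerate(gains):
    print(f"iteration {i}: gain k = {k:.10f}")
```

For these assumed values the optimal gain solves the scalar Riccati equation 2*p - p^2 + 1 = 0, giving k = 1 + sqrt(2), and the printed gains approach that value. In the setting of the abstract, the closed-form `cost_gradient_coeff` would be replaced by a learned approximation to grad-J.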

Subjects: Optimization and Control (math.OC)
Cite as: arXiv:0908.2859 [math.OC]
(or arXiv:0908.2859v1 [math.OC] for this version)
	https://doi.org/10.48550/arXiv.0908.2859
