Self-Play Learning Without a Reward Metric

Schmidt, Dan; Moran, Nick; Rosenfeld, Jonathan S.; Rosenthal, Jonathan; Yedidia, Jonathan

arXiv:1912.07557 (cs)

[Submitted on 16 Dec 2019]

Abstract: The AlphaZero algorithm for the learning of strategy games via self-play, which has produced superhuman ability in the games of Go, chess, and shogi, uses a quantitative reward function for game outcomes, requiring the users of the algorithm to explicitly balance different components of the reward against each other, such as the game winner and margin of victory. We present a modification to the AlphaZero algorithm that requires only a total ordering over game outcomes, obviating the need to perform any quantitative balancing of reward components. We demonstrate that this system learns optimal play in a comparable amount of time to AlphaZero on a sample game.
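
To make the notion of a total ordering over game outcomes concrete, here is a minimal Python sketch. It is not the authors' implementation: the `Outcome` class, its `won` and `margin` fields, and the rule that winning takes precedence over margin are illustrative assumptions, chosen to mirror the abstract's example of game winner and margin of victory.

```python
from dataclasses import dataclass


# Hypothetical outcome type (not from the paper). With order=True, dataclass
# generates comparisons that treat instances as tuples of their fields in
# declaration order, i.e. a lexicographic ordering on (won, margin).
@dataclass(frozen=True, order=True)
class Outcome:
    won: bool    # compared first: any win outranks any loss
    margin: int  # signed point difference; only breaks ties within won/lost


def rank_outcomes(outcomes):
    """Return outcomes sorted from worst to best under the ordering above."""
    return sorted(outcomes)


if __name__ == "__main__":
    games = [
        Outcome(won=False, margin=-5),
        Outcome(won=True, margin=1),
        Outcome(won=True, margin=7),
        Outcome(won=False, margin=-1),
    ]
    for outcome in rank_outcomes(games):
        print(outcome)
```

No numeric weight relating a win to a point of margin appears anywhere in this sketch; the ordering alone decides which of two outcomes is preferable, which is the property the abstract says the modified algorithm relies on.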

Comments:	6 pages, 4 figures
Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:1912.07557 [cs.LG]
	(or arXiv:1912.07557v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1912.07557

Submission history

From: Dan Schmidt
[v1] Mon, 16 Dec 2019 18:11:14 UTC (40 KB)
