Adversarial Policies Beat Professional-Level Go AIs

Wang, Tony Tong; Gleave, Adam; Belrose, Nora; Tseng, Tom; Miller, Joseph; Dennis, Michael D; Duan, Yawen; Pogrebniak, Viktor; Levine, Sergey; Russell, Stuart

arXiv:2211.00241v1 (cs)

[Submitted on 1 Nov 2022 (this version), latest version 13 Jul 2023 (v4)]

Abstract: We attack the state-of-the-art Go-playing AI system, KataGo, by training an adversarial policy that plays against a frozen KataGo victim. Our attack achieves a >99% win-rate against KataGo without search, and a >50% win-rate when KataGo uses enough search to be near-superhuman. To the best of our knowledge, this is the first successful end-to-end attack against a Go AI playing at the level of a top human professional. Notably, the adversary does not win by learning to play Go better than KataGo; in fact, the adversary is easily beaten by human amateurs. Instead, the adversary wins by tricking KataGo into ending the game prematurely at a point that is favorable to the adversary. Our results demonstrate that even professional-level AI systems may harbor surprising failure modes. See this https URL for example games.

Comments:	21 pages, 11 figures
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (stat.ML)
ACM classes:	I.2.6
Cite as:	arXiv:2211.00241 [cs.LG]
	(or arXiv:2211.00241v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2211.00241

Submission history

From: Adam Gleave
[v1] Tue, 1 Nov 2022 03:13:20 UTC (838 KB)
[v2] Mon, 9 Jan 2023 19:53:05 UTC (6,054 KB)
[v3] Sat, 18 Feb 2023 22:05:01 UTC (6,849 KB)
[v4] Thu, 13 Jul 2023 06:37:29 UTC (4,698 KB)
