2023
Secondary publication
Article
Publisher's version

AlphaZe∗∗: AlphaZero-like baselines for imperfect information games are surprisingly strong

File(s)
Main publication
frai-06-1014561.pdf
CC BY 4.0 International
Format: Adobe PDF
Size: 1.3 MB
TUDa URI
tuda/10637
URN
urn:nbn:de:tuda-tuprints-240643
DOI
10.26083/tuprints-00024064
Authors
Blüml, Jannis
Czech, Johannes
Kersting, Kristian ORCID 0000-0002-2873-9152
Abstract

In recent years, deep neural networks for strategy games have made significant progress. AlphaZero-like frameworks which combine Monte-Carlo tree search with reinforcement learning have been successfully applied to numerous games with perfect information. However, they have not been developed for domains where uncertainty and unknowns abound, and are therefore often considered unsuitable due to imperfect observations. Here, we challenge this view and argue that they are a viable alternative for games with imperfect information — a domain currently dominated by heuristic approaches or methods explicitly designed for hidden information, such as oracle-based techniques. To this end, we introduce a novel algorithm based solely on reinforcement learning, called AlphaZe∗∗, which is an AlphaZero-based framework for games with imperfect information. We examine its learning convergence on the games Stratego and DarkHex and show that it is a surprisingly strong baseline, while using a model-based approach: it achieves similar win rates against other Stratego bots like Pipeline Policy Space Response Oracle (P2SRO), while not winning in direct comparison against P2SRO or reaching the much stronger numbers of DeepNash. Compared to heuristics and oracle-based approaches, AlphaZe∗∗ can easily deal with rule changes, e.g., when more information than usual is given, and drastically outperforms other approaches in this respect.

Keywords

imperfect information...

deep neural networks

reinforcement learnin...

AlphaZero

Monte-Carlo tree sear...

perfect information M...

Language
English
Department/field
20 Department of Computer Science > Artificial Intelligence and Machine Learning
Central Institutions > Centre for Cognitive Science (CCS)
Central Institutions > hessian.AI - Hessian Center for Artificial Intelligence
DDC
000 Generalities, computer science, information science > 004 Computer science
Institution
Universitäts- und Landesbibliothek Darmstadt
Place
Darmstadt
Journal / series title
Frontiers in Artificial Intelligence
Journal volume
6
ISSN
2624-8212
Publisher
Frontiers Media S.A.
Year of first publication
2023
Publisher DOI
10.3389/frai.2023.1014561
PPN
511988958
