Pajarinen, Joni ; Thai, Hong Linh ; Akrour, Riad ; Peters, Jan ; Neumann, Gerhard (2022)
Compatible natural gradient policy search.
In: Machine Learning, 2022, 108 (8-9)
doi: 10.26083/tuprints-00020531
Article, Secondary publication, Publisher's Version
Pajarinen2019_Article_CompatibleNaturalGradientPolic.pdf
Copyright Information: CC BY 4.0 International - Creative Commons, Attribution.
Field | Value |
---|---|
Item Type: | Article |
Type of entry: | Secondary publication |
Title: | Compatible natural gradient policy search |
Language: | English |
Date: | 2022 |
Place of Publication: | Darmstadt |
Year of primary publication: | 2022 |
Publisher: | Springer |
Journal or Publication Title: | Machine Learning |
Volume of the journal: | 108 |
Issue Number: | 8-9 |
DOI: | 10.26083/tuprints-00020531 |
Corresponding Links: | |
Origin: | Secondary publication service |
Abstract: | Trust-region methods have yielded state-of-the-art results in policy search. A common approach is to use the KL divergence to bound the trust region, which results in a natural gradient policy update. We show that natural gradient and trust-region optimization are equivalent if we use the natural parameterization of a standard exponential policy distribution in combination with compatible value function approximation. Moreover, we show that standard natural gradient updates may reduce the entropy of the policy according to an inappropriate schedule, leading to premature convergence. To control entropy reduction, we introduce a new policy search method called compatible policy search (COPOS), which bounds entropy loss. The experimental results show that COPOS yields state-of-the-art results in challenging continuous control tasks and in discrete, partially observable tasks. |
Status: | Publisher's Version |
URN: | urn:nbn:de:tuda-tuprints-205319 |
Classification DDC: | 000 Generalities, computers, information > 004 Computer science; 600 Technology, medicine, applied sciences > 600 Technology |
Divisions: | 20 Department of Computer Science > Intelligent Autonomous Systems |
TU-Projects: | EC/H2020|640554|SKILLS4ROBOTS |
Date Deposited: | 10 Feb 2022 13:10 |
Last Modified: | 24 Mar 2023 07:30 |
URI: | https://tuprints.ulb.tu-darmstadt.de/id/eprint/20531 |
PPN: | 506259315 |
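
The equivalence described in the abstract can be illustrated in the simplest possible setting. The following is a minimal Python sketch, assuming a single-state (bandit) problem, a softmax policy whose logits theta are its natural parameters, and advantages A that are already centered under the current policy, in which case A itself serves as the compatible weight vector. It checks that the vanilla gradient equals F·A (so the natural gradient direction is A, up to a constant shift that does not change the softmax), and it solves the KL-bounded update in closed form, which comes out as theta + A/eta, i.e. a natural gradient step. All function names are hypothetical. `entropy_damped_step` uses an assumed closed form with an extra entropy multiplier omega; it only illustrates the idea of bounding entropy loss and is not the published COPOS algorithm.

```python
import math


def softmax(logits):
    """Numerically stable softmax: maps natural parameters (logits) to probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def kl(p, q):
    """KL(p || q) for discrete distributions; q is assumed to have full support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def entropy(p):
    """Shannon entropy in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)


def vanilla_gradient(p, adv):
    """Gradient of E_{a~pi}[A(a)] w.r.t. the logits, holding A fixed:
    sum_a pi(a) * A(a) * (e_a - pi)."""
    n = len(p)
    g = [0.0] * n
    for a in range(n):
        for j in range(n):
            indicator = 1.0 if j == a else 0.0
            g[j] += p[a] * adv[a] * (indicator - p[j])
    return g


def fisher_vector_product(p, v):
    """F @ v for the softmax Fisher matrix F = diag(pi) - pi pi^T."""
    dot = sum(pi * vi for pi, vi in zip(p, v))
    return [pi * vi - pi * dot for pi, vi in zip(p, v)]


def kl_trust_region_step(theta, adv, eps, t_max=1e3, iters=100):
    """Solve max E_new[A] s.t. KL(pi_new || pi_old) <= eps in closed form.

    The maximizer is pi_new ∝ pi_old * exp(t * A), i.e. theta_new = theta + t * A:
    a natural gradient step with step size t = 1/eta. The KL grows monotonically
    in t, so t is found by bisection.
    """
    p_old = softmax(theta)

    def step(t):
        return [th + t * a for th, a in zip(theta, adv)]

    def kl_at(t):
        return kl(softmax(step(t)), p_old)

    if kl_at(t_max) <= eps:  # bound is inactive within the search range
        return step(t_max)
    lo, hi = 0.0, t_max
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if kl_at(mid) > eps:
            hi = mid
        else:
            lo = mid
    return step(lo)


def entropy_damped_step(theta, adv, eta, omega):
    """ASSUMED closed form with an extra entropy multiplier omega >= 0:
    pi_new ∝ pi_old^(eta/(eta+omega)) * exp(A/(eta+omega)).
    In logits this is the undamped step theta + A/eta scaled by eta/(eta+omega).
    """
    return [(eta * th + a) / (eta + omega) for th, a in zip(theta, adv)]


if __name__ == "__main__":
    theta = [0.0, 0.0, 0.0]  # uniform policy over three actions
    adv = [1.0, 0.0, -1.0]   # advantages, centered under the uniform policy
    p = softmax(theta)
    print("vanilla gradient:", vanilla_gradient(p, adv))
    print("F @ A:           ", fisher_vector_product(p, adv))
    new_theta = kl_trust_region_step(theta, adv, eps=0.01)
    print("KL after step:", kl(softmax(new_theta), p))
    for omega in (0.0, 1.0):
        h = entropy(softmax(entropy_damped_step(theta, adv, eta=2.0, omega=omega)))
        print(f"omega={omega}: entropy={h:.4f}")
```

With omega > 0 the damped logits are the undamped ones multiplied by eta/(eta+omega) < 1, which flattens the policy, so its entropy stays higher than after the plain natural gradient step.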