Improving Locomotion Learning Efficiency of CPG-RBF Networks under Morphological Damage with Multiple Value Functions
The combination of reinforcement learning (RL) and central pattern generators (CPGs) has proven useful for learning fast locomotion, thanks to a strong inductive bias in the form of periodic input signals, which leads to consistent and stable periodic gait patterns. In this work, we investigate which modifications to the RL-CPG pipeline can facilitate even faster locomotion learning. A recent study in the field of multi-agent reinforcement learning (MARL) by Hu et al. suggests that using a noisy value function can lead to better exploration and help avoid local optima. Inspired by this, we propose to further improve training efficiency by utilizing multiple value functions—emulating the effect of a noisy value function—to enhance the exploration of CPG-RBF networks. Specifically, we test the training efficiency of CPG-RBF networks under morphological damage, which requires more sophisticated exploration to discover asymmetric, yet effective, gait patterns. We empirically show that, with multiple value functions (also known as critic networks), CPG-RBF networks consistently learn faster than with a single value function, especially under morphological damage.
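The core idea of the abstract—using an ensemble of value functions whose members are sampled per update to emulate a noisy value estimate—can be illustrated with a minimal sketch. The class below is an assumption for illustration only (the paper does not specify this implementation): it maintains K independently initialized linear critics, draws one at random to serve as a stochastic baseline, and trains all of them toward the same regression target.

```python
import numpy as np

rng = np.random.default_rng(0)

class ValueEnsemble:
    """Hypothetical ensemble of K independently initialized linear value
    functions. Sampling a random member per query yields a noisy value
    estimate, which can encourage exploration (cf. the MARL result by
    Hu et al. cited in the abstract)."""

    def __init__(self, obs_dim, k=5, lr=1e-2):
        # independent random initializations give the ensemble its diversity
        self.w = rng.normal(scale=0.1, size=(k, obs_dim))
        self.lr = lr

    def predict_all(self, obs):
        # value estimate of every critic, shape (k,)
        return self.w @ obs

    def noisy_value(self, obs):
        # pick one critic uniformly at random -> stochastic baseline
        return float(self.predict_all(obs)[rng.integers(len(self.w))])

    def update(self, obs, target):
        # one TD-style regression step for each critic independently
        err = target - self.w @ obs          # shape (k,)
        self.w += self.lr * err[:, None] * obs[None, :]

# toy check: all critics regress toward the same target,
# so the noise shrinks as training converges
ens = ValueEnsemble(obs_dim=3, k=4)
obs = np.array([1.0, 0.5, -0.2])
for _ in range(500):
    ens.update(obs, target=2.0)
print(np.allclose(ens.predict_all(obs), 2.0, atol=1e-2))  # True
```

In practice the critics in the paper are neural networks rather than linear models, but the mechanism is the same: disagreement among ensemble members injects noise into the baseline early in training, and that noise decays as the critics converge.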

