Lauri, Mikko; Pajarinen, Joni; Peters, Jan (2024)
Multi-agent active information gathering in discrete and continuous-state decentralized POMDPs by policy graph improvement.
In: Autonomous Agents and Multi-Agent Systems, 2020, 34 (2)
doi: 10.26083/tuprints-00023919
Article, Secondary publication, Publisher's Version
Full text: s10458-020-09467-6.pdf (CC BY 4.0 International, Creative Commons Attribution)
Item Type: | Article |
---|---|
Type of entry: | Secondary publication |
Title: | Multi-agent active information gathering in discrete and continuous-state decentralized POMDPs by policy graph improvement |
Language: | English |
Date: | 17 December 2024 |
Place of Publication: | Darmstadt |
Year of primary publication: | October 2020 |
Place of primary publication: | Dordrecht |
Publisher: | Springer Science |
Journal or Publication Title: | Autonomous Agents and Multi-Agent Systems |
Volume of the journal: | 34 |
Issue Number: | 2 |
Collation: | 44 pages |
DOI: | 10.26083/tuprints-00023919 |
Origin: | Secondary publication DeepGreen |
Abstract: | Decentralized policies for information gathering are required when multiple autonomous agents are deployed to collect data about a phenomenon of interest and constant communication cannot be assumed. This is common in information gathering tasks with multiple independently operating sensor devices that may operate over large physical distances, such as unmanned aerial vehicles, or in communication-limited environments, as in the case of autonomous underwater vehicles. In this paper, we frame the information gathering task as a general decentralized partially observable Markov decision process (Dec-POMDP). The Dec-POMDP is a principled model for co-operative decentralized multi-agent decision-making. An optimal solution of a Dec-POMDP is a set of local policies, one for each agent, which together maximize the expected sum of rewards over time. In contrast to most prior work on Dec-POMDPs, we set the reward as a non-linear function of the agents’ state information, for example the negative Shannon entropy. We argue that such reward functions are well-suited for decentralized information gathering problems. We prove that if the reward function is convex, then the finite-horizon value function of the Dec-POMDP is also convex. We propose the first heuristic anytime algorithm for information gathering Dec-POMDPs, and empirically demonstrate its effectiveness by solving discrete problems an order of magnitude larger than the previous state of the art. We also propose an extension to continuous-state problems with finite action and observation spaces by employing particle filtering. The effectiveness of the proposed algorithms is verified in domains such as decentralized target tracking, scientific survey planning, and signal source localization. |
Uncontrolled Keywords: | Planning under uncertainty, Decentralized POMDP, Information gathering, Active perception |
Identification Number: | Article ID: 42 |
Status: | Publisher's Version |
URN: | urn:nbn:de:tuda-tuprints-239197 |
Classification DDC: | 000 Generalities, computers, information > 004 Computer science |
Divisions: | 20 Department of Computer Science > Intelligent Autonomous Systems |
Date Deposited: | 17 Dec 2024 12:46 |
Last Modified: | 17 Dec 2024 12:46 |
SWORD Depositor: | Deep Green |
URI: | https://tuprints.ulb.tu-darmstadt.de/id/eprint/23919 |
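The abstract sets the reward as a convex function of the agents’ state information, for example the negative Shannon entropy of the belief, and states that convexity of the reward implies convexity of the finite-horizon value function. A minimal Python sketch (not the authors' code) of that reward on a discrete belief, with a numerical check of the convexity property via Jensen's inequality:

```python
# Sketch of the belief-dependent reward described in the abstract: the
# negative Shannon entropy of a discrete belief state. Negative entropy is
# convex in the belief, the property the paper's convexity result rests on.
import numpy as np

def neg_entropy(belief: np.ndarray) -> float:
    """Negative Shannon entropy (in nats) of a discrete belief vector."""
    p = belief[belief > 0.0]          # 0 * log 0 is taken as 0
    return float(np.sum(p * np.log(p)))

# Numerical check of convexity via Jensen's inequality:
# f(a*b1 + (1-a)*b2) <= a*f(b1) + (1-a)*f(b2) for a convex f.
b1 = np.array([0.7, 0.2, 0.1])
b2 = np.array([0.1, 0.3, 0.6])
a = 0.4
lhs = neg_entropy(a * b1 + (1 - a) * b2)
rhs = a * neg_entropy(b1) + (1 - a) * neg_entropy(b2)
assert lhs <= rhs + 1e-12             # holds for any beliefs and any a in [0, 1]
print(f"f(mix) = {lhs:.4f} <= {rhs:.4f} = mix of f")
```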
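For the continuous-state extension, the abstract mentions particle filtering with finite action and observation spaces. The sketch below, assuming illustrative placeholder dynamics and observation models rather than the paper's implementation, shows one bootstrap particle-filter step and a crude histogram-based estimate of the belief entropy that could serve as the reward signal:

```python
# Sketch of the particle-filter machinery a continuous-state belief relies on:
# propagate weighted particles through a transition model, reweight by the
# observation likelihood, and resample. The models below are placeholders.
import numpy as np

rng = np.random.default_rng(0)

def pf_update(particles, weights, action, observation,
              transition_sample, obs_likelihood):
    """One bootstrap particle-filter step; returns resampled particles."""
    # Predict: sample each particle forward through the dynamics.
    predicted = transition_sample(particles, action, rng)
    # Correct: weight by the likelihood of the received observation.
    w = weights * obs_likelihood(observation, predicted)
    w /= w.sum()
    # Resample to avoid weight degeneracy.
    idx = rng.choice(len(predicted), size=len(predicted), p=w)
    return predicted[idx], np.full(len(predicted), 1.0 / len(predicted))

# Illustrative 1-D target-tracking placeholders (assumed for this sketch):
transition = lambda x, a, rng: x + a + rng.normal(0.0, 0.5, size=x.shape)
likelihood = lambda z, x: np.exp(-0.5 * ((z - x) / 1.0) ** 2)

particles = rng.normal(0.0, 2.0, size=1000)
weights = np.full(1000, 1.0 / 1000)
particles, weights = pf_update(particles, weights, action=1.0, observation=1.2,
                               transition_sample=transition,
                               obs_likelihood=likelihood)

# A crude histogram estimate of the belief entropy over the particle set.
hist, edges = np.histogram(particles, bins=20, density=True)
p = hist * np.diff(edges)
p = p[p > 0]
print("approximate belief entropy:", -np.sum(p * np.log(p)))
```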