Using Mean Embeddings for State Estimation and Reinforcement Learning

Gebhardt, Gregor H.W. (2019)
Using Mean Embeddings for State Estimation and Reinforcement Learning.
Technische Universität Darmstadt
Ph.D. Thesis, Primary publication

Preview

Text
GHW Gebhardt - Using Mean Embeddings for State Estimation and Reinforcement Learning 200219.pdf - Accepted Version
Copyright Information: CC BY-NC-ND 4.0 International - Creative Commons, Attribution NonCommercial, NoDerivs.
Download (10MB) | Preview

Item Type:

Ph.D. Thesis

Type of entry:

Primary publication

Title:

Using Mean Embeddings for State Estimation and Reinforcement Learning

Language:

English

Referees:

Peters, Prof. Dr. Jan ; Neumann, Prof. Dr. Gerhard

Date:

2019

Place of Publication:

Darmstadt

Date of oral examination:

17 January 2019

Abstract:

To act in complex, high-dimensional environments, autonomous systems require versatile state estimation techniques and compact state representations. State estimation is crucial when the system only has access to stochastic measurements or partial observations. Furthermore, in combination with models of the system such techniques allow to predict the future which enables the system to asses the outcome of possible decisions. Compact state representations alleviate the curse of dimensionality by distilling the important information from high-dimensional observations. Due to noisy sensory information and non-perfect models of the system, estimates of the state never reflect the true state perfectly but are always subject to errors. The natural choice to incorporate the uncertainty about the state estimate is to use a probability distribution as representation. This results in the so called belief state. High-dimensional observations, for example images, often contain much less information than conveyed by their dimensionality. But also if all the information is necessary to describe the state of the system—for example, think of the state of a swarm with the positions of all agents—a less complex description might be a sufficient representation. In such situations, finding the generative distribution that explains the state would give a much more compact while informative representation. Traditionally, parametric distributions have been used as state representations such as most prevalently the Gaussian distribution. However, in many cases a unimodal distribution might not be sufficient to represent the belief state. Using multi-modal probability distributions, instead, requires more advanced approaches such as mixture models or particle-based Monte Carlo methods. Learning mixture models is however not straight-forward and often results in locally optimal solutions. Similarly, maintaining a good population of particles during inference is a complicated and cumbersome process. A third approach is kernel density estimation which is located at the intersection of mixture models and particle-based approaches. Still, performing inference with any of these approaches requires heuristics that lead to poor performance and a limited scalability to higher dimensional spaces. A recent technique that alleviates this problem are the embeddings of probability distributions into reproducing kernel Hilbert spaces (RKHS). Conditional distributions can be embedded as operators based on which a framework for inference has been presented that allows to apply the sum rule, the product rule and Bayes’ rule entirely in Hilbert space. Using sample based estimators and the kernel-trick of the representer theorem allows to represent the operations as vector-matrix manipulations. The contributions of this thesis are based on or inspired by the embeddings of distributions into reproducing kernel Hilbert spaces. In the first part of this thesis, I propose additions to the framework for nonparametric inference that allow the inference operators to scale more gracefully with the number of samples in the training set. The first contribution is an alternative approach to the conditional embedding operator formulated as a least-squares problem i which allows to use only a subset of the data as representation while using the full data set to learn the conditional operator. I call this operator the subspace conditional embedding operator. Inspired by the least-squares derivations of the Kalman filter, I furthermore propose an alternative operator for Bayesian updates in Hilbert space, the kernel Kalman rule. This alternative approach is numerically more robust than the kernel Bayes rule presented in the framework for non-parametric inference and scales better with the number of samples. Based on the kernel Kalman rule, I derive the kernel Kalman filter and the kernel forward-backward smoother to perform state estimation, prediction and smoothing based on Hilbert space embeddings of the belief state. This representation is able to capture multi-modal distributions and inference resolves--due to the kernel trick--into easy matrix manipulations. In the second part of this thesis, I propose a representation for large sets of homogeneous observations. Specifically, I consider the problem of learning a controller for object assembly and object manipulation with a robotic swarm. I assume a swarm of homogeneous robots that are controlled by a common input signal, e.g., the gradient of a light source or a magnetic field. Learning policies for swarms is a challenging problem since the state space grows with the number of agents and becomes quickly very high dimensional. Furthermore, the exact number of agents and the order of the agents in the observation is not important to solve the task. To approach this issue, I propose the swarm kernel which uses a Hilbert space embedding to represent the swarm. Instead of the exact positions of the agents in the swarm, the embedding estimates the generative distribution behind the swarm configuration. The specific agent positions are regarded as samples of this distribution. Since the swarm kernel compares the embeddings of distributions, it can compare swarm configurations with varying numbers of individuals and is invariant to the permutation of the agents. I present a hierarchical approach for solving the object manipulation task where I assume a high-level object assembly policy as given. To learn the low-level object pushing policy, I use the swarm kernel with an actor-critic policy search method. The policies which I learn in simulation can be directly transferred to a real robotic system. In the last part of this thesis, I investigate how we can employ the idea of kernel mean embeddings to deep reinforcement learning. As in the previous part, I consider a variable number of homogeneous observations—such as robot swarms where the number of agents can change. Another example is the representation of 3D structures as point clouds. The number of points in such clouds can vary strongly and the order of the points in a vectorized representation is arbitrary. The common architectures for neural networks have a fixed structure that requires that the dimensionality of inputs and outputs is known in advance. A variable number of inputs can only be processed by applying tricks. To approach this problem, I propose the deep M-embeddings which are inspired by the kernel mean embeddings. The deep M-embeddings provide a network structure to compute a fixed length representation from a variable number of inputs. Additionally, the deep M-embeddings exploit the homogeneous nature of the inputs to reduce the number of parameters in the network and, thus, make the learning easier. Similar to the swarm kernel, the policies learned with the deep M-embeddings can be transferred to different swarm sizes and different number of objects in the environment without further learning.

Alternative Abstract:

Alternative Abstract

Language

Autonome Systeme benötigen vielseitige Techniken zur Zustandsabschätzung und kompakte Zustandsrepräsentationen, um in komplexen, hochdimensionalen Umgebungen zu agieren. Zustandsabschätzungen sind wichtig, wenn das System nur stochastische Messungen oder partielle Beobachtungen zur Verfügung hat. Mit Modellen des Systems sind durch solche Techniken zudem Vorhersagen in die Zukunft möglich, welche die Beurteilung der Folgen von möglichen Entscheidungen des Systems erlauben. Kompakte Repräsentationen mildern den Fluch der Dimensionalität indem sie die wichtigen Informationen aus hochdimensionalen Beobachtungen herausdestillieren. Aufgrund verrauschter Sensorinformationen und fehlerhafter Systemmodelle sind Zustandsabschätzungen niemals exakt sondern immer fehlerbehaftet. Die naheliegende Wahl, um die Unsicherheit einer Zustandsabschätzung darzustellen, ist die Repräsentation als Wahrscheinlichkeitsverteilung, dem sogenannten belief state. Hochdimensionale Beobachtungen, zum Beispiel Bilder, beinhalten oft viel weniger Informationen als die Dimensionalität vermittelt. Aber auch wenn die gesamte Information notwendig ist, um den Zustand zu beschreiben – zum Beispiel im Falle eines Schwarms, dessen Zustand die Positionen der einzelnen Agenten enthält – könnte eine einfachere Repräsentation ausreichend sein. In solchen Situationen würde eine generative Verteilung, die den Zustand erklärt, eine kompaktere und trotzdem informative Darstellung erlauben. Herkömmlich werden parametrische Verteilungen, vor allem die Gaußverteilung, als Repräsentation verwendet. In vielen Fällen ist solch eine unimodale Verteilung nicht ausreichend, um den belief state darzustellen. Die Verwendung von multimodalen Modellen erfordert fortgeschrittenere Ansätze, wie Mischverteilungen oder partikelbasierte Monte-Carlo-Methoden. Mischverteilungen zu lernen ist jedoch nicht trivial und resultiert oft in lokalen Optima. Genauso ist die Erhaltung einer guten Population während der Inferenz mit Monte-Carlo-Methoden ein komplizierter Prozess. Ein weiterer Ansatz sind Kerndichteschätzer, die als Hybrid von Mischverteilungen und partikelbasierten Ansätzen gesehen werden können. Jeder dieser Ansätze erfordert für die Inferenz Heuristiken, die zu einer schlechten Leistung und begrenzter Skalierbarkeit auf hochdimensionale Räume führen. Eine neuere Technik, die dieses Problem reduziert, sind die Mittelwerteinbettungen von Wahrscheinlichkeitsverteilungen in die Reproducing Kernel Hilbert Spaces (RKHS). Bedingte Verteilungen können als Operatoren in solche Räume eingebettet werden. Basierend darauf wurde ein Regelwerk für nicht-parametrische Inferenz präsentiert, welches erlaubt, die Summenregel, die Produktregel und die Bayes’sche Regel gänzlich im Hilbertraum anzuwenden. Die Verwendung von datenbasierten Schätzfunktionen und der Kernel-Trick des Representer-Theorems ermöglichen es, die Operationen als ‘Vektor-Matrix-Manipulationen’ darzustellen. Die Ergebnisse dieser Arbeit basie- iii ren auf bzw. sind inspiriert von diesen Einbettungen von Verteilungen in Reproducing Kernel Hilbert Spaces. Im ersten Teil dieser Arbeit werden Ergänzungen zum Framework for Nonparametric Inference präsentiert, die die Skalierbarkeit der Inferenz-Operatoren mit der Anzahl an Datenpunkten im Trainingsdatensatz verbessern. Basierend auf der Formulierung eines Optimierungsproblems wird hierzu zunächst der Subspace Conditional Embedding Operator als alternative Formulierung des Conditional Embedding Operators vorgeschlagen. Dieser erlaubt nur eine Teilmenge der Daten als Repräsentation der Verteilung zu verwenden, während der vollständige Datensatz zum Lernen des Operators verwendet wird. Inspiriert von der Herleitung des Kalman-Filters basierend auf der Methode der kleinsten Quadrate, wird die Kernel Kalman Rule außerdem als alternativer Operator für die Bayes’sche Regel im Hilbertraum vorgeschlagen. Dieser alternative Ansatz ist numerisch robuster als die im Framework for Non-parametric Inference enthaltene Kernel Bayes’ Rule und skaliert besser mit der Anzahl der Datenpunkte. Basierend auf der Kernel Kalman Rule werden das Kernel Kalman Filter und der Kernel Forward-Backward Smoother hergeleitet, die es erlauben, den Zustand mittels Einbettungen des belief state in den Hilbertraum abzuschätzen, vorherzusagen oder zu glätten. Diese Darstellung des belief state ist in der Lage, multimodale Verteilungen darzustellen und Inferenz lässt sich – aufgrund des Kernel-Tricks – durch einfache Matrixmanipulationen durchführen. Im zweiten Teil dieser Arbeit wird eine Repräsentation für große Mengen homogener Beobachtungen vorgeschlagen. Im Speziellen wird das Problem der Regelung eines Roboterschwarms zur Manipulation und Zusammenführung von Objekten betrachtet. Es wird ein Schwarm homogener Roboter, der durch ein gemeinsames Signal – z.B. der Gradient einer Lichtquelle oder eines Magnetfelds – kontrolliert werden kann, verwendet. Solche Regelungen für Roboterschwärme zu erlernen ist nicht einfach, da der Zustandsraum mit der Anzahl der Agenten anwächst und schnell sehr hochdimensional wird. Außerdem ist die genaue Anzahl der Agenten im Schwarm sowie die Zuordnung der Agenten zu den Positionen nicht ausschlaggebend, um die Aufgabe zu erledigen. Der Swarm Kernel löst dieses Problem indem der Schwarm durch eine Einbettung in einen Hilbertraum dargestellt wird. Statt der exakten Positionen der Agenten stellt die Einbettung die generative Verteilung hinter der Schwarmkonfiguration dar. Die einzelnen Agentenpositionen können als Stichproben dieser Verteilung betrachtet werden. Da die generativen Verteilungen verglichen werden, ist der Swarm Kernel invariant zur Anzahl der Agenten und bezüglich der Zuordnung der Agenten zu ihren Positionen im Schwarm. Es wird ein hierarchischer Ansatz präsentiert um das Problem der Objektzusammenführung zu lösen. Der übergeordnete Plan zur Zusammenführung wird als gegeben betrachtet, während die lokale Regelung zur Objektmanipulation mittels einer Actor-Critic Policy Search Methode erlernt wird. Die in Simulationen gelernten Regler können direkt auf ein Robotersystem angewendet werden. Im dritten Teil dieser Arbeit wird untersucht, wie die Idee der Einbettung von Verteilungen in Hilberträume auf das tiefe bestärkende Lernen übertragen werden kann. Wie im vorherigen Teil wird eine variable Anzahl von homogenen Beobachtungen angenommen, beispielsweise von einem Roboterschwarm, dessen Größe sich ändern kann. Ein weiteres Beispiel ist die Darstellung von dreidimensionalen Strukturen als Punktwolken. Die Anzahl der Punkte in solchen Wolken kann stark variieren und die Ordnung der Punkte in einer vektorisierten Darstellung ist beliebig. Die gängigen Architekturen für Neuronale Netze verlangen eine starre Struktur, bei der die Dimensionalität der Ein- und Ausgänge von vornherein festgelegt werden muss. Eine variable Anzahl von Beobachtungen lässt sich nur über Kunstgriffe verarbeiten. Um dieses Problem zu lösen, werden die Deep M-Embeddings vorgeschlagen, eine Netzwerkstruktur die von den Einbettungen in Hilberträume inspiriert ist. Die Deep M-Embeddings erlauben die Repräsentation einer Menge von Beobachtungen durch einen Vektor fester Dimensionalität. Zusätzlich ermöglichen sie die Homogenität der Daten auszunutzen, um die Parameteranzahl des Netzwerks zu reduzieren und dadurch ein schnelleres Lernen zu ermöglichen.

German

URN:

urn:nbn:de:tuda-tuprints-84340

Classification DDC:

000 Generalities, computers, information > 004 Computer science

Divisions:

20 Department of Computer Science
20 Department of Computer Science > Computational Learning for Autonomous Systems
20 Department of Computer Science > Intelligent Autonomous Systems

Date Deposited:

22 Feb 2019 10:24

Last Modified:

09 Jul 2020 02:30

URI: