Sousa Ewerton, Marco Antonio (2020)
Bidirectional Human-Robot Learning: Imitation and Skill Improvement.
Technische Universität Darmstadt
doi: 10.25534/tuprints-00011875
Ph.D. Thesis, Primary publication
Full text: thesis.pdf (66 MB). License: CC BY-SA 4.0 International (Creative Commons Attribution-ShareAlike).
Item Type: Ph.D. Thesis
Type of entry: Primary publication
Title: Bidirectional Human-Robot Learning: Imitation and Skill Improvement
Language: English
Referees: Peters, Prof. Dr. Jan; Takahashi, Prof. Dr. Masaki
Date: 23 June 2020
Place of Publication: Darmstadt
Date of oral examination: 15 July 2019
DOI: 10.25534/tuprints-00011875
Abstract:

A large body of research work has been done to enable robots to learn motor skills from human demonstrations. Likewise, much work has been done on investigating how humans can learn from robots or be physically assisted by them. However, these two bodies of work have remained mostly detached from each other: one line of work in robotics has enabled robots to learn motor skills through imitation and self-improvement, while another line of work in robotics and sports science has studied how robots can help humans perform movements with less effort, assist humans in rehabilitation, and support motor skill learning.

In most of the work on humans being assisted by robots, there is no attempt to improve upon predefined movements or demonstrations through machine learning techniques. In part, this lack of machine learning in the area of robot-assisted human motion is due to the high cost of collecting data of humans interacting with robots: collecting such data may involve risks to humans, typically takes a large amount of time, and requires expensive equipment and trained personnel. Collecting data of interactions between humans, from videos for example, avoids some of these difficulties. Nevertheless, data of interactions between humans may not be readily usable for machine learning techniques applied to robotics because of the differences between the morphology of humans and robots, the lack of information about forces and torques (which may be critical to succeed in some tasks), the endless variability of interaction scenarios, and so on. Consequently, it is hard to collect enough data to infer the behavior of the robot given the behavior of the human and the environment in every possible situation. It is also hard to make the robot improve its interaction with humans through trial and error in a safe manner within a feasible amount of time.

Therefore, even state-of-the-art work on robots assisting humans often avoids machine learning. Robot-assisted training and rehabilitation are based, for example, on spring-damper systems with fixed stiffness and damping coefficients. Assisted teleoperation relies, for example, on expert demonstrations or on rules designed to assist users in specific situations such as grasping multiple objects. However, in order to increase the usability and range of applications of robot assistants, it is crucial that these robots autonomously adapt to different humans, tasks, and environments. Moreover, it would be desirable for a robot to be able to teach or assist a human just as a human can teach a new skill to a robot via kinesthetic teaching, i.e., by moving the robot arm with gravity compensation.

Building on data-efficient machine learning techniques, this thesis presents a unifying perspective on the fields of robots learning from humans and humans learning from robots. Using stochastic movement representations and reinforcement learning techniques, we enable robots to learn motor skills from a few human demonstrations and to improve these motor skills. Subsequently, the robot can help humans learn a new motor skill or perform a challenging task, such as teleoperation requiring obstacle avoidance and accuracy.

We start by tackling one of the key problems in human-robot interaction: how can a robot learn to react to human movements that vary both in shape and speed? We present a method that enables robots to learn interaction models from demonstrations with different speeds, even if the demonstrations are corrupted by noise or occlusion, for example. These interaction models, which are based on stochastic movement representations, can then be used to predict the rest of a human action given observations of its beginning, as well as to compute the most suitable robot reaction.
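To make this concrete, here is a minimal illustrative sketch (not code from the thesis) of the core operation behind such predictions, assuming a ProMP-style stochastic movement representation: a Gaussian over basis-function weights is fitted to demonstrations and then conditioned on the observed beginning of a movement. The basis functions, noise levels, and data are all assumptions for illustration.

```python
# Illustrative sketch only: fit a ProMP-style representation to demonstrations
# and condition it on a partial observation to predict the rest of the movement.
import numpy as np

def rbf_features(t, n_basis=10, bandwidth=0.02):
    """Normalized radial basis function features over a phase t in [0, 1]."""
    centers = np.linspace(0.0, 1.0, n_basis)
    phi = np.exp(-(t[:, None] - centers[None, :]) ** 2 / (2.0 * bandwidth))
    return phi / phi.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
T = 100
t = np.linspace(0.0, 1.0, T)
Phi = rbf_features(t)                                   # (T, n_basis)

# Toy demonstrations of a 1-D movement, corrupted by noise.
demos = [np.sin(2.0 * np.pi * t) + 0.05 * rng.standard_normal(T)
         for _ in range(20)]

# Fit a Gaussian over basis-function weights: one weight vector per demo.
W = np.array([np.linalg.lstsq(Phi, y, rcond=None)[0] for y in demos])
mu_w = W.mean(axis=0)
Sigma_w = np.cov(W.T) + 1e-6 * np.eye(Phi.shape[1])     # regularized

# Observe only the first 30 time steps of a new movement, then condition.
Phi_o, y_o = Phi[:30], demos[0][:30]
sigma_y = 0.05 ** 2                                     # assumed obs. noise
S = Phi_o @ Sigma_w @ Phi_o.T + sigma_y * np.eye(len(y_o))
K = np.linalg.solve(S, Phi_o @ Sigma_w).T               # Kalman-style gain
mu_post = mu_w + K @ (y_o - Phi_o @ mu_w)
Sigma_post = Sigma_w - K @ Phi_o @ Sigma_w

prediction = Phi @ mu_post          # predicted full trajectory (mean)
uncertainty = np.sqrt(np.einsum('ti,ij,tj->t', Phi, Sigma_post, Phi))
```

The posterior over weights predicts the remainder of the trajectory together with its uncertainty, which is the basic operation behind anticipating a human action from its beginning and selecting a matching robot reaction.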
Often, human demonstrations are suboptimal: the human may be a non-expert, the environment may change and render the demonstrations ineffective, or the robot may be unable to accurately reproduce the demonstration. In such cases, it is necessary to improve upon the initial demonstrations. We present an approach that deals with this problem by enabling the human to teach new motor skills to robots through demonstrations and incremental feedback. Furthermore, by using probabilistic conditioning, the robot is able to generalize learned movements to different situations.

Human demonstrations can be suboptimal not only with respect to their shape in space but also with respect to their speed profile. The robot may, for example, have to adapt the speed of a movement to throw an object farther away or closer, or to hit an object faster or slower. These can be local adaptations of the speed profile rather than a uniform acceleration or deceleration of the whole movement. Moreover, if a high degree of accuracy is required, it may be very difficult for a human to correct the initial demonstrations through incremental feedback. Besides that, it is desirable that the robot improve its movements without always requiring human input. We address this problem by using reinforcement learning to optimize movement parameters that determine the speed profile of movements (a sketch of such a policy search is given at the end of the abstract).

Having addressed key problems towards making robots more capable of learning from human demonstrations and of improving upon these demonstrations, we turn to how robots can help humans learn new motor skills or perform challenging teleoperation tasks. As before, using stochastic movement representations, we address the problem of giving visual feedback to a human trying to learn a new motor skill. Our proposed algorithm aligns expert demonstrations in space and time and builds a probability distribution over trajectories. When a user tries to perform that motor skill, our algorithm uses the probability distribution built from the expert demonstrations to evaluate whether or not the user's attempt matches the expert movements, and gives visual feedback to the user (a sketch of this evaluation follows below).

A similar principle can be used to give haptic feedback to the user, helping him or her succeed in challenging teleoperation tasks. As we demonstrate through user studies, teleoperation tasks may be difficult for humans when, for example, the human cannot accurately estimate the 3D position of objects or has to control several degrees of freedom at the same time. In this case, the failed attempts of the human are used as initial demonstrations and are optimized through reinforcement learning to generate trajectory distributions that satisfy the requirements of the task.

We assist humans in teleoperation tasks initially by tackling motion planning problems in static environments. Subsequently, we build a framework that enables robots to solve motion planning problems and interact with humans in dynamic environments. Finally, we tackle tasks with multiple possible solutions, such as grasping multiple objects, and infer the intention of the user in order to select the most appropriate guidance.
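The following is a minimal sketch of the distribution-based evaluation referenced above; it is an illustration under simplifying assumptions (1-D toy data, alignment already done, a fixed 2-sigma threshold), not the thesis's implementation.

```python
# Illustrative sketch only: a Gaussian over (already aligned) expert
# trajectories flags the time steps where a user's attempt deviates too far.
import numpy as np

rng = np.random.default_rng(1)
T = 50
t = np.linspace(0.0, 1.0, T)

# Expert demonstrations, assumed already aligned in space and time.
experts = np.array([np.sin(np.pi * t) + 0.03 * rng.standard_normal(T)
                    for _ in range(30)])
mu = experts.mean(axis=0)
std = experts.std(axis=0) + 1e-6

# A user attempt that drifts away from the experts in the second half.
attempt = np.sin(np.pi * t) + 0.3 * np.maximum(t - 0.5, 0.0)

# Per-time-step z-score; steps beyond 2 standard deviations are flagged and
# could be highlighted as visual (or rendered as haptic) feedback.
z = np.abs(attempt - mu) / std
flagged = np.where(z > 2.0)[0]
print(f"feedback needed at {len(flagged)} of {T} time steps")
```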
In summary, this thesis uses stochastic movement representations to enable robots to learn motor skills from human demonstrations and incremental feedback. These movement representations are used in conjunction with reinforcement learning algorithms to make the robot improve upon the demonstrations or upon arbitrary initial trajectory distributions. The robot can use trajectory distributions based on expert demonstrations to help non-experts acquire a new motor skill. In addition, if the human demonstrations are not suitable, the robot can use trajectory distributions optimized through reinforcement learning to help users perform challenging tasks, e.g., in teleoperation. Further applications of the algorithms and frameworks proposed in this thesis may lie in fields such as rehabilitation, training in sports, and human-robot collaboration.
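As a final illustration, the sketch below shows a generic reward-weighted episodic policy search that improves a distribution over speed-profile parameters without human input, in the spirit of the reinforcement learning methods named in the abstract; the toy task, reward, and hyperparameters are invented for the example.

```python
# Illustrative sketch only: reward-weighted episodic policy search over local
# time-scaling factors of a movement (a toy stand-in for speed-profile
# parameters; the desired profile and reward are assumptions).
import numpy as np

rng = np.random.default_rng(2)

def rollout_reward(theta):
    """Toy reward: prefer slowing down locally around the middle of the
    movement (e.g., for an accurate hit), keeping the rest at normal speed."""
    desired = np.array([1.0, 0.4, 1.0])      # desired local scaling factors
    return -np.sum((theta - desired) ** 2)

mu = np.ones(3)                  # mean of the Gaussian search distribution
sigma = 0.3 * np.ones(3)         # per-dimension exploration noise

for _ in range(50):
    thetas = mu + sigma * rng.standard_normal((20, 3))   # sample candidates
    rewards = np.array([rollout_reward(th) for th in thetas])
    # Exponentiated, max-shifted rewards as weights (softmax weighting).
    w = np.exp((rewards - rewards.max()) / 0.1)
    w /= w.sum()
    mu = w @ thetas                                      # weighted mean
    sigma = np.sqrt(w @ (thetas - mu) ** 2) + 1e-3       # weighted std

print("optimized local time-scaling factors:", mu.round(2))
```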
URN: urn:nbn:de:tuda-tuprints-118752
Classification DDC:
000 Generalities, computers, information > 004 Computer science
600 Technology, medicine, applied sciences > 600 Technology
600 Technology, medicine, applied sciences > 620 Engineering and machine engineering
Divisions: 20 Department of Computer Science > Intelligent Autonomous Systems
Date Deposited: 15 Jul 2020 06:16
Last Modified: 19 Jul 2020 07:22
URI: https://tuprints.ulb.tu-darmstadt.de/id/eprint/11875
PPN: 467619395