Reinforcement learning provides a conceptual framework for autonomous agents to learn from experience, analogous to how a pet might be trained with treats. But practical applications of reinforcement learning are often far from natural: instead of using RL to learn through trial and error by actually attempting the desired task, typical RL applications use a separate (usually simulated) training phase. For example, AlphaGo did not learn to play Go by competing against thousands of humans, but by playing against itself in simulation. While this type of simulated training is attractive for games where the rules are well known, applying it to real-world domains such as robotics can require a range of complex approaches, such as using simulated data or instrumenting real-world environments in various ways to make training feasible under laboratory conditions. Can we instead design reinforcement learning systems for robots that allow them to learn directly “on the job” while performing the task at hand? In this blog post, we’ll talk about ReLMM, a system we developed that learns to clean a room directly with a real robot through continual learning.
We evaluate our method on tasks of varying difficulty. The upper-left task has uniform white blobs to pick up and no obstacles, while the other rooms have objects of various shapes and colors, obstacles that make navigation harder and hide objects, and patterned carpets that make it difficult to see objects against the floor.
The main obstacle to “on-the-job” training in the real world is that collecting more experience is prohibitively difficult. If we can make real-world training easier, by making the data collection process more autonomous and removing the need for human monitoring or intervention, we can further benefit from the simplicity of agents that learn from experience. In this work, we design an “on-the-job” training system for a mobile robot that cleans by learning to grasp objects in different rooms.
People aren’t born one day and interviewing for jobs the next. People learn tasks at many levels before they apply for a job, starting with the easier ones and building on them. In ReLMM, we make use of this concept by encouraging the robot to prioritize training common, reusable skills, such as grasping, before learning later skills, such as navigation. Learning this way has two advantages for robotics. The first advantage is that when an agent focuses on learning a skill, it is more efficient at gathering data around the local state distribution of that skill.
This is shown in the figure above, where we assessed the amount of prioritized grasping experience required to obtain efficient mobile manipulation training. The second advantage of a multi-level learning approach is that we can inspect the models trained for different tasks and ask them questions such as “can you grasp anything right now?”, which is useful for the navigation training that we describe next.
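To make the idea of prioritizing grasp practice concrete, here is a minimal sketch of a skill-selection rule. It is an illustration under stated assumptions, not the ReLMM implementation: the function name `choose_skill`, the warm-up count, and the probability threshold are all hypothetical, and the sketch assumes the grasp model can report a predicted success probability for the current view.

```python
def choose_skill(predicted_grasp_success: float,
                 grasp_attempts_so_far: int,
                 warmup_attempts: int = 100,
                 threshold: float = 0.5) -> str:
    """Pick which skill to practice next (hypothetical rule).

    During a warm-up period the robot always attempts a grasp, so the
    grasp model gathers data around its own local state distribution.
    Afterwards, the robot asks the grasp model "can you grasp anything
    right now?" and navigates only when the answer is probably no.
    """
    if grasp_attempts_so_far < warmup_attempts:
        return "grasp"
    if predicted_grasp_success >= threshold:
        return "grasp"
    return "navigate"
```

The warm-up branch stands in for prioritized grasping experience. A real system would more likely shift priority gradually than switch at a fixed count.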
Training this multi-level policy was not only more efficient than learning both skills at once, but it also allowed the grasping controller to inform the navigation policy. A model that estimates the uncertainty in its grasp success (Ours above) can be used to improve navigation exploration by skipping areas without graspable objects, in contrast to No Uncertainty Bonus, which does not use this information. The model can also be used to relabel data during training, so that in the unlucky case where the grasping model fails to grasp an object within its reach, the grasping policy can still provide some signal by indicating that an object was there but that the grasping policy has not yet learned how to grasp it. In addition, learning modular models has engineering advantages. Modular training enables reuse of skills that are easier to learn and allows intelligent systems to be built one piece at a time. This is beneficial for many reasons, including safety evaluation and understanding.
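The interaction between the grasp model and the navigation policy described above can be sketched as a reward computation. This is a hedged illustration, not the method from the paper: it assumes the grasp model is an ensemble whose members each output a success probability, it uses the spread of those outputs as the uncertainty estimate, and the names `navigation_reward` and `bonus_weight` are hypothetical.

```python
from statistics import mean, pstdev
from typing import Sequence


def navigation_reward(ensemble_probs: Sequence[float],
                      grasp_succeeded: bool,
                      bonus_weight: float = 1.0) -> float:
    """Reward for the navigation policy at the current location (sketch).

    ensemble_probs: grasp-success probabilities predicted by each member
        of an assumed ensemble of grasp models.
    grasp_succeeded: outcome of the grasp attempt at this location.
    """
    if not ensemble_probs:
        raise ValueError("ensemble_probs must not be empty")

    # Relabeling (assumed form): if the grasp failed, fall back on the
    # model's belief that a graspable object was present, so navigation
    # still gets signal for having reached an object.
    base = 1.0 if grasp_succeeded else mean(ensemble_probs)

    # Uncertainty bonus: disagreement among ensemble members marks
    # places worth exploring. Where all members agree that nothing is
    # graspable, both terms are zero, so those areas are skipped.
    uncertainty = pstdev(ensemble_probs)
    return base + bonus_weight * uncertainty
```

Setting `bonus_weight=0.0` corresponds to a variant that ignores uncertainty, which is the kind of baseline that the No Uncertainty Bonus comparison refers to.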
Many robotics tasks we see today can be solved with varying degrees of success using hand-designed controllers. For our room cleaning task, we designed a hand-crafted controller that locates objects by clustering images and turns toward the nearest detected object at each step. This expertly designed controller works very well on the visually salient balled socks and takes reasonable paths around obstacles, but it cannot learn an optimal path to collect items quickly, and it struggles with visually diverse rooms. As shown in video 3 below, the scripted policy is distracted by the white patterned carpet as it tries to locate more white objects to grasp.
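A scripted controller of the kind described above might look roughly like the following sketch. The details are assumptions made for illustration: the image is taken to be already thresholded into a binary mask of “white” pixels, clusters are found as 4-connected components, the cluster lowest in the image is treated as the nearest one, and the function names and command strings are hypothetical.

```python
from collections import deque
from typing import List, Tuple


def find_clusters(mask: List[List[int]]) -> List[Tuple[float, float]]:
    """Return (row, col) centroids of 4-connected clusters of nonzero pixels."""
    rows = len(mask)
    cols = len(mask[0]) if mask else 0
    seen = [[False] * cols for _ in range(rows)]
    centroids = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill collects one cluster.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            cy = sum(p[0] for p in pixels) / len(pixels)
            cx = sum(p[1] for p in pixels) / len(pixels)
            centroids.append((cy, cx))
    return centroids


def turn_command(mask: List[List[int]], tolerance: float = 0.5) -> str:
    """Turn toward the nearest detected cluster, or search if none is seen."""
    centroids = find_clusters(mask)
    if not centroids:
        return "search"
    # Assumption: a larger row index means the object is closer to the robot.
    _, cx = max(centroids, key=lambda p: p[0])
    center = (len(mask[0]) - 1) / 2
    if cx < center - tolerance:
        return "turn_left"
    if cx > center + tolerance:
        return "turn_right"
    return "forward"
```

The sketch also shows why such a controller is brittle: any white region that survives the threshold, such as a pattern in a carpet, becomes a cluster and can attract the robot just as a sock would.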
We show a comparison between (1) our policy at the start of training, (2) our policy at the end of training, and (3) the scripted policy. In (4), we can see that the robot’s performance improves over time and eventually exceeds that of the scripted policy at quickly collecting the objects in the room.
Since we can use experts to code this hand-crafted controller, what’s the point of learning? An important limitation of hand-designed controllers is that they are tuned for a particular task, for example, grasping white objects. When various objects, differing in color and shape, are introduced, the original tuning may no longer be optimal. Rather than requiring more manual engineering, our learning-based method is able to adapt to various tasks by gathering its own experience.
However, the most important lesson is that even if the hand-designed controller is capable, the learning agent eventually surpasses it given enough time. This learning process is itself autonomous and takes place while the robot is doing its job, so it is relatively inexpensive. This shows the capability of learning agents, which can also be thought of as a general way to carry out the “expert hand-tuning” process for any type of task. Learning systems can create the entire robot control algorithm and are not limited to adjusting a few parameters in a script. The key step in this work is allowing these real-world learning systems to autonomously collect the data they need to learn successfully.
This post is based on the paper “Fully Autonomous Real-World Reinforcement Learning with Applications to Mobile Manipulation”, presented at CoRL 2021. More details can be found in our paper, website and video. We provide code to reproduce our experiments. We thank Sergey Levine for his valuable comments on this blog post.