Teaching Robots to Learn from Experience with Critics
Sep 21, 2024
· Newsletter


#robotics #llm

RAG-Modulo helps robots learn from past mistakes by storing and retrieving experiences. It enhances robot decision-making and improves task performance in complex environments.

leeron

Researchers from Rice University have proposed a new AI framework called RAG-Modulo, aimed at improving how robots solve complex tasks.

Traditionally, robots struggle with uncertainty in their actions and observations, which makes it hard for them to complete long-horizon tasks reliably. Current methods often rely on large language models (LLMs), which are good at generating plans for robots, but they lack one crucial capability: learning from their mistakes.

RAG-Modulo tackles this by allowing robots not only to generate actions but also to learn from their past experiences. Imagine a robot trying to navigate a room with obstacles. If it bumps into a chair while trying to reach a table, the next time it encounters a similar setup, it could remember this experience and avoid making the same mistake.
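To make that idea concrete, here is a minimal sketch of such an experience memory in Python. The `ExperienceMemory` class and the bag-of-words cosine similarity are illustrative assumptions, not the paper's actual retrieval mechanism, which may use different representations and scoring:

```python
from collections import Counter
from math import sqrt


class ExperienceMemory:
    """Toy store of (situation, outcome) pairs with similarity retrieval."""

    def __init__(self):
        self.episodes = []  # list of (situation_text, outcome_text)

    def store(self, situation, outcome):
        self.episodes.append((situation, outcome))

    def retrieve(self, situation, k=2):
        """Return the k stored experiences most similar to `situation`."""

        def cosine(a, b):
            # Simple bag-of-words cosine similarity (an assumption here)
            va, vb = Counter(a.split()), Counter(b.split())
            dot = sum(va[w] * vb[w] for w in va)
            na = sqrt(sum(v * v for v in va.values()))
            nb = sqrt(sum(v * v for v in vb.values()))
            return dot / (na * nb) if na and nb else 0.0

        ranked = sorted(self.episodes,
                        key=lambda ep: cosine(situation, ep[0]),
                        reverse=True)
        return ranked[:k]


memory = ExperienceMemory()
memory.store("navigating room with chair near table",
             "collided with chair; detour to the left succeeded")
print(memory.retrieve("navigating room with chair blocking the table"))
```

The point of the design is that failures are stored alongside successes, so a near-miss from a past episode can steer the next plan away from the same mistake.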

The framework combines an LLM, which generates possible actions for the robot, with a set of critics—mechanisms that evaluate whether the suggested actions are feasible. More importantly, RAG-Modulo incorporates a memory system that stores past interactions. This memory enables the robot to recall what worked or failed in similar past situations, helping it make better decisions over time.
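In rough Python, the resulting propose-critique loop might look like the sketch below, building on the toy `ExperienceMemory` above. The callables `propose_action`, `critics`, and `env_step` are placeholders for the LLM, the paper's feasibility critics, and the environment; this is not RAG-Modulo's actual API:

```python
def run_episode(propose_action, critics, memory, env_step, observation,
                max_steps=20):
    """One episode of the propose-critique-act loop (illustrative only)."""
    history = []  # (action, feedback) pairs fed back into the next prompt
    for _ in range(max_steps):
        examples = memory.retrieve(observation)  # relevant past experiences
        action = propose_action(observation, history, examples)

        # Each critic returns None if the action is feasible, or a
        # natural-language explanation of why it is not.
        errors = [msg for c in critics
                  if (msg := c(observation, action)) is not None]

        if errors:
            # Infeasible action: record the critique and let the LLM retry
            history.append((action, errors[0]))
            memory.store(observation, f"{action} rejected: {errors[0]}")
            continue

        history.append((action, "ok"))
        memory.store(observation, f"{action} executed")
        observation, done = env_step(action)
        if done:
            break
    return history
```

Critiques are appended to the interaction history rather than discarded, so the same loop that executes actions also accumulates the experiences the memory retrieves later.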

The prompt (left) in RAG-Modulo consists of an environment descriptor, a history of past interactions, and retrieved in-context examples that guide the LLM toward a feasible action. Here, the agent is carrying a blue key that it must drop before picking up the green key. The retrieved in-context example shows a similar scenario in which the agent was unable to drop an object in an occupied cell; based on this, the agent generates an action to move to an empty cell before completing the task. (Right) Illustration of how each critic provides feedback on the infeasible action shown at the top.
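A prompt following the figure's structure could be assembled roughly as below. The exact template, section ordering, and wording RAG-Modulo uses are assumptions here; `build_prompt` is a hypothetical helper:

```python
def build_prompt(env_description, history, retrieved_examples):
    """Combine environment, history, and retrieved examples into one prompt."""
    example_block = "\n\n".join(
        f"Situation: {situation}\nOutcome: {outcome}"
        for situation, outcome in retrieved_examples
    )
    history_block = "\n".join(f"{action} -> {feedback}"
                              for action, feedback in history)
    return (
        f"Environment:\n{env_description}\n\n"
        f"Similar past experiences:\n{example_block}\n\n"
        f"Interaction history:\n{history_block}\n\n"
        "Propose the next feasible action:"
    )


prompt = build_prompt(
    "2D grid world; agent at (3, 4) carrying a blue key",
    [("drop blue key", "cell occupied, cannot drop")],
    [("carrying object, target cell occupied",
      "moved to an empty cell, then dropped the object")],
)
print(prompt)
```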

Experiments in simulated environments such as BabyAI (a 2D grid world where agents follow language instructions) and AlfWorld (a more complex household-like setting) showed promising results: agents using RAG-Modulo completed more tasks, made significantly fewer infeasible actions, and needed fewer interactions to succeed than other state-of-the-art systems.

The beauty of RAG-Modulo lies in its ability to mimic human learning—learning from experience and improving with each new task. As robots become more involved in everyday activities, this type of self-improving AI could make them far more reliable and efficient.

Jain, A., Jermaine, C., & Unhelkar, V. (2024). RAG-Modulo: Solving Sequential Tasks using Experience, Critics, and Language Models. arXiv, 2409.12294. Retrieved from https://arxiv.org/abs/2409.12294v1