How robots learn to interpret vague instructions with AI assistance

A robot that can’t understand vague instructions isn’t much help in a warehouse, office, or home. What if you asked a machine to place coffee on your desk without disturbing your Zoom call? It needs to know how to do the task—and what to avoid. Traditional training methods rely on either exhaustive physical demonstrations or detailed written instructions, both of which demand significant human effort.

Researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) have developed a system called Masked Inverse Reinforcement Learning (Masked IRL) to automate this process. The approach leverages large language models (LLMs) to clarify ambiguous prompts and prioritize key details, reducing the number of manual demonstrations required by a factor of nearly five. Instead of requiring users to explain every nuance, the system interprets gestures, refines instructions, and generates efficient motion plans for robots working in dynamic environments.

A smarter way to teach robots with less effort

Training robots typically involves two labor-intensive methods: recording extensive physical demos or writing detailed step-by-step instructions. Neither option scales well in real-world settings, where users may lack the time—or patience—to provide exhaustive guidance.

Masked IRL addresses this by combining sensor data from kinesthetic demonstrations (where a human physically guides a robot through movements) with AI-driven interpretation. The system captures the robot’s trajectory—every joint movement and path taken—then uses an LLM to compare it against the most efficient route. For example, a vague instruction like "stay close" might be refined to "maintain a 10-centimeter distance from the table surface."
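To make the comparison concrete, here is a minimal Python sketch of the numeric side of that step: measuring how far a demonstrated path departs from the straight-line route. It is an illustration only, not the researchers' code. The function names, the use of end-effector positions in meters, and the `table_height` parameter are all assumptions; in the system described, the interpretation itself is done by an LLM.

```python
import math

# Hypothetical helpers for illustration; not taken from the Masked IRL paper.

def path_length(points):
    """Total length of a path given as a list of (x, y, z) positions in meters."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))

def summarize_demo(points, table_height=0.0):
    """Compare a demonstrated path with the straight line from start to goal.

    Returns simple statistics that could be passed along with a vague
    instruction such as "stay close" to help make it more specific.
    """
    if len(points) < 2:
        raise ValueError("a demonstration needs at least two points")
    direct = math.dist(points[0], points[-1])
    if direct == 0:
        raise ValueError("start and goal coincide; detour ratio is undefined")
    travelled = path_length(points)
    return {
        "path_length_m": travelled,
        "direct_length_m": direct,
        # 1.0 means the demo followed the most direct route exactly.
        "detour_ratio": travelled / direct,
        # Largest height above the (assumed flat) table during the demo.
        "max_height_above_table_m": max(z - table_height for _, _, z in points),
    }

# Example: an L-shaped demo held 10 cm above a table at height 0.
demo = [(0.0, 0.0, 0.1), (3.0, 0.0, 0.1), (3.0, 4.0, 0.1)]
print(summarize_demo(demo))
```

A detour ratio well above 1 suggests the demonstrator deliberately avoided something, which is the kind of signal that can help turn a vague prompt into a specific constraint.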

This interpretation isn’t just about precision—it’s about understanding user intent. As MIT PhD student Minyoung Hwang, lead author of the project’s paper, explains: "Our approach minimizes human effort by letting machines infer what users truly want, even when prompts are incomplete."

Filtering out the noise to focus on what matters

Robots operate in cluttered spaces filled with distractions—laptops, shelves, or even a user’s posture during a demo. Masked IRL’s second LLM module evaluates these environmental factors, masking irrelevant details (assigning them a "0") while highlighting critical ones (marked "1").

For instance:

A user leaning on a table during a demo? Masked as irrelevant.
A laptop blocking the robot’s path? Prioritized as critical.
The shape of a target object? Assessed for importance.

This filtering process ensures the robot’s motion plan aligns with human preferences—even when those preferences aren’t explicitly stated. In tests, Masked IRL correctly identified unstated user priorities 15% more often than comparable baselines, whether in simulation or real-world trials.
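The 0/1 masking described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the feature names, the hand-written mask, and the linear reward are hypothetical stand-ins. In the system described, the mask comes from an LLM, and the article does not specify the form of the reward model.

```python
# Hypothetical feature names and values for illustration only.

def apply_mask(features, mask):
    """Zero out features marked irrelevant (0) and keep those marked critical (1)."""
    missing = set(features) - set(mask)
    if missing:
        raise KeyError(f"mask has no entry for: {sorted(missing)}")
    return {name: value * mask[name] for name, value in features.items()}

def masked_reward(features, weights, mask):
    """Assumed linear reward computed only from the unmasked features."""
    masked = apply_mask(features, mask)
    return sum(weights[name] * masked[name] for name in masked)

# Features observed at one step of a demonstration.
features = {"user_posture": 0.8, "laptop_distance": 0.3, "object_shape": 0.5}

# 1 = relevant to the instruction, 0 = ignore.
mask = {"user_posture": 0, "laptop_distance": 1, "object_shape": 1}

# Illustrative reward weights; a real system would learn these from demos.
weights = {"user_posture": 5.0, "laptop_distance": 2.0, "object_shape": -1.0}

print(apply_mask(features, mask))
print(masked_reward(features, weights, mask))
```

Because the masked feature is multiplied by zero, even a large weight on the user's posture cannot influence the reward, so incidental details of a demonstration do not leak into the learned behavior.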

The system’s efficiency shines in its training requirements. Robots equipped with Masked IRL learned tasks with fewer than 50 kinesthetic demonstrations, outperforming models that relied solely on vague prompts. In one experiment, a robotic arm moved a coffee mug around a laptop on a table—avoiding collisions without prior knowledge of the obstacle. In another, it wiped a table while maintaining a safe distance from the user’s computer.

Beyond today’s robots: A clearer path to dynamic interaction

Current iterations of Masked IRL focus on refining prompts and filtering environmental noise, but future enhancements could make robots even more perceptive. CSAIL researchers are exploring camera integration, which would allow machines to visually scan their surroundings and identify task-relevant objects—like ignoring bananas when picking up a toy.

This innovation could bridge the gap between robotic precision and human-like adaptability. As the team prepares to present their work at the 2026 IEEE International Conference on Robotics and Automation, their goal remains clear: to bring robotic execution closer to human instruction, making automation more intuitive and accessible in homes, factories, and beyond.

For now, Masked IRL proves that robots don’t need every detail spelled out—they just need the right AI to fill in the blanks.

AI summary

A new AI-assisted system helps robots make sense of ambiguous commands and focus on the details that need to be retained. The method, developed by MIT researchers, lets robots learn tasks from less data and infer users' intent automatically.
