Manufacturing floors and warehouses are noisy, cluttered, and constantly changing—yet many human workers instinctively remember where they left tools or components overnight. Robots, however, have long struggled to replicate this form of memory, especially when navigating large, dynamic spaces. A breakthrough from MIT is changing that by giving robots the ability to build and access detailed, human-like spatial memories.
A new kind of memory for robots
Traditional AI systems, such as chatbots, rely on text-based memory to recall past conversations. But robots operating in physical environments need a different kind of recall—one that ties objects, locations, and events together over time. The MIT team has developed a spatiotemporal memory framework that allows robots to construct a rich, 3D mental model of their surroundings, complete with descriptions of objects, landmarks, and spatial relationships.
This innovation, called Describe Anything, Anywhere, at Any Moment (DAAAM), enables a robot to answer complex queries in plain language. For example, a factory worker could ask a robotic assistant, "Retrieve the partially assembled part we worked on yesterday," and the robot would locate it without prior human labeling or manual mapping.
The system combines two key technologies: advanced computer vision and robotic mapping. While vision models excel at describing scenes, they often process one image at a time, slowing down real-world applications. Robotic mapping creates accurate 3D layouts but typically lacks rich contextual details. DAAAM integrates both, creating a memory system that is both detailed and fast enough for real-time use.
How DAAAM works: speed meets accuracy
One of the biggest challenges in robotic memory is processing speed. Annotating every object in a scene can take seconds per item—far too slow for a robot moving through a warehouse or campus. To overcome this, DAAAM uses an optimization technique that clusters nearby objects and selects key frames—images with clear views of multiple items—for simultaneous annotation.
This approach cuts computation time by up to a factor of ten. As the robot moves, it attaches batches of object descriptions to specific locations in a 3D map. For instance, it might note that a red bicycle with a flat tire is parked at the bike rack near the Stata Center on MIT’s campus. By grouping objects into regions and annotating each region only once, the system scales efficiently even in large environments.
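The key-frame idea can be illustrated with a small sketch. This is not the MIT team's optimization, which the article does not specify; it swaps in a plain greedy set-cover heuristic, and the function name, frame ids, and object ids are all hypothetical. The point is only that a few well-chosen frames can cover many objects, so the expensive annotation step runs once per frame rather than once per object.

```python
from typing import Dict, List, Set


def select_keyframes(visible: Dict[str, Set[str]]) -> List[str]:
    """Greedily pick frames until every observed object is covered.

    `visible` maps a (hypothetical) frame id to the set of object ids
    that are clearly visible in that frame.
    """
    remaining: Set[str] = set().union(*visible.values())
    chosen: List[str] = []
    while remaining:
        # Take the frame that covers the most not-yet-covered objects.
        # Iterating in sorted order makes ties resolve to the smallest id.
        best = max(sorted(visible), key=lambda f: len(visible[f] & remaining))
        chosen.append(best)
        remaining -= visible[best]
    return chosen


# Illustrative data only: four frames, four distinct objects.
frames = {
    "f1": {"bike", "rack"},
    "f2": {"rack"},
    "f3": {"bench", "bike", "rack"},
    "f4": {"door"},
}
print(select_keyframes(frames))  # ['f3', 'f4']
```

Here two frames cover all four objects, so a vision model would be called twice instead of four times. Greedy set cover is not guaranteed to find the smallest possible set of frames, but it is simple and usually close.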
Once the memory is built, retrieving information requires intelligent querying. DAAAM uses a large language model (LLM) equipped with specialized tools that perform semantic and location-based searches. Grounding the model's answers in these searches reduces hallucinations and lets it return answers within seconds. In testing, DAAAM outperformed existing methods by 21% to 53% across different question types.
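A rough sketch of what such search tools might look like is given below. It is an assumption-laden illustration, not DAAAM's interface: the `MemoryEntry` fields, both function names, and the sample data are hypothetical, and simple word overlap stands in for the embedding-based similarity a real system would more likely use. In practice an LLM would decide which tool to call and with what arguments; here the calls are made by hand.

```python
from dataclasses import dataclass
from math import dist
from typing import List, Tuple


@dataclass
class MemoryEntry:
    description: str                      # language description of the object
    position: Tuple[float, float, float]  # location in the map frame (metres)
    timestamp: float                      # observation time (seconds)


def semantic_search(memory: List[MemoryEntry], query: str, top_k: int = 3) -> List[MemoryEntry]:
    """Rank entries by word overlap with the query (stand-in for embeddings)."""
    q = set(query.lower().split())
    scored = [(len(q & set(e.description.lower().split())), e) for e in memory]
    scored = [(s, e) for s, e in scored if s > 0]        # drop non-matches
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep order
    return [e for _, e in scored[:top_k]]


def location_search(memory: List[MemoryEntry], center: Tuple[float, float, float],
                    radius: float) -> List[MemoryEntry]:
    """Return entries within `radius` metres of `center`."""
    return [e for e in memory if dist(e.position, center) <= radius]


# Illustrative memory contents only.
memory = [
    MemoryEntry("red bicycle with a flat tire", (12.0, 4.0, 0.0), 100.0),
    MemoryEntry("blue storage bin", (3.0, 1.0, 0.0), 140.0),
    MemoryEntry("red toolbox on a workbench", (3.5, 1.5, 0.9), 160.0),
]

print(semantic_search(memory, "red bicycle", top_k=1)[0].description)
print([e.description for e in location_search(memory, (3.0, 1.0, 0.0), 1.5)])
```

Because each answer is assembled from entries the tools actually returned, the language model has less room to invent objects or places that were never observed.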
Beyond robotics: augmented reality and navigation
While robotics is the primary focus, the DAAAM framework has broader potential. Augmented reality systems for maintenance technicians could use this memory to highlight anomalies or guide repairs by recalling past observations. Commuters navigating unfamiliar cities might receive real-time, context-aware directions based on stored spatial knowledge of the area.
"For robots to collaborate effectively with humans, they need to think in space and time like we do," says Luca Carlone, associate professor at MIT and lead researcher on the project. "Our method transforms a traditional map into a language-based map—one that robots can use intuitively and humans can easily query."
The research team includes Nicolas Gorlo, an MIT graduate student, and Lukas Schmid, now a professor at the University of Technology Nuremberg. Their work was presented at the Conference on Computer Vision and Pattern Recognition (CVPR).
Looking ahead: time, action, and human-robot teams
The MIT team is now expanding DAAAM to capture significant events—such as a dropped package or a shifted storage bin—within the spatial memory. They’re also exploring how robots can use this system to predict human actions and improve teamwork in shared environments.
As robots become more integrated into daily life—whether in factories, hospitals, or homes—the demand for intuitive, human-compatible memory systems will grow. DAAAM represents a pivotal step toward machines that don’t just navigate spaces, but truly understand them.
AI summary
MIT researchers have developed a system called DAAAM that enables robots to build long-term memory in complex environments. The technology could revolutionize factories and augmented reality applications.