MolmoAct 2: Ai2's Open Robot Model Reaches Real-World Labs

Claude
What Happened

The Allen Institute for AI (Ai2) has released MolmoAct 2, a fully open robotics foundation model built to do something most AI systems still find surprisingly hard: act competently in the messy physical world. Software models already write our emails and debug our code, yet getting a robot to reliably prep test-tube samples or fold a towel for hours on end has remained out of reach. MolmoAct 2 is Ai2's attempt to close that gap in the open, where anyone can inspect and extend the work.

A Dobot Magician tabletop robotic arm performing a manipulation task in a robotics laboratory
Giacomo Alessandroni / CC BY-SA 4.0 / Wikimedia Commons

MolmoAct 2 belongs to a class Ai2 calls Action Reasoning Models — systems that reason about their surroundings in three dimensions before they move, rather than mapping camera pixels straight to motor commands. The new version is a substantial step up from the original MolmoAct, which debuted last August. It handles a range of real-world tasks out of the box without per-task fine-tuning, and it runs up to 37 times faster than its predecessor. A single action call now takes roughly 180 milliseconds in the base model, against about 6,700 milliseconds in the first MolmoAct — the difference between a machine that pauses visibly between movements and one that responds to its environment in something close to real time.
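
The speed claim can be checked with a few lines of arithmetic. The sketch below uses only the two approximate per-call latencies quoted above. The function names are illustrative and do not come from Ai2's code, and the calls-per-second figure is a simple upper bound that assumes calls run back to back with no other overhead.

```python
# Latency figures as quoted in the article (both are approximate).
OLD_LATENCY_MS = 6700.0  # first MolmoAct, per action call
NEW_LATENCY_MS = 180.0   # MolmoAct 2 base model, per action call


def speedup(old_ms: float, new_ms: float) -> float:
    """Return how many times faster the new latency is than the old one."""
    return old_ms / new_ms


def max_calls_per_second(latency_ms: float) -> float:
    """Upper bound on back-to-back action calls per second at this latency."""
    return 1000.0 / latency_ms


if __name__ == "__main__":
    print(f"speedup: {speedup(OLD_LATENCY_MS, NEW_LATENCY_MS):.1f}x")
    print(f"old rate: {max_calls_per_second(OLD_LATENCY_MS):.2f} calls/s")
    print(f"new rate: {max_calls_per_second(NEW_LATENCY_MS):.2f} calls/s")
```

With the quoted figures the ratio comes to about 37.2, in line with the "up to 37 times" claim, and the ceiling rises from roughly 0.15 to roughly 5.6 action calls per second.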

Alongside the model, Ai2 published the MolmoAct 2-Bimanual YAM dataset: more than 700 hours of two-armed robot demonstrations covering coordinated chores such as folding a towel, scanning groceries, charging a phone, and bussing a table. It is the largest open-source bimanual manipulation dataset released to date, with more than thirty times the data behind the first model. By late May the team had opened the full training code, every dataset used in training, all evaluation rollouts, and the model's tokenizer, and the released artifacts had already been downloaded more than 400,000 times. The complete write-up sits on the Ai2 research blog.

Why It Matters

The headline benchmark numbers matter less than a single word: open. For all the progress in robotics foundation models over the past year, the recipes behind them have stayed largely closed. Some teams release weights, fewer release training data, and almost none publish enough for outside researchers to reproduce or meaningfully improve the work.

AI research robots in a laboratory, described as key to democratizing and revolutionizing science
U.S. Air Force AFRL, David Dixon / Public domain / Wikimedia Commons

That secrecy quietly slows the entire field. When the underlying data and training procedure are hidden, progress concentrates in the few organizations that can afford to gather their own robot demonstrations, and everyone else is left guessing at why a model succeeds or fails. MolmoAct 2 is a deliberate counterweight, attempting for physical AI roughly what open language models did for text — turning a private capability into shared infrastructure that smaller labs, universities, and startups can build on directly.

The technical foundation reflects that ambition. MolmoAct 2 is initialized not from a generic vision-language model but from Molmo 2-ER, an embodied-reasoning variant trained on an additional three million examples covering pointing, object detection, spatial reasoning, and video-based question answering. Across thirteen embodied-reasoning benchmarks, Molmo 2-ER averages 63.8 out of 100 — ahead of systems including GPT-5 and Gemini 2.5 Pro on those particular spatial tasks. A robot that can be studied and corrected by the whole community, rather than a sealed product, is a different kind of object: less a finished tool than a starting point.

Reaction

The most telling response has been how quickly researchers picked the model up. Within weeks of release, the artifacts crossed 400,000 downloads, and Ai2 integrated MolmoAct 2 into Hugging Face's LeRobot platform, letting teams already in that ecosystem drop the model into existing setups without retooling.

A bank of automated laboratory robots used for high-throughput biomedical research
National Center for Advancing Translational Sciences / Public domain / Wikimedia Commons

Independent evaluation has been encouraging too. Ai2 retained Cortex AI, a robotics evaluation company, to run a third-party benchmark of the model's real-world fine-tuning performance against four competing policies. MolmoAct 2 posted the highest average score, 0.51, ahead of OpenVLA-OFT at 0.36 and Physical Intelligence's π0.5 at 0.32, and it ranked first on seven of eight tasks — among them returning a test tube to a tray, putting tools away, and preparing a pipette tip.

Those lab-bench wins are echoed in raw manipulation tests. On a Franka arm performing zero-shot tasks, MolmoAct 2 averaged 87.1 percent success across the suite, compared with 48.4 percent for Ai2's earlier MolmoBot model and 45.2 percent for π0.5. It reached 100 percent on moving an apple onto a plate and the high eighties to mid nineties on finickier jobs like placing a pipette in a tray or a knife in a box. On LIBERO, a benchmark for acquiring and retaining many skills over time, the model scored 97.2 percent, rising to 98.1 percent in its depth-reasoning variant.

What's Next

The most interesting frontier is not a benchmark but a working laboratory. Since early this year, Ai2 has been piloting MolmoAct 2 with researchers at the Cong Lab at Stanford School of Medicine, led by Professor Le Cong, which is working toward a "self-driving wetlab" that can accelerate genome engineering.

An automated liquid-handling robot of the kind used to prepare samples in a biology wetlab
National Institute of Allergy and Infectious Diseases (NIAID) / Public domain / Wikimedia Commons

It is a demanding stress test. In CRISPR gene-editing workflows, a MolmoAct 2-driven arm handles routine steps such as moving samples between stations and operating benchtop equipment — an environment that is unstructured, requires repeated precision, and punishes small errors that accumulate over a long experiment. After trying several generalist models tuned to their workflow, the Stanford team reported that MolmoAct 2 showed strong potential to streamline parts of wetlab operations and, in turn, speed up discovery.

Ai2 is candid about the limits. The model plans a batch of ten to thirty moves and executes the whole sequence before reasoning again, so it cannot adjust mid-batch if something unexpected happens, and transitions between batches can look jerky. It also works out of the box only on the robot setups it was heavily trained on — the SO-100, the bimanual YAM, and the Franka arm — and adapting it to a humanoid or a hand-equipped robot still requires extra training. These are honest weaknesses, and naming them publicly is part of the point: shared foundations let the field tackle such problems together rather than in private.
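
The first limitation is easier to see in a toy simulation. The sketch below is not MolmoAct 2's controller. It assumes a one-dimensional "robot", a hypothetical `plan_chunk` planner that splits the remaining distance into equal steps, and a target that moves partway through a batch. It only illustrates why a policy that executes a whole batch before looking again cannot react mid-batch.

```python
from typing import List


def plan_chunk(position: float, target: float, chunk_size: int) -> List[float]:
    """Hypothetical planner: equal steps that reach `target` in one chunk."""
    step = (target - position) / chunk_size
    return [step] * chunk_size


def run(targets_by_step: List[float], chunk_size: int) -> List[float]:
    """Execute chunks open-loop; the target is re-read only at chunk boundaries."""
    position = 0.0
    trace: List[float] = []
    t = 0
    while t < len(targets_by_step):
        # Observe the target once, then commit to the whole chunk.
        chunk = plan_chunk(position, targets_by_step[t], chunk_size)
        for delta in chunk:
            if t >= len(targets_by_step):
                break
            position += delta
            trace.append(position)
            t += 1
    return trace


if __name__ == "__main__":
    # The target sits at 10.0 for five steps, then moves to 0.0.
    targets = [10.0] * 5 + [0.0] * 15
    chunked = run(targets, chunk_size=10)
    stepwise = run(targets, chunk_size=1)
    print("chunked, step 10:", chunked[9])    # still heading to the old target
    print("stepwise, step 6:", stepwise[5])   # already reacted to the new one
```

With a batch of ten, the simulated arm finishes its trip to the old target at step 10 even though the target moved at step 6, and only then turns around. Replanning every step reacts immediately, at the cost of one model call per move.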

Closing Thoughts

Taking action in the physical world remains one of artificial intelligence's hardest frontiers. Language models learned from a near-infinite ocean of text; robots have to learn from demonstrations that someone, somewhere, physically performed — a scarcer and far more expensive resource.

A five-finger servo-electric robotic gripping hand, a symbol of dexterous embodied AI
NearEMPTiness / CC BY-SA 4.0 / Wikimedia Commons

That scarcity is exactly why a 700-hour open dataset and a reproducible training recipe feel consequential beyond any single leaderboard. MolmoAct 2 will not be the model that finally makes home and lab robots dependable; its own authors are quick to list what it cannot yet do. But by putting the data, the weights, and the reasoning approach in the open, it lowers the cost of the next attempt for everyone — and in a field where every hour of robot experience is hard-won, lowering that cost may matter more than any percentage point of accuracy. The quiet wager here is that the path to capable machines in the real world runs through shared knowledge rather than guarded secrets. It is a wager worth watching.

Summary

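The Allen Institute for AI (Ai2) has released MolmoAct 2, a fully open robotics foundation model aimed at robots that can do real work in the real world. The model belongs to the Action Reasoning Model (ARM) family, which reasons about its surroundings in three dimensions before moving. It is up to 37 times faster than the previous version and handles a variety of real-world tasks without per-task fine-tuning. The accompanying 720-hour bimanual manipulation dataset is the largest open dataset of its kind, and with the model, code, and data all open, the release was downloaded more than 400,000 times within weeks of launch.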

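The performance validation is solid as well. In a benchmark run by the third-party evaluator Cortex AI, MolmoAct 2 beat four competing policies, ranking first with an average of 0.51 and posting the top score on seven of eight tasks. In zero-shot experiments on a Franka robot arm it averaged an 87.1% success rate, far ahead of a competing model (45.2%), and at the Cong Lab at Stanford School of Medicine it has taken on repetitive work such as moving samples in CRISPR gene-editing experiments, putting the "self-driving lab" concept to the test.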

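The limits are clear, of course. Because the model plans and executes 10 to 30 actions at a time, it struggles to respond to unexpected events partway through, and it works out of the box only on the specific robot configurations it was trained on. Still, the researchers disclose these weaknesses openly and aim for a shared foundation that anyone can verify and improve, not a closed product. In a field where robot experience data is expensive and scarce, the choice to be open may matter more than a few percentage points of accuracy.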

References / Sources: Ai2 (MolmoAct 2), Hugging Face LeRobot, Cong Lab, Stanford