China Team Open-Sources Embodied-R1.5, Claims SOTA on Many Embodied-AI Benchmarks
A China-based research team has posted a new robotics-focused AI model on arXiv and says it outperforms major proprietary systems on many embodied-AI benchmarks while also releasing its weights, datasets and code publicly.
The paper, “Embodied-R1.5: Evolving Physical Intelligence via Embodied Foundation Models,” was posted June 9 as arXiv:2606.11324. Its first author is Yifu Yuan, who identifies himself publicly as a Ph.D. student at Tianjin University’s Deep Reinforcement Learning Lab. The arXiv page lists a large multi-author team.
The claim is notable because embodied AI — systems built to reason about the physical world, plan actions and help control robots — is an area where high-profile benchmark results often come from closed models. A public release, if it holds up, could lower barriers for other researchers working on robot reasoning and manipulation.
In the paper abstract, the authors describe Embodied-R1.5 as a unified “Embodied Foundation Model” designed to combine embodied cognition, task planning, correction and pointing in one architecture. They say they used three automated data-construction pipelines to build a training system of more than 15 billion tokens, and introduced what they call a Planner-Grounder-Corrector, or PGC, closed-loop framework so a single model can execute and self-correct during long-horizon tasks.
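For readers unfamiliar with the closed-loop idea, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a single object that plays all three roles: planning subtasks, grounding each subtask to an image point, and checking the outcome so a failed step can be retried. Every name here (`StubModel`, `plan`, `ground`, `check`, `run_pgc`, `max_retries`) is hypothetical, and the stub logic stands in for real model calls.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set, Tuple

Point = Tuple[int, int]  # hypothetical 2D pixel coordinate produced by "pointing"


@dataclass
class StubModel:
    """Hypothetical stand-in for one model acting as planner, grounder and corrector."""

    # Subtasks that the stub pretends fail on their first attempt.
    fail_once: Set[str] = field(default_factory=set)

    def plan(self, instruction: str) -> List[str]:
        # Planner role: split a long-horizon instruction into ordered subtasks.
        return [s.strip() for s in instruction.split(" then ") if s.strip()]

    def ground(self, subtask: str, attempt: int) -> Point:
        # Grounder role: map a subtask to a point; the point shifts on each retry.
        return (10 * len(subtask) + attempt, 5 * attempt)

    def check(self, subtask: str, attempt: int) -> bool:
        # Corrector role: report whether the subtask appears to have succeeded.
        return not (subtask in self.fail_once and attempt == 0)


def run_pgc(
    model: StubModel,
    instruction: str,
    execute: Callable[[str, Point], None],
    max_retries: int = 2,
) -> List[Tuple[str, int, bool]]:
    """Run plan -> ground -> execute -> check, retrying a subtask when the check fails."""
    log: List[Tuple[str, int, bool]] = []
    for subtask in model.plan(instruction):
        for attempt in range(max_retries + 1):
            point = model.ground(subtask, attempt)
            execute(subtask, point)  # hand the target to a (hypothetical) robot controller
            ok = model.check(subtask, attempt)
            log.append((subtask, attempt, ok))
            if ok:
                break
        else:
            # Every attempt failed, so the loop gives up instead of continuing blindly.
            raise RuntimeError(f"gave up on subtask: {subtask}")
    return log
```

Calling `run_pgc(StubModel(fail_once={"pick up the cup"}), "open the drawer then pick up the cup", lambda s, p: None)` would log one successful attempt for the first subtask, then a failed and a successful attempt for the second. That retry is the self-correcting behavior the abstract describes, here in toy form.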
The headline performance claim comes directly from the abstract and has not been independently verified. “With only 8B parameters, Embodied-R1.5 achieves SOTA on 16 out of 24 embodied VLM benchmarks, surpassing leading models like Gemini-Robotics-ER-1.5 and GPT-5.4,” the authors wrote. That amounts to two-thirds of the benchmarks cited in the claim. VLM refers to vision-language models, which process images and text together.
The paper also says the model can be fine-tuned into a vision-language-action system — meaning a model that not only interprets images and language but also drives actions — with relatively little data. According to the authors, that version outperforms leading VLA models including π0.5 across four manipulation benchmark suites. The paper further reports zero-shot real-robot experiments in instruction following, affordance grounding, articulated object manipulation and long-horizon tasks. Those results, too, are author-reported and were not independently replicated in the materials reviewed for this story.
The open release is a central part of the announcement. In the abstract, the authors wrote: “We open-source model weights, datasets, training code, and EmbodiedEvalKit, an evaluation framework tailored for embodied tasks, to facilitate future research in EFMs.” Related release artifacts are publicly visible on Hugging Face under the IffYuan account, including Embodied-R1.5 model pages and an Embodied-R1.5-SFT-Dataset page.
That combination of benchmark claims and downloadable artifacts makes the release stand out, especially in a category where comparisons are often made against systems that outside researchers cannot fully inspect. The paper explicitly compares Embodied-R1.5 against Google DeepMind’s Gemini Robotics-ER 1.5 and OpenAI’s GPT-5.4.
There are, however, important caveats. No independent third-party replication of the headline benchmark claims was identified. And while the paper describes the model as having 8 billion parameters, at least one public Hugging Face page for Embodied-R1.5-8B-SFT lists 9 billion parameters in its metadata. That discrepancy should be clarified, though it does not by itself invalidate the broader release.
The project is also not appearing out of nowhere. The same group has previously released Embodied-R1 with public code and project materials, making Embodied-R1.5 a follow-on effort rather than a one-off paper. For now, the significance of the new release lies less in any single leaderboard result than in the authors’ attempt to pair ambitious embodied-AI claims with an unusually open package of weights, data and evaluation tools.