Open-H-Embodiment
A Large-Scale Dataset for Enabling Foundation Models in Medical Robotics
Abstract
Autonomous medical robots hold promise for improving patient outcomes by reducing provider fatigue and workload, democratizing access to surgical care, and enabling superhuman precision. However, progress in autonomous medical robotics has been limited by a fundamental data problem: existing robot demonstration datasets are small, collected on single platforms, and rarely shared openly. This restricts not just policy learning, but the broader ecosystem of foundation models, simulation tools, and benchmarks that the field needs to advance.
We introduce Open-H-Embodiment, the first large-scale, multi-institution, multi-robot open dataset for medical robot learning. It comprises synchronized video and kinematics collected across more than 48 institutions and multiple robotic platforms, including the CMR Versius, Intuitive Surgical's da Vinci, the da Vinci Research Kit (dVRK), Rob Surgical's BiTrack, Virtual Incision's MIRA, Moon Surgical's Maestro, and a variety of custom systems, and spans surgical manipulation, robotic ultrasound, and endoscopy procedures.
We demonstrate the breadth of research enabled by this dataset through two foundation models. We train GR00T-H, the first open foundation vision-language-action model for medical robotics; it is the only evaluated model to achieve full end-to-end task completion on a structured suturing benchmark (25% of trials vs. 0% for all baselines) and reaches 65% average success across a 29-step ex vivo suturing sequence on skin-on pork belly. We also train Cosmos-H-Surgical-Simulator, the first kinematic action-conditioned world model to enable multi-embodiment surgical simulation from a single checkpoint, spanning nine robotic platforms and supporting in-silico policy evaluation and synthetic data generation for the surgical domain.
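To make the interplay between these two models concrete, the sketch below shows one way a vision-language-action policy and an action-conditioned world model could be composed for in-silico policy evaluation. This is an illustrative skeleton only; the class and method names (VLAPolicy, SurgicalWorldModel, predict_action_chunk, generate_frames, in_silico_rollout) are hypothetical and do not correspond to the released GR00T-H or Cosmos-H-Surgical-Simulator interfaces.

```python
# Hypothetical sketch (not the released API) of closed-loop, in-silico policy
# evaluation: the policy proposes kinematic action chunks, and the world model
# "executes" them by generating the resulting video frames.
import numpy as np

class VLAPolicy:
    """Maps image observations + a language instruction to an action chunk."""
    def predict_action_chunk(self, frames: np.ndarray, instruction: str) -> np.ndarray:
        # Placeholder: a real policy would run a vision-language-action model.
        return np.zeros((12, 7), dtype=np.float32)  # 12 timesteps x 7-DoF kinematics

class SurgicalWorldModel:
    """Generates future video frames conditioned on kinematic actions."""
    def generate_frames(self, context: np.ndarray, actions: np.ndarray) -> np.ndarray:
        # Placeholder: a real world model would synthesize surgical video.
        return np.repeat(context[-1:], len(actions), axis=0)

def in_silico_rollout(policy, world_model, initial_frames, instruction, num_chunks=6):
    """Closed loop: policy proposes actions, world model predicts their outcome."""
    frames = initial_frames
    for _ in range(num_chunks):
        actions = policy.predict_action_chunk(frames, instruction)
        predicted = world_model.generate_frames(frames, actions)
        # Keep a fixed-length context window of the most recent frames.
        frames = np.concatenate([frames, predicted], axis=0)[-len(initial_frames):]
    return frames

rollout = in_silico_rollout(VLAPolicy(), SurgicalWorldModel(),
                            np.zeros((4, 224, 224, 3), dtype=np.float32),
                            "tie a surgeon's knot")
```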
Open-H-Embodiment Overview
Figure 1: (A) Geographic distribution of the 48 participating institutions across North America, Europe, the Middle East, and Asia. (B) The 20 healthcare robotic platforms represented in the dataset, spanning surgical systems (da Vinci Si, da Vinci Xi, dVRK, dVRK-Si, MIRA, Versius, BiTrack, Maestro, Torin), general-purpose manipulators adapted for clinical use (Franka Panda, UR5e, Kuka Med 14), and emerging platforms. (C) Representative frames from the dataset illustrating the diversity of tasks, viewpoints, and tissue types covered, including robotic surgery, robotic ultrasound, and related healthcare manipulation tasks. (D) The dataset comprises 770 hours of synchronized multimodal demonstrations spanning language annotations, video observations, and kinematic trajectories. This corpus supports two downstream directions: training GR00T-H, a healthcare-focused vision-language-action model targeting surgical autonomy, and training Cosmos-H-Surgical-Simulator, a multi-embodiment, action-conditioned world model for surgical scene synthesis.
Dataset Composition
Figure 2: (a) Dataset hours by robot platform. (b) Distribution of dataset hours by environment type. (c) Distribution of dataset hours across task families. Together, these panels summarize the current distribution of contributed data across embodiments, collection environments, and task families in Open-H-Embodiment.
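For readers unfamiliar with this kind of corpus, the snippet below sketches what a single synchronized demonstration could look like once loaded: a language annotation, a video stream, and a kinematic trajectory sharing one set of timestamps. The Episode structure and field names are hypothetical illustrations, not the actual Open-H-Embodiment schema or on-disk format.

```python
# Illustrative, hypothetical episode structure mirroring the three modalities
# named above: language annotations, video observations, kinematic trajectories.
from dataclasses import dataclass
import numpy as np

@dataclass
class Episode:
    instruction: str          # free-form language annotation, e.g. "pass the needle"
    frames: np.ndarray        # (T, H, W, 3) uint8 video observations
    kinematics: np.ndarray    # (T, D) end-effector / joint states per frame
    timestamps: np.ndarray    # (T,) seconds, shared by both streams

def iter_synchronized_steps(ep: Episode):
    """Yield (time, frame, kinematic state) triples for policy training."""
    assert len(ep.frames) == len(ep.kinematics) == len(ep.timestamps)
    for t, frame, state in zip(ep.timestamps, ep.frames, ep.kinematics):
        yield t, frame, state

# Example with dummy data at 30 Hz for a 2-second clip:
demo = Episode(
    instruction="transfer the peg to the left post",
    frames=np.zeros((60, 224, 224, 3), dtype=np.uint8),
    kinematics=np.zeros((60, 7), dtype=np.float32),
    timestamps=np.arange(60) / 30.0,
)
for t, frame, state in iter_synchronized_steps(demo):
    pass  # feed (frame, state, demo.instruction) into a policy training loop
```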
Experiment Results
End-to-End Suturing & OOD Generalization: Left: Task survival for GR00T-H on the SutureBot end-to-end suturing task compared to GR00T-N1.6, ACT, and LingBot-VA. GR00T-H improves end-to-end performance with a long-horizon success rate of 25%, showing better handling of compounding errors. Right: Out-of-distribution evaluation on SutureBot with an unseen wound configuration under varied lighting (n = 20 per subtask). GR00T-H achieves a 3-task average of 54%, outperforming GR00T-N1.6 (30%) and ACT (5%). Clopper-Pearson 95% confidence intervals are represented as error bars.
Data Efficiency (33% & 100%): Task success rate at 33% and 100% fine-tuning data on SutureBot (n = 10 per subtask). At 33% data, GR00T-H matches ACT while GR00T-N1.6 underperforms both. At 100% data, GR00T-H outperforms all baselines, indicating that Open-H post-training enables both data-efficient learning and stronger scaling. Clopper-Pearson 95% confidence intervals are represented as error bars.
Multi-Embodiment Performance Comparison: Evaluation of the GR00T-H foundation VLA (post-trained on Open-H) versus the GR00T-N1.6 base policy across three surgical platforms: the da Vinci Research Kit Si (dVRK-Si), the CMR Versius, and the Virtual Incision MIRA. GR00T-H demonstrates significant performance gains across all robot embodiments and subtasks, with the overall average success rate showing a statistically significant improvement (p < 0.001). Error bars represent Clopper-Pearson 95% confidence intervals across trials.
Ex Vivo Suturing: GR00T-H ex vivo suturing evaluation across 29 subtasks (n = 10 per subtask, n = 290 total). Tasks span needle manipulation, wound opening, suture passing, knot tying, and suture cutting. GR00T-H achieves an average success rate of ≈65%, with near-perfect performance on structured manipulation primitives and lower success on fine-contact and cutting steps. The rightmost bar shows the overall average across all subtasks. Clopper-Pearson 95% confidence intervals are represented as error bars.
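All error bars in the evaluations above are Clopper-Pearson (exact binomial) 95% confidence intervals on per-subtask success counts. For reference, a minimal implementation using SciPy is shown below; the helper name clopper_pearson is ours, and the example numbers are arbitrary rather than taken from the reported results.

```python
# Exact (Clopper-Pearson) two-sided confidence interval for a binomial proportion.
from scipy.stats import beta

def clopper_pearson(k: int, n: int, alpha: float = 0.05):
    """(1 - alpha) CI for a success probability given k successes in n trials."""
    lower = 0.0 if k == 0 else beta.ppf(alpha / 2, k, n - k + 1)
    upper = 1.0 if k == n else beta.ppf(1 - alpha / 2, k + 1, n - k)
    return lower, upper

# Example: 5 successes in 10 trials -> approximately (0.19, 0.81).
print(clopper_pearson(5, 10))
```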
Cosmos-H-Surgical-Simulator Evaluation: Per-frame L1 error and SSIM for benchtop vs. tissue-based datasets, plotted as a function of generated frame index across 72 autoregressively generated frames (6 chunks × 12 frames each). Curves show the mean over 3 generation seeds, each averaged across all evaluated episodes within the category; shaded bands indicate 1 standard deviation across seeds. Left: benchtop datasets (phantom and bench procedures). Right: tissue-based datasets (clinical, cadaver, and ex vivo tissue).
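As a rough guide to how such per-frame metrics can be computed, the sketch below scores each frame of a generated rollout against ground truth with mean absolute (L1) pixel error and SSIM using scikit-image; the exact preprocessing used in the reported evaluation (resolution, normalization, masking) may differ.

```python
# Per-frame L1 error and SSIM between a generated rollout and its reference video.
import numpy as np
from skimage.metrics import structural_similarity as ssim

def per_frame_metrics(generated: np.ndarray, reference: np.ndarray):
    """generated, reference: (T, H, W, 3) float arrays in [0, 1] with the same length T."""
    l1_per_frame, ssim_per_frame = [], []
    for gen, ref in zip(generated, reference):
        l1_per_frame.append(np.mean(np.abs(gen - ref)))
        ssim_per_frame.append(ssim(gen, ref, channel_axis=-1, data_range=1.0))
    return np.array(l1_per_frame), np.array(ssim_per_frame)

# Example: a 72-frame rollout (6 chunks x 12 frames), compared frame by frame.
gen = np.random.rand(72, 128, 128, 3)
ref = np.random.rand(72, 128, 128, 3)
l1, s = per_frame_metrics(gen, ref)
print(l1.shape, s.shape)  # (72,) (72,)
```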
Video Demonstrations
End-to-end autonomous suturing demonstration with GR00T-H.
Example rollout from GR00T-H on the CMR Versius peg transfer task.
Example inference run with GR00T-H post-trained for Virtual Incision MIRA needle pickup.
Qualitative results from Cosmos-H-Surgical-Simulator across 30 Open-H datasets, 9 institutions, and 9 embodiments. Each panel shows ground-truth observations (left) alongside model-predicted frames (right), conditioned on recorded kinematic action trajectories.
Autonomous wound closure demonstration with GR00T-H on ex vivo porcine tissue.
BibTeX
@article{openh2026,
title={Open-H-Embodiment: A Large-Scale Dataset for Enabling Foundation Models in Medical Robotics},
author={Nelson, Nigel and Chen, Juo-Tung and Haworth, Jesse and Chen, Xinhao and Zbinden, Lukas and Huang, Dianye and Abdelaal, Alaa Eldin and Acar, Ayberk and Alambeigi, Farshid and Ao, Yunke and Aranda Rodriguez, Pablo David and Atar, Soofiyan and Ballo, Mattia and Barnes, Noah and Binkiewicz, Filip and Black, Peter and Bodenstedt, Sebastian and Borgioli, Leonardo and Budjak, Nikola and Calm{\'e}, Benjamin and Carrillo, Fabio and Cavalcanti, Nicola and Chen, Changwei and Chen, Haoxin and Chen, Sihang and Chen, Qihan and Chen, Zhongyu and Chen, Ziyang and Cheng, Shing Shin and Cheng, Meiqing and Cheng, Min and Chiu, Zih-Yun Sarah and Chu, Xiangyu and Correa-Gallego, Camilo and Dagnino, Giulio and Deguet, Anton and Delgado, Jacob and DeLong, Jonathan C. and Deng, Kaizhong and Dimitrakakis, Alexander and Ding, Qingpeng and Ding, Hao and Donoho, Daniel and Duan, Anqing and Esposito, Marco and Farritor, Shane and Fayad, Jad and Fayad, Zahi and Ferradosa, Mario and Filicori, Filippo and Finn, Chelsea and F{\"u}rnstahl, Philipp and Ge, Jiawei and Giannarou, Stamatia and Giralt Ludevid, Xavier and Giraud, Frederic and Godbole, Aditya Amit and Goldberg, Ken and Goldenberg, Antony and Granero Marana, Diego and Guo, Xiaoqing and Haidegger, Tam{\'a}s and Hailey, Evan and Hansen, Pascal and Hari, Kush and Hawkins, Jonathon and Haworth, Shelby and Hellig, Ortrun and Herrell, S. Duke and Hong, Zhouyang and Howe, Andrew and Hu, Junlei and Jain, Ria and Rafiee Javazm, Mohammad and Ji, Howard and Ji, Rui and Ji, Jianmin and Jiang, Zhongliang and Jones, Dominic and Jopling, Jeffrey and Jordan, Britton and Ju, Ran and Kam, Michael and Kang, Luoyao and Kang, Fausto and Kapuria, Siddhartha and Kazanzides, Peter and Kiehler, Sonika and Kilmer, Ethan and Kim, Ji Woong (Brian) and Korzeniowski, Przemys{\l}aw and Kuchi, Chandra and Kumar, Nithesh and Kuntz, Alan and Lee, Yu Chung and Lee, Hao-Chih and Li, Hang and Li, Zhen and Liang, Xiao and Lin, Xinxin and Lin, Jinsong and Liu, Chang and Liu, Fei and Liu, Pei and Liu, Yun-hui and Liuchen, Wanli and Luk{\'a}cs, Eszter and Mann, Sareena and Mannas, Miles and Marinelli, Brett and Martyniak, Sabina and Marzola, Francesco and Mazza, Lorenzo and Mei, Xueyan and Morais, Maria Clara and Narayanaswamy, Chetan Reddy and Naskr{\k{e}}t, Micha{\l} and Navarro-Alarcon, David and Nazmuz, Sayem and Neary, Cyrus and Ng, Chi Kit and Nguan, Christopher and Noonan, David and Oh, Ki Hwan and Olesch, Tom Christian and Okamura, Allison M. and Opfermann, Justin and Pescio, Matteo and Pham, Doan Xuan Viet and Porras, Tito and Ren, Hongliang and Rodriguez Jimenez, Ariel and Rodriguez y Baena, Ferdinando and Salcudean, Septimiu E. and Sathya, Asmitha and Satish, Preethi and Seenivasan, Lalithkumar and Shao, Jiaqi and Shen, Yiqing and Sheng, Yu and Shi, Lucy XiaoYang and Soul{\'e}, Zoe and Speidel, Stefanie and Su, Jianhao and Sunmola, Idris and Tak{\'a}cs, Krist{\'o}f and Tang, Yunxi and Thornycroft, Patrick and Tian, Yu and Thompson, Jordan and Turkcan, Mehmet K. 
and Unberath, Mathias and Valdastri, Pietro and Vives, Carlos and Vuong, Quan and Wagner, Martin and Wang, Farong and Wang, Wei and Wang, Lidian and Wang, Chung-Pang and Wang, Junyi and Wang, Erqi and Wang, Ziyi and Watts, Tanner and Wein, Wolfgang and Wu, Yimeng and Wu, Zijian and Wu, Hongjun and Wu, Luohong and Wu, Jie Ying and Wu, Junlin and Wu, Victoria and Wu, Kaixuan and W{\'o}jcikowski, Mateusz and Xiao, Yunye and Xiao, Nan and Xie, Wenxuan and Yang, Hao and Yang, Tianqi and Yang, Yinuo and Ye, Menglong and Yeung, Ryan S. and Yilmaz, Nural and Yin, Chim Ho and Yip, Michael and Younis, Rayan and Yu, Chenhao and Zefran, Milos and Zhang, Han and Zhang, Yuelin and Zhang, Yidong and Zhang, Yanyong and Zhang, Xuyang and Zhang, Yameng and Zhang, Joyce and Zhong, Ning and Zhou, Peng and Zhou, Haoying and Zuo, Xiuli and Navab, Nassir and Azizian, Mahdi and Huver, Sean D. and Krieger, Axel},
year={2026},
url={https://open-h.github.io}
}