Open-H-Embodiment

A Large-Scale Dataset for Enabling Foundation Models in Medical Robotics

Open-H Consortium
Nigel Nelson7,†, Juo-Tung Chen9,†, Jesse Haworth9,†, Xinhao Chen9,†, Lukas Zbinden7,†, Dianye Huang39,†, Alaa Eldin Abdelaal1, Ayberk Acar37, Farshid Alambeigi2, Yunke Ao3,4,47, Pablo David Aranda Rodriguez5, Soofiyan Atar6, Mattia Ballo8, Noah Barnes9, Filip Binkiewicz10, Peter Black11,12, Sebastian Bodenstedt13,45, Leonardo Borgioli14, Nikola Budjak5, Benjamin Calmé38, Fabio Carrillo3, Nicola Cavalcanti3, Changwei Chen6, Haoxin Chen16, Sihang Chen15, Qihan Chen18, Zhongyu Chen17,48, Ziyang Chen19, Shing Shin Cheng17, Meiqing Cheng20, Min Cheng21,15, Zih-Yun Sarah Chiu9, Xiangyu Chu17,48, Camilo Correa-Gallego22, Giulio Dagnino23, Anton Deguet9, Jacob Delgado9, Jonathan C. DeLong24, Kaizhong Deng25, Alexander Dimitrakakis22, Qingpeng Ding17, Hao Ding9,26, Daniel Donoho27, Anqing Duan28, Marco Esposito5, Shane Farritor29, Jad Fayad30, Zahi Fayad22, Mario Ferradosa31, Filippo Filicori32, Chelsea Finn1,33, Philipp Fürnstahl3,34, Jiawei Ge9, Stamatia Giannarou25, Xavier Giralt Ludevid31, Frederic Giraud3, Aditya Amit Godbole35, Ken Goldberg19, Antony Goldenberg9, Diego Granero Marana10, Xiaoqing Guo16, Tamás Haidegger36,46, Evan Hailey29, Pascal Hansen45, Kush Hari19, Jonathon Hawkins10, Shelby Haworth9, Ortrun Hellig13, S. 
Duke Herrell37, Zhouyang Hong17, Andrew Howe10, Junlei Hu38, Ria Jain19, Mohammad Rafiee Javazm2, Howard Ji1, Rui Ji40, Jianmin Ji15, Zhongliang Jiang41,39, Dominic Jones38, Jeffrey Jopling9, Britton Jordan37, Ran Ju41,21, Michael Kam9, Luoyao Kang17, Fausto Kang9, Siddhartha Kapuria2, Peter Kazanzides9, Sonika Kiehler2, Ethan Kilmer9, Ji Woong (Brian) Kim9,1, Przemysław Korzeniowski7,8, Chandra Kuchi33, Nithesh Kumar37, Alan Kuntz37, Yu Chung Lee11, Hao-Chih Lee22, Hang Li17, Zhen Li40, Xiao Liang6, Xinxin Lin20, Jinsong Lin17, Chang Liu9, Fei Liu24, Pei Liu39, Yun-hui Liu17, Wanli Liuchen18, Eszter Lukács36,46, Sareena Mann19, Miles Mannas11,12, Brett Marinelli22, Sabina Martyniak8, Francesco Marzola23, Lorenzo Mazza13, Xueyan Mei22, Maria Clara Morais35, Chetan Reddy Narayanaswamy1, Michał Naskręt8, David Navarro-Alarcon18, Sayem Nazmuz11, Cyrus Neary11, Chi Kit Ng17, Christopher Nguan11,12, David Noonan30, Ki Hwan Oh14, Tom Christian Olesch24, Allison M. Okamura1, Justin Opfermann9, Matteo Pescio23, Doan Xuan Viet Pham9, Tito Porras26, Hongliang Ren17, Ariel Rodriguez Jimenez13, Ferdinando Rodriguez y Baena25, Septimiu E. Salcudean11, Asmitha Sathya9, Preethi Satish19, Lalithkumar Seenivasan9, Jiaqi Shao1, Yiqing Shen9,26, Yu Sheng15, Lucy XiaoYang Shi1,33, Zoe Soulé13, Stefanie Speidel13,45, Jianhao Su25, Idris Sunmola9, Kristóf Takács36, Yunxi Tang17,48, Patrick Thornycroft10, Yu Tian17, Jordan Thompson37, Mehmet K. Turkcan42, Mathias Unberath9,26, Pietro Valdastri38, Carlos Vives31, Quan Vuong33, Martin Wagner13, Farong Wang24, Wei Wang20, Lidian Wang15, Chung-Pang Wang6, Junyi Wang41, Erqi Wang17, Ziyi Wang17, Tanner Watts37, Wolfgang Wein5, Yimeng Wu9, Zijian Wu11, Hongjun Wu9, Luohong Wu3, Jie Ying Wu37, Junlin Wu9, Victoria Wu30, Kaixuan Wu17, Mateusz Wójcikowski8, Yunye Xiao5, Nan Xiao24, Wenxuan Xie17, Hao Yang37, Tianqi Yang17,48, Yinuo Yang6, Menglong Ye30, Ryan S. 
Yeung11, Nural Yilmaz9, Chim Ho Yin17, Michael Yip6, Rayan Younis13, Chenhao Yu26, Milos Zefran14, Han Zhang9, Yuelin Zhang17, Yidong Zhang17, Yanyong Zhang15, Xuyang Zhang15, Yameng Zhang41,48, Joyce Zhang10, Ning Zhong40, Peng Zhou43, Haoying Zhou9,44, Xiuli Zuo40, Nassir Navab39,‡, Mahdi Azizian7,‡, Sean D. Huver7,‡, Axel Krieger9,26,‡
1Stanford University, 2The University of Texas at Austin, 3Balgrist University Hospital, 4ETH Zurich, 5ImFusion GmbH, 6University of California San Diego, 7NVIDIA, 8Sano Centre for Computational Medicine, 9Johns Hopkins University, 10CMR Surgical, 11University of British Columbia, 12Vancouver General Hospital, 13CeTI/TU Dresden, 14University of Illinois Chicago, 15University of Science and Technology of China, 16Hong Kong Baptist University, 17The Chinese University of Hong Kong, 18The Hong Kong Polytechnic University, 19University of California Berkeley, 20Sun Yat-Sen University, 21Tuodao Medical Technology Co., Ltd, 22Icahn School of Medicine at Mount Sinai, 23University of Turin, 24University of Tennessee Knoxville, 25Imperial College London, 26Semaphor Surgical, 27Surgical Data Science Collective, 28Mohamed bin Zayed University of Artificial Intelligence, 29Virtual Incision, 30Moon Surgical, 31Rob Surgical, 32Hofstra/Northwell School of Medicine, 33Physical Intelligence, 34University of Zurich, 35Northwell Health, 36Óbuda University, 37Vanderbilt University, 38University of Leeds, 39Technical University of Munich, 40Qilu Hospital of Shandong University, 41The University of Hong Kong, 42Columbia University, 43Great Bay University, 44Worcester Polytechnic Institute, 45German Cancer Research Center, 46Austrian Center for Medical Innovation and Technology, 47ETH AI Center, 48Multi-scale Medical Robotics Center
†Co-first authors. ‡Co-senior authors.

A snapshot of the Open-H-Embodiment dataset: 770 hours of synchronized video and kinematics across 20 robotic platforms and 48+ institutions worldwide.

Abstract

Autonomous medical robots hold promise for improving patient outcomes by reducing provider fatigue and workload, democratizing access to surgical care, and enabling super-human precision. However, progress in autonomous medical robotics has been limited by a fundamental data problem: existing robot demonstration datasets are small, collected on single platforms, and rarely shared openly. This restricts not just policy learning but the broader ecosystem of foundation models, simulation tools, and benchmarks that the field needs to advance.

We introduce Open-H-Embodiment, the first large-scale, multi-institution, multi-robot open dataset for medical robot learning. It comprises synchronized video and kinematics collected across more than 48 institutions and multiple robotic platforms, including the CMR Versius, Intuitive Surgical's da Vinci, the da Vinci Research Kit (dVRK), Rob Surgical's BiTrack, Virtual Incision's MIRA, Moon Surgical's Maestro, and a variety of custom systems, spanning surgical manipulation, robotic ultrasound, and endoscopy procedures.

We demonstrate the breadth of research enabled by this dataset through two foundation models. We train GR00T-H, the first open foundation vision-language-action model for medical robotics, which is the only evaluated model to achieve full end-to-end task completion on a structured suturing benchmark (25% of trials vs. 0% for all baselines) and achieves 65% average success across a 29-step ex vivo suturing sequence on skin-on pork belly. We also train Cosmos-H-Surgical-Simulator, the first kinematic action-conditioned world model to enable multi-embodiment surgical simulation from a single checkpoint, spanning nine robotic platforms and supporting in silico policy evaluation and synthetic data generation for the surgical domain.

Open-H-Embodiment Overview

Open-H-Embodiment overview showing geographic distribution, robotic platforms, representative frames, and data composition

Figure 1: (A) Geographic distribution of the 48 participating institutions across North America, Europe, the Middle East, and Asia. (B) The 20 healthcare robotic platforms represented in the dataset, spanning surgical systems (da Vinci Si, da Vinci Xi, dVRK, dVRK-Si, MIRA, Versius, BiTrack, Maestro, Torin), general-purpose manipulators adapted for clinical use (Franka Panda, UR5e, Kuka Med 14), and emerging platforms. (C) Representative frames from the dataset illustrating the diversity of tasks, viewpoints, and tissue types covered, including robotic surgery, robotic ultrasound, and related healthcare manipulation tasks. (D) The dataset comprises 770 hours of synchronized multimodal demonstrations spanning language annotations, video observations, and kinematic trajectories. This corpus supports two downstream directions: training GR00T-H, a healthcare-focused vision-language-action model targeting surgical autonomy, and training Cosmos-H-Surgical-Simulator, a multi-embodiment, action-conditioned world model for surgical scene synthesis.

Dataset Composition

Composition of the Open-H-Embodiment dataset showing hours by platform, environment, and task family

Figure 2: (a) Dataset hours by robot platform. (b) Distribution of dataset hours by environment type. (c) Distribution of dataset hours across task families. Together, these panels summarize the current distribution of contributed data across embodiments, collection environments, and task families in Open-H-Embodiment.
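Because the dataset pairs video observations with kinematic trajectories recorded at different rates, downstream users typically need to align each video frame with the kinematics sample closest in time. The schema below is purely illustrative (the actual Open-H-Embodiment file format, field names, and sampling rates are not specified here); it sketches nearest-timestamp alignment with only the standard library:

```python
from bisect import bisect_left

def nearest_kinematics(frame_ts, kin_ts):
    """For each video frame timestamp, return the index of the
    kinematics sample whose timestamp is nearest in time.
    Assumes kin_ts is sorted ascending."""
    indices = []
    for t in frame_ts:
        i = bisect_left(kin_ts, t)  # insertion point for t in kin_ts
        if i == 0:
            indices.append(0)
        elif i == len(kin_ts):
            indices.append(len(kin_ts) - 1)
        else:
            # Pick whichever neighbor is closer to the frame time.
            indices.append(i if kin_ts[i] - t < t - kin_ts[i - 1] else i - 1)
    return indices

# Hypothetical example: ~30 Hz video frames vs. 100 Hz kinematics (seconds).
frames = [0.000, 0.033, 0.066, 0.100]
kin = [round(k * 0.01, 2) for k in range(12)]  # 0.00, 0.01, ..., 0.11
print(nearest_kinematics(frames, kin))  # → [0, 3, 7, 10]
```

Nearest-neighbor matching on timestamps is a common baseline for this kind of multimodal synchronization; interpolating the kinematic signal at exact frame times is a natural refinement when sub-sample accuracy matters.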

Experiment Results

Video Demonstrations

BibTeX

@article{openh2026,
  title={Open-H-Embodiment: A Large-Scale Dataset for Enabling Foundation Models in Medical Robotics},
  author={Nelson, Nigel and Chen, Juo-Tung and Haworth, Jesse and Chen, Xinhao and Zbinden, Lukas and Huang, Dianye and Abdelaal, Alaa Eldin and Acar, Ayberk and Alambeigi, Farshid and Ao, Yunke and Aranda Rodriguez, Pablo David and Atar, Soofiyan and Ballo, Mattia and Barnes, Noah and Binkiewicz, Filip and Black, Peter and Bodenstedt, Sebastian and Borgioli, Leonardo and Budjak, Nikola and Calm{\'e}, Benjamin and Carrillo, Fabio and Cavalcanti, Nicola and Chen, Changwei and Chen, Haoxin and Chen, Sihang and Chen, Qihan and Chen, Zhongyu and Chen, Ziyang and Cheng, Shing Shin and Cheng, Meiqing and Cheng, Min and Chiu, Zih-Yun Sarah and Chu, Xiangyu and Correa-Gallego, Camilo and Dagnino, Giulio and Deguet, Anton and Delgado, Jacob and DeLong, Jonathan C. and Deng, Kaizhong and Dimitrakakis, Alexander and Ding, Qingpeng and Ding, Hao and Donoho, Daniel and Duan, Anqing and Esposito, Marco and Farritor, Shane and Fayad, Jad and Fayad, Zahi and Ferradosa, Mario and Filicori, Filippo and Finn, Chelsea and F{\"u}rnstahl, Philipp and Ge, Jiawei and Giannarou, Stamatia and Giralt Ludevid, Xavier and Giraud, Frederic and Godbole, Aditya Amit and Goldberg, Ken and Goldenberg, Antony and Granero Marana, Diego and Guo, Xiaoqing and Haidegger, Tam{\'a}s and Hailey, Evan and Hansen, Pascal and Hari, Kush and Hawkins, Jonathon and Haworth, Shelby and Hellig, Ortrun and Herrell, S. 
Duke and Hong, Zhouyang and Howe, Andrew and Hu, Junlei and Jain, Ria and Rafiee Javazm, Mohammad and Ji, Howard and Ji, Rui and Ji, Jianmin and Jiang, Zhongliang and Jones, Dominic and Jopling, Jeffrey and Jordan, Britton and Ju, Ran and Kam, Michael and Kang, Luoyao and Kang, Fausto and Kapuria, Siddhartha and Kazanzides, Peter and Kiehler, Sonika and Kilmer, Ethan and Kim, Ji Woong (Brian) and Korzeniowski, Przemys{\l}aw and Kuchi, Chandra and Kumar, Nithesh and Kuntz, Alan and Lee, Yu Chung and Lee, Hao-Chih and Li, Hang and Li, Zhen and Liang, Xiao and Lin, Xinxin and Lin, Jinsong and Liu, Chang and Liu, Fei and Liu, Pei and Liu, Yun-hui and Liuchen, Wanli and Luk{\'a}cs, Eszter and Mann, Sareena and Mannas, Miles and Marinelli, Brett and Martyniak, Sabina and Marzola, Francesco and Mazza, Lorenzo and Mei, Xueyan and Morais, Maria Clara and Narayanaswamy, Chetan Reddy and Naskr{\k{e}}t, Micha{\l} and Navarro-Alarcon, David and Nazmuz, Sayem and Neary, Cyrus and Ng, Chi Kit and Nguan, Christopher and Noonan, David and Oh, Ki Hwan and Olesch, Tom Christian and Okamura, Allison M. and Opfermann, Justin and Pescio, Matteo and Pham, Doan Xuan Viet and Porras, Tito and Ren, Hongliang and Rodriguez Jimenez, Ariel and Rodriguez y Baena, Ferdinando and Salcudean, Septimiu E. and Sathya, Asmitha and Satish, Preethi and Seenivasan, Lalithkumar and Shao, Jiaqi and Shen, Yiqing and Sheng, Yu and Shi, Lucy XiaoYang and Soul{\'e}, Zoe and Speidel, Stefanie and Su, Jianhao and Sunmola, Idris and Tak{\'a}cs, Krist{\'o}f and Tang, Yunxi and Thornycroft, Patrick and Tian, Yu and Thompson, Jordan and Turkcan, Mehmet K. 
and Unberath, Mathias and Valdastri, Pietro and Vives, Carlos and Vuong, Quan and Wagner, Martin and Wang, Farong and Wang, Wei and Wang, Lidian and Wang, Chung-Pang and Wang, Junyi and Wang, Erqi and Wang, Ziyi and Watts, Tanner and Wein, Wolfgang and Wu, Yimeng and Wu, Zijian and Wu, Hongjun and Wu, Luohong and Wu, Jie Ying and Wu, Junlin and Wu, Victoria and Wu, Kaixuan and W{\'o}jcikowski, Mateusz and Xiao, Yunye and Xiao, Nan and Xie, Wenxuan and Yang, Hao and Yang, Tianqi and Yang, Yinuo and Ye, Menglong and Yeung, Ryan S. and Yilmaz, Nural and Yin, Chim Ho and Yip, Michael and Younis, Rayan and Yu, Chenhao and Zefran, Milos and Zhang, Han and Zhang, Yuelin and Zhang, Yidong and Zhang, Yanyong and Zhang, Xuyang and Zhang, Yameng and Zhang, Joyce and Zhong, Ning and Zhou, Peng and Zhou, Haoying and Zuo, Xiuli and Navab, Nassir and Azizian, Mahdi and Huver, Sean D. and Krieger, Axel},
  year={2026},
  url={https://open-h.github.io}
}