Mirage: Cross-Embodiment Zero-Shot Policy Transfer with Cross-Painting

Lawrence Yunliang Chen*¹, Kush Hari*¹, Karthik Dharmarajan*¹, Chenfeng Xu¹, Quan Vuong², Ken Goldberg¹

¹University of California, Berkeley, ²Google DeepMind

RSS 2024

We study zero-shot transfer of vision-based policies across embodiments.

Assume there is a policy trained on a source robot (left). At test time, with an unseen target robot (middle), Mirage performs "cross-painting" (masking out the target robot in the image and inpainting the source robot at the same end effector pose) using robot URDFs and a renderer. By creating the illusion that the source robot is performing the task (right), Mirage queries the source policy with the cross-painted image to obtain the action.

Abstract

The ability to reuse collected data and transfer trained policies between robots could alleviate the burden of additional data collection and training. While existing approaches such as pretraining plus finetuning and co-training show promise, they do not generalize to robots unseen in training.

Focusing on common robot arms with similar workspaces and 2-jaw grippers, we investigate the feasibility of zero-shot transfer. Through simulation studies on 8 manipulation tasks, we find that state-based Cartesian control policies can successfully zero-shot transfer to a target robot after accounting for forward dynamics.

To address robot visual disparities for vision-based policies, we introduce Mirage, which uses “cross-painting” (masking out the unseen target robot and inpainting the seen source robot) during execution in real time so that it appears to the policy as if the trained source robot were performing the task. Mirage applies to both first-person and third-person camera views, and to policies that take either both states and images or only images as inputs. Despite its simplicity, our extensive simulation and physical experiments provide strong evidence that Mirage can successfully zero-shot transfer between different robot arms and grippers with only minimal performance degradation on a variety of manipulation tasks such as picking, stacking, and assembly, significantly outperforming a generalist policy.

A Motivating Study

Q: Can the target robot complete a task by querying a state-based policy of the source robot?

Imagine a source robot (“oracle”) teaching a target robot to perform a task side by side in a duplicate environment. At each time step, the source robot sees the state of the target environment, places its objects and end effector in the same poses, and uses its policy to move its end effector to a new pose. The target robot observes the source robot and moves its own end effector to that pose.

We ask: can the target robot successfully complete a task by querying a state-based policy of the source robot in this fashion?
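The querying scheme described above can be sketched as a short loop. This is a minimal illustration under simplifying assumptions, not the authors' implementation: `ToyRobot`, `source_policy`, and `run_episode` are hypothetical names, poses are reduced to 3-D positions, objects are omitted, and the controller is assumed to reach every commanded pose exactly.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

Pose = Tuple[float, ...]  # simplified: end effector position only


@dataclass
class ToyRobot:
    """Hypothetical stand-in for a robot arm with a Cartesian controller."""
    ee_pose: Pose

    def move_to(self, pose: Pose) -> None:
        # Assumption: a blocking controller reaches the commanded pose exactly.
        self.ee_pose = tuple(pose)


def source_policy(ee_pose: Pose, goal: Pose, step: float = 0.05) -> Pose:
    """Toy state-based policy: move each coordinate at most `step` toward the goal."""
    return tuple(p + max(-step, min(step, g - p)) for p, g in zip(ee_pose, goal))


def run_episode(
    source: ToyRobot,
    target: ToyRobot,
    policy: Callable[[Pose, Pose], Pose],
    goal: Pose,
    max_steps: int = 100,
    tol: float = 1e-6,
) -> Tuple[bool, int]:
    """Let the target robot complete a task by querying the source policy."""
    for t in range(max_steps):
        # 1. The source robot mirrors the current state of the target robot.
        source.move_to(target.ee_pose)
        # 2. The source policy picks the next end effector pose.
        source.move_to(policy(source.ee_pose, goal))
        # 3. The target robot moves its end effector to the same pose.
        target.move_to(source.ee_pose)
        if all(abs(p - g) <= tol for p, g in zip(target.ee_pose, goal)):
            return True, t + 1
    return False, max_steps


if __name__ == "__main__":
    ok, steps = run_episode(
        ToyRobot((0.0, 0.0, 0.0)), ToyRobot((0.0, 0.0, 0.0)),
        source_policy, goal=(0.2, 0.0, 0.1),
    )
    print(ok, steps)
```

On real robots, step 3 is where differences in kinematics and controllers enter: the target arm may not track the commanded pose as precisely as this toy controller does.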

Figure: Simulation Tasks and Robots. We use 5 tasks in Robosuite: Lift, Stack, Can Pick-and-Place, Two Piece Assembly, and Square Peg Insertion, on 5 different target robots. In ORBIT, we select the block lifting task with the UR5 as the target robot. In RLBench, we select 2 tasks: Lifting a lid, and Pushing a button to turn on a lamp, with UR5 and Sawyer as the target robots.

We consider 8 tasks across 3 simulators (Robosuite, ORBIT, and RLBench) with policies trained using imitation learning for Robosuite tasks and RL for ORBIT and RLBench. For all tasks, we train the source state-based policy on the Franka robot and evaluate the success rates on different target robots using the test-time execution strategy mentioned above.

Results

The results show that most unseen target robots achieve very high task success rates, especially when they use the same gripper as the source robot. This suggests that the kinematic differences among the robot arms are relatively insignificant. The finding holds for policies trained with IL and with RL, as well as for open-loop and closed-loop policies.

Mirage Pipeline

We propose Mirage, a simple strategy to zero-shot transfer a trained vision-based policy from the source robot to the target robot. The key idea is “cross-painting”: replacing the target robot with the source robot in the camera observations at test time so that it appears to the policy as if the source robot were performing the task.

We reproject the camera from the target frame to the source frame if there is a non-negligible camera angle change and then apply cross-painting: (1) use the segmentation mask provided by a renderer (e.g., Gazebo) to mask out the target robot, (2) apply the fast marching algorithm to fill in the missing pixels, and (3) overlay Gazebo's rendering of the source robot URDF onto the image. The resulting image is fed into the source robot's policy to obtain the action, which is executed after a coordinate frame transform with a blocking or high-gain controller.
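The three cross-painting steps can be sketched in a few lines of NumPy. This is a rough illustration, not the authors' code: `fill_masked` and `cross_paint` are hypothetical names, the masks and the source robot rendering are assumed to be supplied by the renderer, and the hole filling is a simplified inward sweep of neighbor averages standing in for the fast marching algorithm named above.

```python
import numpy as np


def fill_masked(image: np.ndarray, mask: np.ndarray, max_iters: int = 100) -> np.ndarray:
    """Fill masked pixels from the hole boundary inward.

    Simplified stand-in for fast marching inpainting: on each pass, every
    unknown pixel that touches a known pixel gets the mean of its known
    4-neighbors. `image` is HxWx3, `mask` is an HxW boolean array.
    """
    img = image.astype(np.float64)
    img[mask] = 0.0
    known = ~mask
    for _ in range(max_iters):
        if known.all():
            break
        vals = np.pad(img * known[..., None], ((1, 1), (1, 1), (0, 0)))
        flags = np.pad(known.astype(np.float64), 1)
        # Sum and count of known neighbors (up, down, left, right).
        nsum = vals[:-2, 1:-1] + vals[2:, 1:-1] + vals[1:-1, :-2] + vals[1:-1, 2:]
        ncnt = flags[:-2, 1:-1] + flags[2:, 1:-1] + flags[1:-1, :-2] + flags[1:-1, 2:]
        frontier = (~known) & (ncnt > 0)
        if not frontier.any():
            break  # nothing known to propagate from
        img[frontier] = nsum[frontier] / ncnt[frontier][:, None]
        known = known | frontier
    return img


def cross_paint(
    target_image: np.ndarray,   # HxWx3 uint8 camera image showing the target robot
    target_mask: np.ndarray,    # HxW bool, True where the target robot is visible
    source_render: np.ndarray,  # HxWx3 uint8 rendering of the source robot
    source_mask: np.ndarray,    # HxW bool, True where the source robot is rendered
) -> np.ndarray:
    # (1) mask out the target robot and (2) fill in the missing pixels.
    out = fill_masked(target_image, target_mask)
    # (3) overlay the rendered source robot.
    out[source_mask] = source_render[source_mask]
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)
```

In practice, a library routine such as OpenCV's `cv2.inpaint` with the `cv2.INPAINT_TELEA` flag, which is a fast-marching-based method, could replace `fill_masked`; the text here does not say which implementation the authors use.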

Simulation Experiments

We choose Franka as the source robot and UR5e and Kinova Gen3 as the target robots and evaluate Mirage on the 5 tasks in Robosuite. We use the ground-truth forward dynamics. For each task, the source robot policy is trained with behavior cloning on the provided demonstration data, using an LSTM architecture with a ResNet-18 visual encoder. The policies take 84x84 images as input, and Mirage cross-paints the images at approximately 40 Hz.

Figure: **Mirage Results on Transferring Vision-Based Policies in Simulation.** For each task and robot arm combination, the Oracle represents the performance of a vision-based policy assuming access to a ground truth rendering of the source robot given the state of the target robot, the Naive 0-shot method directly passes the visual observation of the target robot to the policy, and Mirage uses cross-painting to generate the visual inputs for the policy. For each method, the first number represents the success rate when the target robot uses the source robot (Franka) gripper and the second number corresponds to using the target robot's default gripper (Robotiq gripper).

We see that in all cases, Mirage significantly outperforms the naive 0-shot performance without any visual gap mitigation. The gap between using an oracle and Mirage is at most 25%. This suggests that cross-painting can effectively bridge the visual differences of the robots.

Videos (Same Gripper)

The videos below show transfer from Franka to UR5e and Kinova Gen3, with both target robots using the same Franka gripper. In each video, the left shows the oracle rendering, the middle shows the Mirage cross-painted image, and the right shows the target robot image.