Mirage: Cross-Embodiment Zero-Shot Policy Transfer with Cross-Painting

¹University of California, Berkeley, ²Google DeepMind
Teaser Image

We study zero-shot transfer of vision-based policies across embodiments.

Assume there is a policy trained on a source robot (left). At test time, with an unseen target robot (middle), Mirage performs "cross-painting" (masking out the target robot in the image and inpainting the source robot at the same end-effector pose) using robot URDFs and a renderer. By creating the illusion that the source robot is performing the task (right), Mirage queries the source policy with the cross-painted image to obtain the action.

Abstract

The ability to reuse collected data and transfer trained policies between robots could alleviate the burden of additional data collection and training. While existing approaches such as pretraining plus finetuning and co-training show promise, they do not generalize to robots unseen in training.

Focusing on common robot arms with similar workspaces and 2-jaw grippers, we investigate the feasibility of zero-shot transfer. Through simulation studies on 8 manipulation tasks, we find that state-based Cartesian control policies can successfully zero-shot transfer to a target robot after accounting for forward dynamics.

To address robot visual disparities for vision-based policies, we introduce Mirage, which uses “cross-painting” (masking out the unseen target robot and inpainting the seen source robot) during execution in real time so that it appears to the policy as if the trained source robot were performing the task. Despite its simplicity, our extensive simulation and physical experiments provide strong evidence that Mirage can successfully zero-shot transfer between different robot arms and grippers with only minimal performance degradation on a variety of manipulation tasks such as picking, stacking, and assembly, significantly outperforming a generalist policy.

Video

A Motivating Study

Q: Can the target robot complete a task by querying a state-based policy of the source robot?

Imagine a source robot (“oracle”) teaching a target robot to perform a task side by side in a duplicate environment. At each time step, the source robot sees the state of the target environment, places its objects and end effector at the same poses, and uses its policy to move its end effector to a new pose. The target robot observes the source robot and moves its own end effector to that pose.

We ask: can the target robot successfully complete a task by querying a state-based policy of the source robot in this fashion?
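The querying scheme described above can be sketched as a simple loop. This is only an illustrative toy under stated assumptions, not the authors' implementation: `source_policy`, `perfect_tracker`, and `rollout_target` are hypothetical names, poses are reduced to 3D positions, and the "policy" is a hand-written rule that moves the end effector toward the object.

```python
from typing import Callable, List, Tuple

Vec3 = Tuple[float, float, float]


def source_policy(obj_pos: Vec3, ee_pos: Vec3, max_step: float = 0.05) -> Vec3:
    """Hypothetical state-based source policy.

    Given the object position and the end-effector position, return the next
    desired end-effector position, moving at most `max_step` along each axis.
    """
    return tuple(
        e + max(-max_step, min(max_step, o - e)) for o, e in zip(obj_pos, ee_pos)
    )


def perfect_tracker(current: Vec3, desired: Vec3) -> Vec3:
    """Hypothetical target-robot controller that reaches the desired pose exactly."""
    return desired


def rollout_target(
    obj_pos: Vec3,
    ee_pos: Vec3,
    policy: Callable[[Vec3, Vec3], Vec3],
    track: Callable[[Vec3, Vec3], Vec3],
    steps: int,
) -> List[Vec3]:
    """Run the target robot by querying the source policy at every step."""
    trajectory = [ee_pos]
    for _ in range(steps):
        # The source robot is placed in the target's current state and its
        # policy proposes a new end-effector pose.
        desired = policy(obj_pos, ee_pos)
        # The target robot moves its own end effector to that pose.
        ee_pos = track(ee_pos, desired)
        trajectory.append(ee_pos)
    return trajectory
```

With a controller that tracks the commanded pose closely, the target's end-effector trajectory matches what the source robot would have done; differences in tracking are where the two robots can diverge.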

robots image
Simulation Tasks and Robots. We use 5 tasks in Robosuite: Lift, Stack, Can Pick-and-Place, Two Piece Assembly, and Square Peg Insertion, on 5 different target robots. In ORBIT, we select the block lifting task with the UR5 as the target robot. In RLBench, we select 2 tasks, lifting a lid and pushing a button to turn on a lamp, with UR5 and Sawyer as the target robots.

We consider 8 tasks across 3 simulators (Robosuite, ORBIT, and RLBench) with policies trained using imitation learning for Robosuite tasks and RL for ORBIT and RLBench. For all tasks, we train the source state-based policy on the Franka robot and evaluate the success rates on different target robots using the test-time execution strategy mentioned above.

Results

The results below show that most unseen target robots achieve very high task success rates, especially when they use the same gripper as the source robot. This suggests that the kinematic differences among the robot arms are relatively insignificant. This holds for policies trained with either imitation learning (IL) or RL, and for both open-loop and closed-loop policies.

control study results
State-Based Policy Transfer Experiment Results. Results suggest that most unseen target robots can successfully perform the tasks using the source robot as a guide for where to move their grippers. Jaco has a 3-jaw gripper, which explains its lower success rates.

Mirage Pipeline

We propose Mirage, a simple strategy to zero-shot transfer a trained vision-based policy from the source robot to the target robot. The key idea is “cross-painting”: replacing the target robot with the source robot in the camera observations at test time so that it appears to the policy as if the source robot were performing the task.

robots Image
Illustration of Mirage’s pipeline.

We reproject the camera from the target frame to the source frame if there is a non-negligible camera angle change and then apply cross-painting: (1) use the segmentation mask provided by a renderer (e.g., Gazebo) to mask out the target robot, (2) apply the fast marching algorithm to fill in the missing pixels, and (3) overlay Gazebo's rendering of the source robot URDF onto the image. The resulting image is fed into the source robot's policy to obtain the action, which is executed after a coordinate frame transform with a blocking or high-gain controller.
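The three cross-painting steps and the frame transform described above can be sketched with NumPy. This is a minimal sketch under stated assumptions, not the authors' code: all function names are hypothetical, the masks and the source-robot rendering are assumed to come from a renderer and are passed in as arrays, the policy output is assumed to be an absolute end-effector pose, and `fill_masked_pixels` is a deliberately crude placeholder (mean-color fill) standing in for fast marching inpainting.

```python
import numpy as np


def fill_masked_pixels(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Placeholder inpainting (NOT fast marching).

    Paints the masked pixels with the mean color of the unmasked pixels.
    """
    filled = image.copy()
    if mask.any() and not mask.all():
        filled[mask] = image[~mask].mean(axis=0).astype(image.dtype)
    return filled


def cross_paint(target_image, target_mask, source_render, source_mask,
                inpaint=fill_masked_pixels):
    """Replace the target robot with the source robot in one camera frame.

    target_image, source_render: (H, W, 3) uint8 images from the same camera.
    target_mask, source_mask:    (H, W) boolean robot segmentation masks.
    """
    # Steps 1-2: mask out the target robot and fill in the missing pixels.
    background = inpaint(target_image, target_mask)
    # Step 3: overlay the rendered source robot on top of the background.
    painted = background.copy()
    painted[source_mask] = source_render[source_mask]
    return painted


def pose_in_target_frame(T_target_from_source: np.ndarray,
                         pose_in_source: np.ndarray) -> np.ndarray:
    """Re-express a 4x4 homogeneous end-effector pose, given in the source
    robot's base frame, in the target robot's base frame."""
    return T_target_from_source @ pose_in_source
```

If OpenCV is available, a fast-marching-based fill can be plugged in through the `inpaint` argument, for example `lambda img, m: cv2.inpaint(img, m.astype(np.uint8) * 255, 3, cv2.INPAINT_TELEA)`; the radius of 3 pixels here is an arbitrary illustrative choice.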


Simulation Experiments

We choose Franka as the source robot and UR5e and Kinova Gen3 as the target robots and evaluate Mirage on the 5 tasks in Robosuite. We use the ground-truth forward dynamics. For each task, the source robot policy is trained with behavior cloning on the provided demonstration data using an LSTM architecture with a ResNet-18 visual encoder. The policies take 84x84 images as input, and Mirage cross-paints the images at approximately 40 Hz.

sim experiments Image
Mirage Results on Transferring Vision-Based Policies in Simulation. For each task and robot arm combination, the Oracle represents the performance of a vision-based policy assuming access to a ground truth rendering of the source robot given the state of the target robot, the Naive 0-shot method directly passes the visual observation of the target robot to the policy, and Mirage uses cross-painting to generate the visual inputs for the policy. For each method, the first number represents the success rate when the target robot uses the source robot (Franka) gripper and the second number corresponds to using the target robot's default gripper (Robotiq gripper).

We see that in all cases, Mirage significantly outperforms the naive 0-shot performance without any visual gap mitigation. The gap between using an oracle and Mirage is at most 25%. This suggests that cross-painting can effectively bridge the visual differences of the robots.

Videos (Same Gripper)

The videos below show transfer from Franka to UR5e and Kinova Gen3 with the same Franka gripper. For each video, the left shows the oracle rendering, the middle shows the Mirage cross-painted image, and the right shows the target robot image.

Lift

Oracle Source  Mirage (Cross-Painted)  Actual Target

Oracle Source  Mirage (Cross-Painted)  Actual Target

Stack

Oracle Source  Mirage (Cross-Painted)  Actual Target

Oracle Source  Mirage (Cross-Painted)  Actual Target

Can

Oracle Source  Mirage (Cross-Painted)  Actual Target

Oracle Source  Mirage (Cross-Painted)  Actual Target

*The Mirage images show still frames when the Gazebo IK solver fails to find a solution.

Two Piece Assembly

Oracle Source  Mirage (Cross-Painted)  Actual Target

Oracle Source  Mirage (Cross-Painted)  Actual Target

Square

Oracle Source  Mirage (Cross-Painted)  Actual Target

Oracle Source  Mirage (Cross-Painted)  Actual Target

Videos (Different Grippers)

The videos below show transfer from Franka to UR5e and Kinova Gen3 with their default (Robotiq) grippers.

Lift

Oracle Source  Mirage (Cross-Painted)  Actual Target

Oracle Source  Mirage (Cross-Painted)  Actual Target

Stack

Oracle Source  Mirage (Cross-Painted)  Actual Target

Oracle Source  Mirage (Cross-Painted)  Actual Target

Can

Oracle Source  Mirage (Cross-Painted)  Actual Target

Oracle Source  Mirage (Cross-Painted)  Actual Target

*The Mirage images show still frames when the Gazebo IK solver fails to find a solution.

Two Piece Assembly

Oracle Source  Mirage (Cross-Painted)  Actual Target

Oracle Source  Mirage (Cross-Painted)  Actual Target

Square

Oracle Source  Mirage (Cross-Painted)  Actual Target

Oracle Source  Mirage (Cross-Painted)  Actual Target


Physical Experiments

In physical experiments, we evaluate Mirage across 3 different embodiments: the Franka with the Franka gripper, the Franka with the Robotiq 2F-85 gripper, and the UR5 with the Robotiq 2F-85 gripper.

We evaluate on 4 manipulation tasks: (1) pick up a stuffed animal (tiger) and put it into a bowl, (2) open a toy drawer, (3) stack one cup into another, and (4) put a pepper into a toaster and close its glass door. For each task, we collect 200-400 demonstrations through teleoperation and train a Diffusion Policy, which we use as our source policy.

real experiments Image
Mirage Results on Transferring Vision-Based Policies in Real. We evaluate Mirage on 4 tasks in 2 settings:
Different Gripper: Transferring policies between the Franka gripper and the Robotiq 2F-85 gripper on a Franka robot.
Different Robot: Using the Franka with either gripper as the source robot, and the UR5 robot with the Robotiq gripper as the target robot.
Baseline/Baseline 0-shot: Separate Diffusion Policy models trained on the source robot data for each task and evaluated on the source robot or zero-shot on the target embodiments.
Octo: Octo Base model finetuned on the source robot data from all tasks together and evaluated on the source robot or zero-shot on the target embodiments.
Mirage: Evaluation of zero-shot transfer to the target embodiments using Mirage with the source policy being the corresponding baseline Diffusion Policy models.

The results show that, for both gripper transfer and robot (and gripper) transfer, Mirage achieves strong zero-shot performance, significantly outperforming both baselines. In particular, when evaluating on the UR5, neither baseline is able to achieve any success on the 3 more difficult tasks, while Mirage has at most a 30% gap from the source robot performance.


Videos

The videos below show transfer between grippers on the Franka and transfer from the Franka to the UR5. For each video, the left shows the Mirage cross-painted image and the right shows the actual target robot.

Tiger Pick-and-Place

Mirage (Cross-Painted)     Actual Target Robot

Transfer Grippers

Mirage (Cross-Painted)    Actual Target Robot

Transfer Robots

Open Drawer

Mirage (Cross-Painted)    Actual Target Robot

Transfer Grippers

Mirage (Cross-Painted)    Actual Target Robot

Transfer Grippers & Robots

Stack Cup

Mirage (Cross-Painted)    Actual Target Robot

Transfer Grippers

Mirage (Cross-Painted)    Actual Target Robot

Transfer Grippers & Robots

Toaster

Mirage (Cross-Painted)    Actual Target Robot

Transfer Grippers

Mirage (Cross-Painted)    Actual Target Robot

Transfer Robots


BibTeX

@article{chen2024mirage,
      title={Mirage: Cross-Embodiment Zero-Shot Policy Transfer with Cross-Painting},
      author={Lawrence Yunliang Chen and Kush Hari and Karthik Dharmarajan and Chenfeng Xu and Quan Vuong and Ken Goldberg},
      year={2024},
      eprint={2402.19249},
      archivePrefix={arXiv},
      primaryClass={cs.RO}
}