About Google RT-1

Google RT-1, or Robotics Transformer 1, is a multi-task model designed for real-world robotic control at scale. It tokenizes both its inputs (camera images and task instructions) and its output actions (motor commands), which enables efficient inference at runtime and makes real-time control feasible. The model is trained on a large-scale, real-world robotics dataset and shows improved zero-shot generalization to new tasks, environments, and objects compared to prior techniques.

Here are four key features of Google RT-1:

  1. Large-Scale Training: The RT-1 is trained on a large-scale, real-world robotics dataset of 130k episodes that cover 700+ tasks, collected using a fleet of 13 robots from Everyday Robots (EDR) over 17 months.
  2. Efficient Inference: The RT-1 tokenizes its inputs (camera images and task instructions) and its output actions (motor commands), enabling efficient inference at runtime and making real-time control feasible.
  3. Improved Generalization: The RT-1 exhibits significantly improved zero-shot generalization to new tasks, environments, and objects compared to prior techniques. It can absorb large amounts of diverse data, including robot trajectories that span multiple tasks, objects, and environments, which improves both performance and generalization.
  4. Open-Source Code: Google has open-sourced the RT-1 code, providing a valuable resource for future research on scaling up robot learning.
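
To make the action tokenization in item 2 more concrete, here is a minimal Python sketch of one way to turn continuous motor commands into discrete tokens and back. It assumes uniform binning with 256 bins per action dimension, which is how we understand the RT-1 paper to describe its action discretization. The function names, the bounds, and the three-dimensional example action are hypothetical and are not taken from the released RT-1 code.

```python
from typing import List, Sequence

NUM_BINS = 256  # assumed number of bins per action dimension


def tokenize_action(
    action: Sequence[float],
    low: Sequence[float],
    high: Sequence[float],
    num_bins: int = NUM_BINS,
) -> List[int]:
    """Map each continuous action value to a bin index in [0, num_bins - 1]."""
    tokens = []
    for value, lo, hi in zip(action, low, high):
        clipped = min(max(value, lo), hi)      # keep the value inside its bounds
        fraction = (clipped - lo) / (hi - lo)  # rescale to [0, 1]
        # The upper bound itself would land in bin num_bins, so cap the index.
        tokens.append(min(int(fraction * num_bins), num_bins - 1))
    return tokens


def detokenize_action(
    tokens: Sequence[int],
    low: Sequence[float],
    high: Sequence[float],
    num_bins: int = NUM_BINS,
) -> List[float]:
    """Map each bin index back to the centre of its bin."""
    return [
        lo + (token + 0.5) / num_bins * (hi - lo)
        for token, lo, hi in zip(tokens, low, high)
    ]


# Hypothetical 3-dimensional action with every dimension bounded to [-1, 1].
low = [-1.0, -1.0, -1.0]
high = [1.0, 1.0, 1.0]
tokens = tokenize_action([-1.0, 0.0, 1.0], low, high)
print(tokens)  # [0, 128, 255]
print(detokenize_action(tokens, low, high))
```

With this scheme the round-trip error in each dimension is at most half a bin width, so a finer bin count trades a larger output vocabulary for more precise motor commands.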