RT-2: A New Model that Translates Vision and Language into Action
RT-2 (Robotic Transformer 2) is a vision-language-action (VLA) model trained on a combination of web-scale vision-language data and robot trajectory data. Because it expresses robot actions in the same token format as text, knowledge absorbed from the web transfers directly to robotic control: RT-2 follows natural-language instructions to move objects, sort them by color, and pick up and place items, and it generalizes to objects and commands that never appeared in its robot training data. That transfer makes it a promising foundation for a wide range of real-world manipulation applications.
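The key mechanism behind this is action tokenization: each continuous robot action is discretized into integer bins and written out as a short token string, so a language model can emit actions the same way it emits words. The sketch below illustrates one plausible encoding, assuming a 7-dimensional action (position deltas, rotation deltas, gripper) quantized into 256 bins per dimension; the bin count matches the RT-2 paper, but the value ranges, function names, and dimension ordering here are illustrative assumptions, not RT-2's actual implementation.

```python
import numpy as np

# Each action: 7 continuous values (x, y, z deltas; roll, pitch, yaw deltas; gripper).
# These ranges are illustrative assumptions, not RT-2's actual calibration.
ACTION_LOW = np.array([-0.1, -0.1, -0.1, -np.pi / 4, -np.pi / 4, -np.pi / 4, 0.0])
ACTION_HIGH = np.array([0.1, 0.1, 0.1, np.pi / 4, np.pi / 4, np.pi / 4, 1.0])
NUM_BINS = 256  # RT-2 discretizes each action dimension into 256 bins.

def action_to_tokens(action: np.ndarray) -> str:
    """Map a continuous action vector to a space-separated token string."""
    clipped = np.clip(action, ACTION_LOW, ACTION_HIGH)
    # Normalize each dimension to [0, 1], then quantize to an integer bin.
    normed = (clipped - ACTION_LOW) / (ACTION_HIGH - ACTION_LOW)
    bins = np.round(normed * (NUM_BINS - 1)).astype(int)
    return " ".join(str(b) for b in bins)

def tokens_to_action(tokens: str) -> np.ndarray:
    """Invert the mapping: decode a token string back to a continuous action."""
    bins = np.array([int(t) for t in tokens.split()], dtype=float)
    normed = bins / (NUM_BINS - 1)
    return ACTION_LOW + normed * (ACTION_HIGH - ACTION_LOW)

if __name__ == "__main__":
    action = np.array([0.05, -0.02, 0.0, 0.1, 0.0, -0.1, 1.0])
    tokens = action_to_tokens(action)
    print(tokens)                    # "191 102 128 144 128 111 255"
    print(tokens_to_action(tokens))  # recovers the input up to quantization error
```

In RT-2's setup, such action strings become ordinary text targets during co-fine-tuning, so a single next-token prediction objective covers both web vision-language data and robot trajectories.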