June 25, 2024

Robotic model recap, DJI ban, 4D scene reconstruction, and lots of links

A Quick Look Back at Recent Robotics Progress

We received a lot of positive feedback after last week’s issue (#001), with a common request for more history and background information to help people get up to speed. Starting this week, we’ll include a bit of historical context alongside our regular “News” section.

To start, I’d recommend reading this great piece in the MIT Technology Review describing the overall leap we’re seeing in robotics: “Is robotics about to have its own ChatGPT moment?”

For today’s issue, we wanted to highlight a few leaps in research over the past couple of years (in future issues we’ll recap hardware, industry, and other topics):

Dec, 2022, RT-1: Robotics Transformer for real-world control at scale: an early demonstration showing the same type of models that power ChatGPT (transformers) can be applied to robotic time-series data. This is a data-hungry model, using 130k episodes collected over 17 months.
Apr, 2023, ALOHA & Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware: a pioneering project showing that low-cost robotic hardware can be teleoperated to collect training data for an imitation learning model.
Jul, 2023, RT-2: New model translates vision and language into action: the successor to RT-1, incorporating a vision-language-action (VLA) model that draws on web-scale datasets to improve its understanding of general language and context.
Oct, 2023, Open X-Embodiment: Robotic Learning Datasets and RT-X Models: a collaboration between institutions to pool robotics data from a large number of robots, tasks, and environments (now up to 2.4M episodes). When this data was used to re-train RT-1 and RT-2, performance improved, showing that diverse data helps improve a single robot’s capabilities.
May, 2024, Octo: An Open-Source Generalist Robot Policy: a VLA model for generalized robotic control showing similar performance to RT-2-X. Unlike RT-2-X, it is open-source and very lightweight, meaning it can be fine-tuned for specific tasks, environments, and robots on consumer-grade hardware.

News

[Politics] The US Government Moves Towards Banning China-Made DJI Drones

The House passed new legislation that would ban the use of DJI drones in the United States

What’s up: The House of Representatives passed the National Defense Authorization Act, which includes a provision barring DJI drones from using FCC frequencies. That would effectively prevent DJI drones from being imported or used in the U.S., and it could ground existing drones without compensation to their owners. The bill now moves to the Senate for approval before potentially becoming law. The act faced partisan division in the House, but its anti-China provisions have bipartisan support.

What it means: There are many possible motives for seeking the DJI drone ban, such as promoting US-made drones or preventing unauthorized data collection or malicious activity that could be considered a national security risk. These same concerns may grow for imported robots from China. Companies like Unitree are becoming increasingly popular among researchers in the US, given their low-cost hardware and impressive capabilities.

[Research] Wayve’s PRISM-1 Model for 4D Driving Scene Reconstruction

Wayve, an autonomous driving AI company, announced a new model for generating photorealistic 4D driving scenes within their simulator

What’s up: PRISM-1 allows for the creation of realistic 4D scenes (3D + time) in Wayve’s simulation engine, which is used to train and evaluate their self-driving models. It creates diverse scenarios of the kind that occur in everyday driving.

What it means: Using simulation to evaluate control models is growing in popularity in robotics, and autonomous cars are essentially mobile robots that operate on roads. We think that as simulations become more realistic, they will become increasingly powerful for model evaluation, training data generation, and reinforcement learning (RL).