Techno Blender
Digitally Yours.

A bot that watched 70,000 hours of Minecraft videos could unlock AI’s next big thing

0 9


The result is a breakthrough for a technique known as imitation learning, in which neural networks are trained how to perform tasks by watching humans do them. Imitation learning can be used to train AI to control robot arms, drive cars or navigate webpages.  

There is a vast amount of video online showing people doing different tasks. By tapping into this resource, the researchers hope to do for imitation learning what GPT-3 did for large language models. “In the last few years we’ve seen the rise of this GPT-3 paradigm where we see amazing capabilities come from big models trained on enormous swathes of the internet,” says Bowen Baker at OpenAI, one of the team behind the new Minecraft bot. “A large part of that is because we’re modeling what humans do when they go online.”

The problem with existing approaches to imitation learning is that video demonstrations need to be labeled at each step: doing this action makes this happen, doing that action makes that happen, and so on. Annotating by hand in this way is a lot of work, and so such datasets tend to be small. Baker and his colleagues wanted to find a way to turn the millions of videos that are available online into a new dataset.

The team’s approach, called Video Pre-Training (VPT), gets around the bottleneck in imitation learning by training another neural network to label videos automatically. They first hired crowdworkers to play Minecraft, and recorded their keyboard and mouse clicks alongside the video from their screens. This gave the researchers 2000 hours of annotated Minecraft play, which they used to train a model to match actions to onscreen outcome. Clicking a mouse button in a certain situation makes the character swing its axe, for example.  

The next step was to use this model to generate action labels for 70,000 hours of unlabelled video taken from the internet and then train the Minecraft bot on this larger dataset.

“Video is a training resource with a lot of potential,” says Peter Stone, executive director of Sony AI America, who has previously worked on imitation learning. 


The result is a breakthrough for a technique known as imitation learning, in which neural networks are trained how to perform tasks by watching humans do them. Imitation learning can be used to train AI to control robot arms, drive cars or navigate webpages.  

There is a vast amount of video online showing people doing different tasks. By tapping into this resource, the researchers hope to do for imitation learning what GPT-3 did for large language models. “In the last few years we’ve seen the rise of this GPT-3 paradigm where we see amazing capabilities come from big models trained on enormous swathes of the internet,” says Bowen Baker at OpenAI, one of the team behind the new Minecraft bot. “A large part of that is because we’re modeling what humans do when they go online.”

The problem with existing approaches to imitation learning is that video demonstrations need to be labeled at each step: doing this action makes this happen, doing that action makes that happen, and so on. Annotating by hand in this way is a lot of work, and so such datasets tend to be small. Baker and his colleagues wanted to find a way to turn the millions of videos that are available online into a new dataset.

The team’s approach, called Video Pre-Training (VPT), gets around the bottleneck in imitation learning by training another neural network to label videos automatically. They first hired crowdworkers to play Minecraft, and recorded their keyboard and mouse clicks alongside the video from their screens. This gave the researchers 2000 hours of annotated Minecraft play, which they used to train a model to match actions to onscreen outcome. Clicking a mouse button in a certain situation makes the character swing its axe, for example.  

The next step was to use this model to generate action labels for 70,000 hours of unlabelled video taken from the internet and then train the Minecraft bot on this larger dataset.

“Video is a training resource with a lot of potential,” says Peter Stone, executive director of Sony AI America, who has previously worked on imitation learning. 

FOLLOW US ON GOOGLE NEWS

Read original article here

Denial of responsibility! Techno Blender is an automatic aggregator of the all world’s media. In each content, the hyperlink to the primary source is specified. All trademarks belong to their rightful owners, all materials to their authors. If you are the owner of the content and do not want us to publish your materials, please contact us by email – [email protected]. The content will be deleted within 24 hours.

Leave a comment