DeepMind’s Genie 2 can generate interactive worlds that look like video games

December 4, 2024


DeepMind, Google’s AI research org, has unveiled a model that can generate an “endless” variety of playable 3D worlds.

Called Genie 2, the model — the successor to DeepMind’s Genie, which was released earlier this year — can generate an interactive, real-time scene from a single image and text description (e.g. “A cute humanoid robot in the woods”). In this way, it’s similar to models under development by Fei-Fei Li’s company, World Labs, and Israeli startup Decart.

DeepMind claims that Genie 2 can generate a “vast diversity of rich 3D worlds,” including worlds in which users can take actions like jumping and swimming by using a mouse or keyboard. Trained on videos, the model’s able to simulate object interactions, animations, lighting, physics, reflections, and the behavior of “NPCs.”

DeepMind Genie 2. **Image Credits:** DeepMind

Many of Genie 2’s simulations look like AAA video games — and the reason could well be that the model’s training data contains playthroughs of popular titles. But DeepMind, like many AI labs, wouldn’t reveal many details about its data sourcing methods, for competitive reasons or otherwise.

One wonders about the IP implications. DeepMind — being a Google subsidiary — has unfettered access to YouTube, and Google has previously implied that its ToS gives it permission to use YouTube videos for model training. But is Genie 2 basically creating unauthorized copies of the video games it “watched”? That’s for the courts to decide.

DeepMind says that Genie 2 can generate consistent worlds from different perspectives, like first-person and isometric views, for up to a minute, though most last 10 to 20 seconds.

“Genie 2 responds intelligently to actions taken by pressing keys on a keyboard, identifying the character and moving it correctly,” DeepMind wrote in a blog post. “For example, our model [can] figure out that arrow keys should move a robot and not trees or clouds.”

Most models like Genie 2 (world models, if you will) can simulate games and 3D environments, but they suffer from artifacts, inconsistency, and hallucinations. For example, Decart’s Minecraft simulator, Oasis, runs at a low resolution and quickly “forgets” the layout of levels.

Genie 2, however, can remember parts of a simulated scene that aren’t in view and render them accurately when they become visible again, DeepMind claims. (World Labs’ models can do this too.)

Now, games created with Genie 2 wouldn’t be all that fun, really. Having your progress erased every minute would drive anyone up the wall. So DeepMind’s positioning the model as more of a research and creative tool — a tool for prototyping “interactive experiences” and evaluating AI agents.

“Thanks to Genie 2’s out-of-distribution generalization capabilities, concept art and drawings can be turned into fully interactive environments,” DeepMind wrote. “And by using Genie 2 to quickly create rich and diverse environments for AI agents, our researchers can generate evaluation tasks that agents have not seen during training.”

DeepMind says that while Genie 2 is in the early stages, the lab believes it’ll be a key component in developing AI agents of the future.

Google has poured increasing resources into world models, which promise to be the next big thing in AI. In October, DeepMind hired Tim Brooks, who was heading development on OpenAI’s Sora video generator, to work on video generation technologies and world simulators.
