
A machine running the AI model Gemini Robotics places a basketball in a hoop. Credit: Google DeepMind
Artificial-intelligence company Google DeepMind has put a version of its most advanced large language model (LLM), Gemini, into robots. Using the model, machines can perform some tasks — such as ‘slam dunking’ a miniature basketball through a desktop hoop — despite never having watched another robot do the action, says the firm.
The company is among several working to harness the artificial intelligence (AI) advances that power chatbots to create general-purpose robots. The approach also raises safety concerns, given such models’ propensity to generate incorrect and harmful outputs.
The hope is to create machines that are intuitive to operate and can tackle a range of physical tasks, without relying on human supervision or being preprogrammed. By connecting to Gemini’s robotic models, a developer could enhance their robot so that it comprehends “natural language and now understands the physical world in a lot more detail than before,” says Carolina Parada, who leads the Google DeepMind robotics team and is based in Boulder, Colorado.
The model known as Gemini Robotics — announced on 12 March in a blog post and technical paper — is “a small but tangible step” towards that goal, says Alexander Khazatsky, an AI researcher and co-founder of CollectedAI in Berkeley, California, which is focused on creating data sets to develop AI-powered robots.
Spatial awareness
A team at Google DeepMind, which is headquartered in London, started with Gemini 2.0, the firm’s most advanced vision and language model, trained by analysing patterns in huge volumes of data.
They created a specialized version of the model designed to excel at reasoning tasks involving 3D physical and spatial understanding — for example, predicting an object’s trajectory or identifying the same part of an object in images taken from different angles.
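To give a flavour of what such a task involves, the sketch below works through the simplest case of trajectory prediction: the constant-gravity ballistics that any system must implicitly approximate when anticipating where a tossed object will land. It is an illustrative example only, not DeepMind’s code; the function name, time step and parameter values are assumptions made for the sketch.

```python
import numpy as np

def predict_trajectory(p0, v0, g=9.81, dt=0.02, steps=50):
    """Predict the ballistic path of a tossed object.

    p0: initial (x, y, z) position in metres
    v0: initial (x, y, z) velocity in m/s
    Returns an array of shape (steps, 3) of future positions.
    """
    p0 = np.asarray(p0, dtype=float)
    v0 = np.asarray(v0, dtype=float)
    t = np.arange(1, steps + 1) * dt          # future time stamps
    gravity = np.array([0.0, 0.0, -g])        # z axis points 'up'
    # Constant-acceleration kinematics: p(t) = p0 + v0*t + 0.5*a*t^2
    return p0 + v0 * t[:, None] + 0.5 * gravity * t[:, None] ** 2

# e.g. a ball leaving a gripper at 2 m/s forward and 3 m/s upward
path = predict_trajectory(p0=(0.0, 0.0, 1.0), v0=(2.0, 0.0, 3.0))
print(path[-1])  # predicted position after one second
```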
Finally, they further trained the model on thousands of hours of data from real, remotely operated robot demonstrations. This allowed the robotic ‘brain’ to generate real actions, much as LLMs use their learned associations to produce the next word in a sentence.
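That analogy can be made concrete. In many vision-language-action systems, each continuous motor command is discretized into integer ‘action tokens’ that the model predicts one at a time, just as a chatbot predicts the next word; this lets the same machinery that powers chatbots drive a robot. The sketch below illustrates that general recipe under assumed names and values (the bin count, action range and `rollout` interface are inventions for illustration, not the Gemini Robotics interface).

```python
import numpy as np

N_BINS = 256  # discretize each continuous action dimension into 256 tokens

def action_to_tokens(action, low=-1.0, high=1.0):
    """Map continuous motor commands in [low, high] to integer tokens."""
    action = np.clip(np.asarray(action, dtype=float), low, high)
    return np.round((action - low) / (high - low) * (N_BINS - 1)).astype(int)

def tokens_to_action(tokens, low=-1.0, high=1.0):
    """Invert the mapping: tokens back to approximate continuous commands."""
    return low + np.asarray(tokens, dtype=float) / (N_BINS - 1) * (high - low)

def rollout(policy, observation, horizon=8):
    """Autoregressively decode action tokens, one per step, like next-word
    prediction. `policy` is any callable mapping (observation, token prefix)
    to the next token; here it stands in for the fine-tuned model."""
    tokens = []
    for _ in range(horizon):
        tokens.append(policy(observation, tokens))
    return tokens_to_action(tokens)

# A stand-in 'policy' that always outputs the centre token; a real model
# would condition on camera images and the language instruction.
dummy_policy = lambda obs, prefix: N_BINS // 2
print(rollout(dummy_policy, observation=None))
```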
The team tested Gemini Robotics on humanoid robots and robotic arms, on tasks that came up in training and on unfamiliar activities. According to the team, robots using the model consistently outperformed state-of-the-art rivals when tested on new tasks and familiar ones in which details had been changed.