Noam Brown, who leads AI reasoning research at OpenAI, says certain forms of “reasoning” AI models could’ve arrived 20 years earlier had researchers “known [the right] approach” and algorithms.
“There were various reasons why this research direction was neglected,” Brown said during a panel at Nvidia’s GTC conference in San Jose on Wednesday. “I noticed over the course of my research that, OK, there’s something missing. Humans spend a lot of time thinking before they act in a tough situation. Maybe this would be very useful [in AI].”
Brown was referring to his work on game-playing AI at Carnegie Mellon University, including Pluribus, which defeated elite human professionals at poker. The AI Brown helped create was unusual at the time in that it “reasoned” through problems rather than attempting a more brute-force approach.
Brown is one of the architects behind o1, an OpenAI model that employs a technique called test-time inference to “think” before it responds to queries. Test-time inference entails applying additional computing while a model runs to drive a form of “reasoning.” In general, so-called reasoning models are more accurate and reliable than traditional models, particularly in domains like mathematics and science.
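To make the idea concrete, here’s a minimal sketch of one well-known form of test-time inference: sampling many candidate answers and taking a majority vote (often called self-consistency). This is an illustration, not o1’s actual mechanism, which OpenAI hasn’t published; the `model.sample` API below is a hypothetical stand-in for any language-model call.

```python
from collections import Counter

def sample_answer(model, prompt, temperature=0.8):
    # Hypothetical stand-in for a single sampled model completion.
    return model.sample(prompt, temperature=temperature)

def majority_vote(model, prompt, n=32):
    """Spend extra compute at inference time: draw n candidate answers
    and return the most common one. Larger n means more compute and,
    empirically, better accuracy on math- and science-style questions
    where answers can be checked for agreement."""
    answers = [sample_answer(model, prompt) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]
```

The key point is that the knob here, `n`, is turned at inference time rather than during training, which is what distinguishes this family of techniques from simply training a bigger model.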
Brown was asked during the panel whether academia could ever hope to perform experiments on the scale of AI labs like OpenAI, given institutions’ general lack of access to computing resources. He admitted that it’s become tougher in recent years as models have grown more compute-intensive, but said that academics can still make an impact by exploring areas that require less computing, like model architecture design.
“[T]here is an opportunity for collaboration between the frontier labs [and academia],” Brown said. “Certainly, the frontier labs are looking at academic publications and thinking carefully about, OK, does this make a compelling argument that, if this were scaled up further, it would be very effective. If there is that compelling argument from the paper, you know, we will investigate that in these labs.”
Brown’s comments come at a time when the Trump administration is making deep cuts to scientific grant-making. AI experts including Nobel laureate Geoffrey Hinton have criticized these cuts, saying that they may threaten AI research efforts both domestically and abroad.
Brown called out AI benchmarking as an area where academia could make a significant impact. “The state of benchmarks in AI is really bad, and that doesn’t require a lot of compute to do,” he said.
As we’ve written about before, popular AI benchmarks today tend to test for esoteric knowledge, and give scores that correlate poorly with proficiency on tasks that most people care about. That’s led to widespread confusion about models’ capabilities and improvements.
Updated 4:06 p.m. Pacific: An earlier version of this piece implied that Brown was referring to reasoning models like o1 in his initial remarks. In fact, he was referring to his work on game-playing AI prior to his time at OpenAI. We regret the error.