
Chemical synthesis is the process of creating complex chemical compounds from simpler precursors. Credit: Andrew Lambert Photography/SPL
Searching for blockbuster drugs and wonder materials is an arduous task for chemists. To make their promising compounds, they must trawl through millions of known chemical reactions, with hundreds of thousands more added annually, and then test whether it is possible to synthesize them.
Now, researchers have created an artificial-intelligence system that vastly simplifies and accelerates the process of chemical synthesis. The system, called MOSAIC and described in a study published in Nature on 19 January¹, recommended reaction conditions that the researchers used to make 35 compounds with the potential to become products such as pharmaceuticals, agrochemicals or cosmetics, without any further trawling or tweaking.
“The synthesis of small molecules is the slow step in drug discovery and a number of other important areas,” says study co-author Timothy Newhouse, a chemist at Yale University in New Haven, Connecticut.
MOSAIC could remove this bottleneck and so lead to more and better products, adds Newhouse. The system is “capable of drafting complete laboratory instructions — detailed enough for chemists to follow — to help create molecules that have not previously existed”.
AI-assisted chemistry
Predicting the conditions of chemical reactions has been a key focus of AI use in chemistry. One of the most prominent tools is IBM’s RXN for Chemistry, which is based on a large language model (LLM).
It uses a notation called the simplified molecular-input line-entry system (SMILES), which translates chemical structures into strings of letters, numbers and punctuation that are better suited to a system that recognizes language. By contrast, LLMs such as ChemCrow are trained for chemistry tasks using natural-language data².
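To illustrate what "better suited to a system that recognizes language" means in practice, the sketch below splits SMILES strings into tokens, the units a language model would read. It is a minimal sketch assuming a regular-expression tokenizer of the kind commonly used to prepare SMILES for sequence models; it is not the actual preprocessing used by RXN for Chemistry or MOSAIC, and the names `SMILES_TOKEN` and `tokenize` are hypothetical.

```python
import re

# Hypothetical tokenizer pattern: bracket atoms such as [NH4+], two-letter
# halogens (Br, Cl), organic-subset atoms, aromatic atoms (lower case), bonds,
# branches, ring-closure digits and the ">" used in reaction SMILES.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|[BCNOSPFI]|[bcnosp]|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize(smiles: str) -> list[str]:
    """Split a SMILES (or reaction SMILES) string into chemically meaningful tokens."""
    tokens = SMILES_TOKEN.findall(smiles)
    # findall silently skips characters the pattern does not recognise,
    # so check that the tokens reassemble into the original string.
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognised characters in {smiles!r}")
    return tokens


# Aspirin: "c1ccccc1" is the aromatic ring; the two "1"s mark where it closes.
print(tokenize("CC(=O)Oc1ccccc1C(=O)O"))
# → ['C', 'C', '(', '=', 'O', ')', 'O', 'c', '1', 'c', 'c', 'c', 'c', 'c', '1', 'C', '(', '=', 'O', ')', 'O']

# A reaction written as reactants>>product: ethanol + acetic acid -> ethyl acetate.
print(tokenize("CCO.CC(=O)O>>CCOC(=O)C"))
```

Treating "Cl" and "[NH4+]" as single tokens, rather than splitting them into characters, keeps each token aligned with one chemical unit, which is the usual reason for tokenizing SMILES this way.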
The SMILES approach makes it easier to process chemical information such as starting materials and solvents. “Our goal was to build a general model that could read chemistry the way chemists write it [by] listening to the language of experimental procedures and quickly turning that collective voice into a practical suggestion,” says Newhouse. He adds that integrating the step-by-step instructions that MOSAIC produces into automated systems would be a “natural next step”.
The researchers used an AI system they had developed previously³ to cluster a database of around one million reactions extracted from patents into 2,285 subsets. Using these subsets, the team trained Meta’s partially open-source Llama LLM to create 2,498 separate expert models, each specializing in a particular type of chemical transformation applied to a particular type of starting molecule. Because each expert uses fewer parameters than the major LLMs do, the system can run locally on a computer.
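The expert-model design implies that each query must be matched to the relevant specialist. The toy sketch below shows one plausible way to do that; it is an assumption, not MOSAIC's method. It uses nearest-neighbour routing on character trigrams of reaction SMILES, with plain Python functions standing in for the fine-tuned Llama experts. The cluster names, example reactions, placeholder suggestions and the names `Cluster`, `route` and `suggest` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-in for a fine-tuned expert model: takes a reaction
# SMILES and returns a text suggestion.
Expert = Callable[[str], str]


def char_ngrams(smiles: str, n: int = 3) -> set[str]:
    """Crude similarity features: the set of character n-grams in a SMILES string."""
    return {smiles[i:i + n] for i in range(len(smiles) - n + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Set overlap, from 0 (disjoint) to 1 (identical)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


@dataclass
class Cluster:
    name: str
    examples: list[str]  # reaction SMILES assigned to this cluster
    expert: Expert

    def profile(self) -> set[str]:
        # Union of the n-grams of every example reaction in the cluster.
        grams: set[str] = set()
        for rxn in self.examples:
            grams |= char_ngrams(rxn)
        return grams


def route(query: str, clusters: list[Cluster]) -> Cluster:
    """Pick the cluster whose example reactions look most like the query."""
    q = char_ngrams(query)
    return max(clusters, key=lambda c: jaccard(q, c.profile()))


def suggest(query: str, clusters: list[Cluster]) -> tuple[str, str]:
    """Route the query, then ask only that cluster's expert for a suggestion."""
    cluster = route(query, clusters)
    return cluster.name, cluster.expert(query)


# Two toy clusters with placeholder "experts" (not real model outputs).
clusters = [
    Cluster("esterification",
            ["CCO.CC(=O)O>>CCOC(=O)C", "CO.CC(=O)O>>COC(=O)C"],
            lambda q: "hypothetical: acid catalyst, heat"),
    Cluster("amide coupling",
            ["CCN.CC(=O)O>>CCNC(=O)C", "CN.CC(=O)O>>CNC(=O)C"],
            lambda q: "hypothetical: coupling reagent, base"),
]

print(suggest("CCCO.CC(=O)O>>CCCOC(=O)C", clusters))
# → ('esterification', 'hypothetical: acid catalyst, heat')
```

Only one small expert runs per query, which is the property that lets a system built from many specialists stay light enough to run locally.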
Martin Seifrid, a materials scientist at North Carolina State University in Raleigh, says that MOSAIC is notable in that it avoids “throwing the largest possible model at a problem, instead choosing to focus on a carefully designed system of much smaller ‘expert’ models”. “Each specialized model is more accurate within its domain,” Seifrid says.


