This study is part of a growing body of research warning about the risks of deploying AI agents in real-world financial decision-making. Earlier this month, a group of researchers from multiple universities argued that LLM agents should be evaluated primarily on the basis of their risk profiles, not just their peak performance. Current benchmarks, they say, emphasize accuracy and return-based metrics, which capture how well an agent performs at its best but overlook how safely it fails. Their research also found that even top-performing models break down more readily under adversarial conditions.
The team suggests that in real-world financial systems, even a tiny weakness, such as a 1% failure rate, could expose the system to systemic risk. They recommend that AI agents be "stress tested" before being put into practical use.
Hancheng Cao, an incoming assistant professor at Emory University, notes that the price negotiation study has limitations. “The experiments were conducted in simulated environments that may not fully capture the complexity of real-world negotiations or user behavior,” says Cao.
Pei says researchers and industry practitioners are experimenting with a variety of strategies to reduce these risks. These include refining the prompts given to AI agents, enabling agents to use external tools or code to make better decisions, coordinating multiple models to double-check one another's work, and fine-tuning models on domain-specific financial data. All of these approaches have shown promise in improving performance.
Many prominent AI shopping tools are currently limited to product recommendation. In April, for example, Amazon launched “Buy for Me,” an AI agent that helps customers find and buy products from other brands’ sites if Amazon doesn’t sell them directly.
While price negotiation is rare in consumer e-commerce, it is more common in business-to-business transactions. Alibaba.com has rolled out a sourcing assistant called Accio, built on its open-source Qwen models, that helps businesses find suppliers and research products. The company told MIT Technology Review it has no plans so far to automate price bargaining, citing the high risk involved.
That may be a wise move. For now, Pei advises consumers to treat AI shopping assistants as helpful tools—not stand-ins for humans in decision-making.
“I don’t think we are fully ready to delegate our decisions to AI shopping agents,” he says. “So maybe just use it as an information tool, not a negotiator.”
Correction: We removed a line about agent deployment.