
Groups of large language models playing simple interactive games can develop social norms, such as adopting their own rules for how language is used, according to a study1 published this week in Science Advances.
Social conventions such as greeting a person by shaking their hand or bowing represent the “basic building blocks of any coordinated society”, says study co-author Andrea Baronchelli, who studies collective behaviour at City St George’s, University of London. Baronchelli wanted to see what happens when large language models (LLMs) interact in groups.
In the first of two experiments, his team used Claude, an LLM created by Anthropic, a start-up based in San Francisco, California, to play a naming game similar to one used in studies of group dynamics in people. The game involves randomly pairing up members of a group and asking them to name an object, with a financial incentive if they provide the same name as their partner and a punishment if they don’t. After repeating this over several rounds and continuing to randomize partners, group members start to give the same name for the object. This naming convergence represents the creation of a social norm.
In the study, the team set up 24 copies of Claude and repeatedly paired two copies at random, instructing each member of the pair to select a letter from a pool of 10 options. The models were rewarded if they chose the same letter as their partner, and penalized if they didn’t. After several rounds of the game, with new partners each time, the group converged on a single shared letter.
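The convergence dynamics can be illustrated with a toy simulation. The Python sketch below uses simple rule-based agents rather than LLMs, under the assumption that each agent simply plays the letter it has seen its past partners play most often; the strategy, the number of rounds and the helper names (POOL, choose) are illustrative choices, not the authors’ implementation, although the population of 24 agents and pool of 10 letters match the experiment described above.

```python
# Toy version of the pairwise naming game described above.
# This is a minimal sketch, not the authors' LLM setup: each agent
# copies the letter it has seen its partners play most often, which
# is one plausible strategy a reward signal could steer agents towards.

import random
import string
from collections import Counter

POOL = list(string.ascii_uppercase[:10])  # pool of 10 candidate letters
N_AGENTS = 24                             # population size, as in the experiment
N_ROUNDS = 2000                           # total pairwise interactions (illustrative)

# Each agent remembers how often its partners have played each letter.
memories = [Counter() for _ in range(N_AGENTS)]

def choose(memory: Counter) -> str:
    """Pick the most-seen partner letter; choose at random with no history or on ties."""
    if not memory:
        return random.choice(POOL)
    top = max(memory.values())
    return random.choice([letter for letter, count in memory.items() if count == top])

for _ in range(N_ROUNDS):
    a, b = random.sample(range(N_AGENTS), 2)          # random pairing each round
    choice_a, choice_b = choose(memories[a]), choose(memories[b])
    # The reward/penalty is left implicit: each agent just records what its
    # partner played, which is the information a reward would point it towards.
    memories[a][choice_b] += 1
    memories[b][choice_a] += 1

# After enough rounds the population typically settles on one letter.
final_choices = Counter(choose(memory) for memory in memories)
print(final_choices.most_common())
```

Running the sketch usually ends with almost every agent naming the same letter: early random choices break the symmetry, and the copying rule then amplifies whichever letter gains an initial edge, which is the essence of how a shared convention can emerge without any central coordination.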
Collective bias
The same behaviour was observed when the game was repeated with a group of 200 copies of Claude and a pool of up to 26 letters. Similar results also occurred when the experiments were run with three versions of Llama, an LLM created by Meta in Menlo Park, California.
Although the models chose letters at random when operating individually, they became more likely to choose some letters over others when grouped, suggesting they had developed a collective bias. In people, collective bias refers to beliefs or assumptions that emerge from interactions within a group rather than from any single individual.
Baronchelli was surprised by this finding. “This phenomenon, to the best of our knowledge, has not been documented before in AI systems,” he adds.
Such collective biases could prove harmful, Baronchelli says, even if the individual agents seem unbiased. He and his colleagues suggest that LLMs should be tested in groups as well as individually to improve their behaviour, which would complement work by other researchers to reduce biases in single models.