
The team that built the FHIBE data set asked participants for their consent and compensated them for their images — something that doesn’t happen when AI tools just ‘scrape’ information from the Internet. Credit: Reka Olga/Getty
It’s a truth almost universally acknowledged that widely used generative artificial-intelligence applications were built with data collected from the Internet. This was done, for the most part, without obtaining people’s informed consent and without compensating the individuals whose data were ‘scraped’ in this way.
But a research article now shows that, when it comes to images, another way is possible. Researchers at the global technology and entertainment giant Sony describe a data set of responsibly sourced images that can be used to benchmark the accuracy of generative AI (A. Xiang et al. Nature https://doi.org/10.1038/s41586-025-09716-2; 2025). The work was complex, yet it didn’t cost the Earth. The price tag for data collection — less than US$1 million — is a drop in the ocean for many technology firms.
Regulators and funders need to take note. So should all those involved in litigation relating to whether scraping people’s data — in any form — to train and test generative-AI models is permissible. Creating responsibly sourced and representative data is possible when consent and accuracy concerns are addressed explicitly.
There’s an important message for corporations, too: here is an opportunity for companies to work together for everyone’s benefit. There are times when firms need to compete and times when they must collaborate. In these pages, we often make the case for improved collaboration. If there was ever an example of why such partnerships are needed, this is it.
There’s little doubt that personal, sometimes identifiable, digital information has been used to build generative-AI applications. Such data include material from blogs and content on social-media platforms, images and videos that often include people, and copyrighted works such as paintings and sculptures, books, music and films.
Most countries have laws governing data collection (T. Kuru Int. Data Priv. Law 14, 326–351; 2024). These laws typically require permission to be sought, to protect people’s privacy and intellectual-property rights. Obtaining that permission often means explaining what the data will be used for, offering the ability to opt out and, when appropriate, compensating the people who provide the data. Despite this, the companies behind some of the largest publicly available large language models have not routinely followed such practices. In some cases, firms have argued that consent isn’t needed if someone has already made their material available on the Internet, and that what they are doing constitutes ‘fair use’ of publicly available data. This is a controversial contention, and one that is being challenged by regulatory bodies and by organizations representing copyright holders, such as writers and artists.
This is where the fresh data set — called the Fair Human-Centric Image Benchmark (FHIBE) or ‘Feebee’ — is different. Alice Xiang, Sony’s global head of AI governance, and her colleagues obtained informed consent for the data set’s 10,318 images of 1,981 individuals from 81 countries. Each individual was told in accessible language what data were needed and how they could be used — applications involving law enforcement, the military, arms and surveillance are explicitly prohibited under the terms of use. Participants were paid for their material and can opt out at any time.
FHIBE also differs from existing image data sets in another important respect: it includes a much greater proportion of people and photographs from countries in Africa, Asia and Oceania. Moreover, in the FHIBE data set, participants provided their age, ancestry, geographical location and pronouns, removing the need for an algorithm to guess these characteristics from someone’s name or appearance. This is important because it means that the FHIBE data set is a more accurate reflection of the real world than are the many lopsided ones assembled from web-scraped data.
As well as being an important proof of concept, this study provides a way for companies to benchmark the accuracy of existing AI image applications. Researchers should also take the opportunity to use it to investigate some big and as-yet unanswered questions. For example, could a similar data set be made for benchmarking the accuracy of text-based AI tools? How can responsibly sourced data be produced on the scale needed to train, not just benchmark, large language models, and what should that scale be?
Xiang and her research team have shown how to produce and test responsible AI systems. They have chosen a tough problem, but this should not be their fight alone. Others must join the effort so we can build AI applications according to the highest standards of accuracy and ethics.




