New photo dataset aims to help spot AI biases responsibly

Publicly released: International
PHOTO: Proxyclick Visitor Management System/Unsplash

A new, publicly available dataset of more than 10,000 human images aims to help spot biases in AI, according to new research. Many AI models used in computer vision applications such as self-driving cars or facial recognition were developed using flawed datasets that may have been collected without consent, and the models themselves have been known to reflect biases that perpetuate harmful stereotypes. The new dataset, developed and funded by Sony AI, includes almost 2,000 people from 81 distinct countries or regions. Participants were given detailed information about the project and its potential risks so they could provide informed consent. The team hopes the dataset sets a new standard for responsibly curated AI data.

Media release

From: Springer Nature

Artificial Intelligence: Towards fairer human image datasets

A database of more than 10,000 human images to evaluate biases in artificial intelligence (AI) models for human-centric computer vision is presented in Nature this week. The Fair Human-Centric Image Benchmark (FHIBE), developed by Sony AI, is an ethically sourced, consent-based dataset that can be used to evaluate human-centric computer vision tasks to identify and correct biases and stereotypes.

Computer vision covers a range of applications, from autonomous vehicles to facial recognition technology. Many AI models used in computer vision were developed using flawed datasets that may have been collected without consent, often assembled through large-scale scraping of images from the web. AI models have also been known to reflect biases that may perpetuate sexist, racist, or other stereotypes.

Alice Xiang and colleagues present an image dataset that implements best practices for a number of factors, including consent, diversity, and privacy. FHIBE includes 10,318 images of 1,981 people from 81 distinct countries or regions. The database includes comprehensive annotations of demographic and physical attributes, including age, pronoun category, ancestry, and hair and skin colour. Participants were given detailed information about the project and potential risks to help them provide informed consent, through a process that complies with comprehensive data protection laws. These features make the database a reliable resource for responsibly evaluating bias in AI.

The authors compare FHIBE against 27 existing datasets used in human-centric computer vision applications and find that it sets a higher standard for diversity and robust consent in AI evaluation. It also supports more effective bias mitigation: it contains more self-reported annotations about participants than other datasets and includes a notable proportion of commonly underrepresented individuals. The dataset can be used to evaluate existing AI models for computer vision tasks and can uncover a wider variety of biases than previously possible, the authors note. They acknowledge that creating the dataset was challenging and expensive but conclude that FHIBE may represent a step towards more trustworthy AI.
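For readers wanting a concrete picture, the short Python sketch below illustrates the kind of subgroup evaluation that self-reported annotations such as ancestry or pronoun category make possible: computing a model's accuracy separately for each annotated group and the gap between the best- and worst-served groups. The record format, field names, and toy data are assumptions for illustration only, not FHIBE's actual schema or tooling.

# Minimal sketch of the subgroup bias check a dataset like FHIBE enables.
# The attribute names ("ancestry", "pronoun_category") mirror the self-reported
# annotations described above; the record format and toy data are hypothetical,
# not FHIBE's actual API or schema.
from collections import defaultdict

def per_group_accuracy(records, group_key):
    """Compute model accuracy separately for each annotated subgroup."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        group = r["attributes"][group_key]
        total[group] += 1
        if r["prediction"] == r["label"]:
            correct[group] += 1
    return {g: correct[g] / total[g] for g in total}

# Toy records: a large accuracy gap between subgroups is the kind of
# disparity such an evaluation is meant to surface.
records = [
    {"prediction": "smiling", "label": "smiling",
     "attributes": {"ancestry": "East Asia", "pronoun_category": "she/her"}},
    {"prediction": "neutral", "label": "smiling",
     "attributes": {"ancestry": "Sub-Saharan Africa", "pronoun_category": "he/him"}},
]

acc = per_group_accuracy(records, "ancestry")
gap = max(acc.values()) - min(acc.values())
print(acc, f"accuracy gap: {gap:.2f}")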

Attachments

Note: Research URLs will go live after the embargo ends.

Research: Springer Nature, web page (URL will go live after the embargo ends).
Journal/conference: Nature
Research: Paper
Organisation/s: Sony AI, USA
Funder: This project was fully funded by Sony AI.
Competing interests: Sony Group Corporation, with inventors J.T.A.A. and A.X., has a pending US patent application US20240078839A1, filed on 14 August 2023, that is currently under examination. It covers aspects of the human-centric image dataset specification and annotation techniques that were used in this paper. The same application has also been filed in Europe (application number 23761605.7, filed on 15 January 2025) and China (application number 202380024486.X, filed on 30 August 2024), and those applications are pending.