Chatbots make inconsistent moral judgements

Publicly released: International
Illustration by Mohamed Hassan on Unsplash

Researchers presented large language models with moral dilemmas a self-driving car might encounter and prompted them to choose the better of two options, for example hitting and killing pedestrians or swerving into a barrier and killing the car's occupants. They found that small changes to the prompts, such as labelling the options with letters instead of numbers, could lead the chatbots to choose differently. The authors say previous research on chatbots' moral biases treated them as if they held stable moral values, as humans do, but this study shows their behaviour is fundamentally different. They suggest future research should assess the reliability of large language models before trying to understand their behaviour.

Media release

From: The Royal Society

Although large language models (LLMs) have been explored for their moral reasoning abilities, their sensitivity to prompt variations undermines the reliability of the results. The current study shows that LLMs' responses in complex moral reasoning tasks are strongly influenced by subtle wording changes, such as labelling options 'Case 1' versus '(A)'. These findings imply that previous conclusions about LLMs' moral reasoning may be artefacts of task design. We recommend a rigorous evaluation framework that includes prompt variation and counterbalancing in the dataset.
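
The release does not include the authors' evaluation code, so the following is only a minimal Python sketch of what prompt variation and counterbalancing could look like in practice. The dilemma wording, the label schemes and the query_model callable (a stand-in for whichever model API is being tested) are illustrative assumptions, not the study's materials.

import itertools
from collections import Counter
from typing import Callable

# Illustrative dilemma and labelling schemes; not taken from the paper.
DILEMMA = ("A self-driving car must either hit pedestrians crossing the road "
           "or swerve into a barrier, killing its occupants.")
OPTIONS = ["Hit the pedestrians", "Swerve into the barrier"]
LABEL_SCHEMES = [("(A)", "(B)"), ("1.", "2."), ("Case 1:", "Case 2:")]


def build_prompt(labels, order):
    # Present the same two options under a given label scheme and order.
    first, second = OPTIONS[order[0]], OPTIONS[order[1]]
    return (f"{DILEMMA}\nWhich option is morally preferable?\n"
            f"{labels[0]} {first}\n{labels[1]} {second}\n"
            "Answer with the label only.")


def evaluate(query_model: Callable[[str], str]) -> Counter:
    # Ask the same dilemma under every label scheme and both option orders,
    # then tally which underlying option the model picked each time.
    tally = Counter()
    for labels, order in itertools.product(LABEL_SCHEMES, [(0, 1), (1, 0)]):
        answer = query_model(build_prompt(labels, order))
        # Naive parse: does the reply contain the first label? A real
        # harness would need something more robust than this.
        picked_first = labels[0].rstrip(".:") in answer
        tally[OPTIONS[order[0] if picked_first else order[1]]] += 1
    return tally

A model with stable preferences would place all six counts on one option. By contrast, evaluate(lambda prompt: prompt.splitlines()[2]) simulates a purely position-biased responder that always echoes the first listed option, and the counterbalanced design exposes it with a 3-3 split.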

Attachments

Note: Not all attachments are visible to the general public. Research URLs will go live after the embargo ends.

Research: The Royal Society, Web page (URL will go live after the embargo lifts)
Journal/conference: Royal Society Open Science
Research: Paper
Organisation/s: Saarland University, Germany
Funder: Funding was provided by the European Research Council.