Could AI help us detect when AI is lying?

Publicly released:
Australia; International; VIC
Image by Franz Bachinger from Pixabay

Large Language Models (LLMs), such as ChatGPT or Gemini, can sometimes make up information, a problem known as 'hallucination'. Now researchers think one solution may be to use similar large language models to detect these errors, like fighting fire with fire. The researchers focused on a specific type of hallucination caused by a lack of knowledge, and examined the nuance of language and how responses can be expressed in different ways, to work out how likely it was that the generated content was correct. In an accompanying News & Views article, an Australian expert warns that “using an LLM to evaluate an LLM-based method does seem circular, and might be biased.” But the authors believe their method may help people understand when they should take care in relying on LLM responses.

Media release

From: Springer Nature

1.  Artificial intelligence: Detecting hallucinations in large language models (N&V)

A method for detecting hallucinations in large language models (LLMs) that measures uncertainty in the meaning of generated responses is presented in Nature this week. The approach could be used to improve the reliability of LLM output.

LLMs, such as ChatGPT and Gemini, are artificial intelligence systems that can read and generate natural human language. However, such systems can be prone to hallucinations, in which generated content is either inaccurate or nonsensical. Detecting the extent to which an LLM may hallucinate is challenging as the responses may seem plausible in the way that they are presented.

Sebastian Farquhar and colleagues attempt to quantify the degree of hallucination generated by an LLM and thus determine how faithful the generated content is to the source content provided. Their method detects a specific subclass of hallucinations called confabulations, which are inaccurate and arbitrary and often occur when the LLM lacks knowledge. The approach considers the nuance of language and how responses can be expressed in different ways, which may have different meanings. The authors show that their method can detect confabulations in LLM-generated biographies and in answers to questions on topics such as trivia, general knowledge and the life sciences.
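The release does not spell out how uncertainty over meaning is computed, but a minimal sketch of the general idea might work as follows: sample several answers to the same question, group together answers that express the same meaning, and measure how spread out those groups are using entropy. The sample answers, the same_meaning check and the semantic_uncertainty function below are illustrative assumptions, not the authors' code; in practice the meaning check would use something like an entailment or paraphrase model rather than simple string matching.

import math
from typing import Callable, List

def semantic_uncertainty(answers: List[str], same_meaning: Callable[[str, str], bool]) -> float:
    # Group sampled answers into clusters of equivalent meaning, then return
    # the entropy of the cluster distribution. Low entropy means the model
    # keeps expressing the same meaning; high entropy suggests the answers
    # are arbitrary, a possible sign of confabulation.
    clusters: List[List[str]] = []
    for ans in answers:
        for cluster in clusters:
            if same_meaning(ans, cluster[0]):
                cluster.append(ans)
                break
        else:
            clusters.append([ans])
    probs = [len(c) / len(answers) for c in clusters]
    return -sum(p * math.log(p) for p in probs)

# Toy usage with a naive equivalence check (case-insensitive string match).
if __name__ == "__main__":
    eq = lambda a, b: a.strip().lower() == b.strip().lower()
    print(semantic_uncertainty(["Paris", "paris", "Paris"], eq))     # 0.0 -> consistent meaning
    print(semantic_uncertainty(["Paris", "Lyon", "Marseille"], eq))  # ~1.1 -> scattered meanings

A high score would flag an answer for caution rather than prove it wrong, which matches the authors' framing that the method helps users decide when to rely on an LLM's responses.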

The task is performed by an LLM and is evaluated by a third LLM, which amounts to “fighting fire with fire,” notes Karin Verspoor in an accompanying News & Views article. She adds, “using an LLM to evaluate an LLM-based method does seem circular, and might be biased.” However, the authors propose that their method may help users to understand when they should take care in relying on LLM responses and may mean that LLMs could be used with more confidence in a broader range of applications.

Attachments

Note: Not all attachments are visible to the general public. Research URLs will go live after the embargo ends.

Research: Springer Nature, web page. Please link to the article in online versions of your report (the URL will go live after the embargo ends).
Journal/conference: Nature
Research: Paper
Organisation/s: University of Oxford, UK; RMIT University (News & Views author)
Funder: Y.G. is supported by a Turing AI Fellowship funded by the UK government’s Office for AI, through UK Research and Innovation (grant reference EP/V030302/1), and delivered by the Alan Turing Institute.