Can we 'watermark' AI-generated text so we know where it came from?

Publicly released: International
Photo by Mariia Shalabaieva on Unsplash

Google researchers say they have developed a system that can watermark AI-generated text, allowing a computer model to tell the watermarked text apart from human writing. The team say their strategy uses an algorithm to subtly bias a chatbot's word choices in a way a detection model can later pick up to identify the text as AI-generated. The researchers developed two versions of the technique: one in which the watermark is more easily identifiable but the text is of slightly lower quality, and another that preserves text quality while remaining more detectable than existing watermarking approaches. The team say the technique adds little computational cost to running the model, and while the watermark can be deliberately circumvented by paraphrasing the content, it is a step forward for AI transparency.

Media release

From: Springer Nature

Artificial intelligence: Watermarks for AI-generated text

A tool that can watermark text generated by large language models, improving the ability to identify and trace synthetic content, is described in Nature this week.

Large language models (LLMs) are widely used artificial intelligence (AI) tools that can generate text for chatbots, writing support and other purposes. However, it can be difficult to identify and attribute AI-generated text to a specific source, calling the reliability of the information into question. Watermarks have been proposed as a solution to this problem, but have not been deployed at scale because of the stringent quality and computational efficiency requirements of production systems.

Sumanth Dathathri, Pushmeet Kohli and colleagues developed a watermarking scheme for AI-generated text, known as SynthID-Text. The tool uses a novel sampling algorithm to subtly bias the word choice of the LLM, inserting a signature that can be recognized by the associated detection software. This can be done via either a ‘distortionary’ pathway, which improves watermark detectability at a slight cost to output quality, or a ‘non-distortionary’ pathway, which preserves text quality.
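The release does not spell out how the biased sampling step works. As a minimal illustrative sketch only, the snippet below shows one generic way a secret key could nudge token selection: a keyed pseudorandom score is added to each candidate token's logit before sampling. All names here (g_score, sample_watermarked, the bias parameter) are hypothetical, and the paper's actual Tournament sampling algorithm differs in detail.

```python
# Illustrative sketch of keyed watermark sampling. This is NOT the
# SynthID-Text algorithm, just a generic keyed-biasing scheme.
import hashlib
import math
import random

def g_score(key: str, context: tuple, token: int) -> float:
    """Pseudorandom score in [0, 1) derived from a secret key,
    the recent context, and a candidate token."""
    payload = f"{key}|{context}|{token}".encode()
    digest = hashlib.sha256(payload).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

def sample_watermarked(logits: dict, context: tuple, key: str,
                       bias: float = 2.0) -> int:
    """Sample the next token after nudging each candidate's logit by a
    keyed pseudorandom score. A larger `bias` strengthens the watermark
    (loosely, the 'distortionary' regime) at some cost to text quality."""
    adjusted = {tok: logit + bias * g_score(key, context, tok)
                for tok, logit in logits.items()}
    # Softmax over the adjusted logits, then sample one token.
    z = max(adjusted.values())
    weights = {tok: math.exp(v - z) for tok, v in adjusted.items()}
    r = random.random() * sum(weights.values())
    for tok, w in weights.items():
        r -= w
        if r <= 0:
            return tok
    return tok  # floating-point fallback: return the last candidate
```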

The detectability of these watermarks was evaluated across several publicly available models, with SynthID-Text showing improved detectability compared to existing approaches. The quality of the text was also assessed using nearly 20 million responses from live chat interactions using the Gemini LLM, with results suggesting that the non-distortionary mode of watermarking did not decrease the text quality. Finally, the use of SynthID-Text has a negligible impact on the computational power needed to run the LLM, reducing the barrier to implementation.
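Continuing the hypothetical sketch above (not the paper's actual detector), detection would re-compute the keyed scores over a candidate text: because generation favoured high-scoring tokens, watermarked text scores systematically above the ~0.5 average expected for human writing, and a simple threshold on the mean score separates the two.

```python
def detection_score(tokens: list, key: str, window: int = 4) -> float:
    """Mean keyed score over a token sequence, reusing g_score from the
    sketch above. Watermarked text trends above ~0.5; human text does not."""
    scores = []
    for i in range(window, len(tokens)):
        context = tuple(tokens[i - window:i])
        scores.append(g_score(key, context, tokens[i]))
    return sum(scores) / max(len(scores), 1)
```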

The authors caution that text watermarks can be circumvented by editing or paraphrasing the output. However, this work demonstrates the viability of a tool for watermarking AI-generated text, a further step towards improving the accountability and transparency of responsible LLM use.

Attachments

Note: Not all attachments are visible to the general public. Research URLs will go live after the embargo ends.

Research: Springer Nature, web page (the URL will go live after the embargo ends)
Journal/conference: Nature
Research: Paper
Organisation/s: Google DeepMind, UK
Funder: We thank N. Shabat, N. Dal Santo, V. Anklin and B. Hekman for their collaboration on product integration; A. Senoner, E. Hirst, P. Kirk, M. Trebacz and many others who contributed across Google DeepMind and Google, including our partners at Gemini and CoreML, for their support in bringing this technology to production; D. Stutz for technical inputs on the selective prediction mechanism; R. Mullins for helping with the open-sourcing of the work; and M. Raykova for feedback on the paper.
Media Contact/s
Contact details are only visible to registered journalists.