It's getting harder to tell the difference between human and AI speech

Photo by Oleg Ivanov on Unsplash

Humans struggle to tell the difference between the speech of fellow humans and high-quality AI-generated voices, according to international researchers, who say it may be a skill we can all learn. The researchers recruited 30 people and played them snippets of human and AI-generated voices, asking them to guess which was which. They say the participants struggled to distinguish between AI and human voices, and improved only marginally after a 12-minute training session. However, the team was also monitoring the participants' brain activity, and found that after the short training session, brain responses to the AI voices began to change, suggesting the participants were starting to process AI voices differently from human ones.

News release

From: Society for Neuroscience

Can people distinguish between AI-generated and human speech?

While listeners struggle to distinguish AI-generated speech from human speech, their brains rapidly adapt to subtle differences between the two types of sound after short training.

In a collaboration between Tianjin University and the Chinese University of Hong Kong, researchers led by Xiangbin Teng used behavioral and brain-activity measures to explore whether people can distinguish between AI-generated and human speech, and whether brief training improves this ability.

Thirty participants listened to sentences spoken by human or AI-generated voices and judged whether each speaker was human or AI, both before and after a short training session. The researchers found that participants were poor at discriminating between the two types of speakers, and that training helped only minimally. On a neural level, however, training made the brain's responses to human and AI speech more distinct. What might that mean? "The auditory brain system seems to start picking up subtle acoustic differences, even if people can't reliably turn that into a behavioral decision yet," says Teng. "That's encouraging: it suggests training can help, and it's a promising starting point for building better ways to distinguish deepfake speech from real human speech. Humans are still adapting to AI-generated content, so poor performance doesn't mean the signals aren't there; it may mean we're not yet using the right cues."

Attachments

Note: Not all attachments are visible to the general public. Research URLs will go live after the embargo ends.

Research: Society for Neuroscience, web page (the URL will go live after the embargo ends)
Journal/conference: eNeuro
Research: Paper
Organisation/s: Tianjin University, China; Chinese University of Hong Kong, China
Funder: This work was supported by the Improvement on Competitiveness in Hiring New Faculties Funding Scheme, the Chinese University of Hong Kong (4937113), to X.T.