‘Waldo’ AI searches social media for health product side effects

Publicly released:
International
PHOTO: Ralph Olazo on Unsplash

A new AI tool called Waldo can search social media for health product side effects that might otherwise never get reported. US researchers trained Waldo to find Reddit posts talking about side effects of publicly available cannabis-based products. When tested on posts that had already been looked at by humans, Waldo got it right 99.7% of the time, while ChatGPT made more mistakes. The researchers made Waldo free for anyone to use, saying it must be used as a screening tool with human oversight but could help doctors and health officials find out whether over-the-counter health products, like supplements, are actually safe to use.

Media release

From: PLOS

New AI tool scans social media for hidden health risks

An artificial intelligence system called Waldo can spot personal reports of the harmful side effects of popular health products

A new artificial intelligence tool can scan social media data to discover adverse events associated with consumer health products, according to a study published September 30th in the open-access journal PLOS Digital Health by John Ayers of the University of California, San Diego, U.S., and colleagues.

Constant post-market surveillance of consumer product safety is crucial for public health. However, current adverse-event reporting systems for approved prescription medications and medical devices depend on voluntary submissions from doctors and manufacturers to the U.S. Food and Drug Administration.

The rapid growth in consumer health products, such as cannabis-derived products and dietary supplements, has led to the need for new adverse event detection systems.

In the new study, researchers tested the efficacy of a new automated machine learning tool, “Waldo,” that can sift through social media text to find consumer descriptions of adverse events. The tool was tested on its ability to scan Reddit posts to find adverse events (AEs) of cannabis-derived products.

When compared to human AE annotations of a set of Reddit posts, Waldo had an accuracy of 99.7%, far better than a general-purpose ChatGPT chatbot that was given the same set of posts. In a broader dataset of 437,132 Reddit posts, Waldo identified 28,832 potential reports of harm. When the researchers manually validated a random sample of these posts, they found that 86% were true AEs. The team has made Waldo open-source so that anyone—researchers, clinicians, or regulators—can use it.
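To put those figures in proportion, the arithmetic can be sketched in a few lines of Python. This is only an illustration built from the numbers quoted in the release. The function names are made up for this example and are not part of Waldo. The estimate of true reports assumes the 86% rate found in the manually checked sample holds across all flagged posts, which the release does not state.

```python
# Figures quoted in the media release.
TOTAL_POSTS = 437_132      # Reddit posts in the broader dataset
FLAGGED_POSTS = 28_832     # posts Waldo flagged as potential reports of harm
SAMPLE_PRECISION = 0.86    # share of the manually checked sample that were true AEs


def flagged_share(flagged: int, total: int) -> float:
    """Fraction of all posts that were flagged (hypothetical helper)."""
    return flagged / total


def estimated_true_reports(flagged: int, precision: float) -> int:
    """Rough count of true AEs among flagged posts.

    Assumption: the sample precision applies to every flagged post.
    """
    return round(flagged * precision)


share = flagged_share(FLAGGED_POSTS, TOTAL_POSTS)
estimate = estimated_true_reports(FLAGGED_POSTS, SAMPLE_PRECISION)

print(f"Flagged share of posts: {share:.1%}")
print(f"Estimated true adverse-event reports: {estimate:,}")
```

Under that assumption, about 6.6% of the posts were flagged, and roughly 24,800 of the flagged posts would be true adverse-event reports. The release does not give the size of the validation sample, so the second figure should be read as a ballpark only.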

“Waldo represents a significant advancement in social media-based AE detection, achieving superior performance compared to existing approaches,” the authors say.

“Additionally, Waldo's automated approach has broad applicability beyond cannabis-derived products to other consumer health products that similarly lack regulatory oversight.”

Lead author Karan Desai says, “Waldo shows that the health experiences people share online are not just noise, they’re valuable safety signals. By capturing these voices, we can surface real-world harms that are invisible to traditional reporting systems.”

John Ayers adds, “This project highlights how digital health tools can transform post-market surveillance. By making Waldo open-source, we’re ensuring that anyone, from regulators to clinicians, can use it to protect patients.”

Second author Vijay Tiyyala notes, “From a technical perspective, we demonstrated that a carefully trained model like RoBERTa can outperform state-of-the-art chatbots for AE detection. Waldo’s accuracy was surprising and encouraging.”

“By democratizing access to Waldo, the team hopes to accelerate open science and improve safety for patients.”

Journal/conference: PLOS Digital Health
Research:Paper
Organisation/s: University of Michigan, USA.
Funder: The author(s) received no specific funding for this work.