Reliance on AI search engines for medical information risky

"Chatbot answers were largely difficult to read and answers repeatedly lacked information or showed inaccuracies, possibly threatening patient and medication safety." Authors of paper in BMJ Quality & Safety

Patients should not rely on search engines and chatbots powered by AI for information, according to a new study.

The study, published in the journal BMJ Quality & Safety, found that a large number of answers provided to searches about drug treatments are inaccurate and potentially harmful. Some of the answers are also too complex to be easily understood.

In February last year, search engines began to use AI-powered chatbots, which promised to improve search results and provide more comprehensive answers. These chatbots, which are trained on internet datasets, are capable of generating disinformation and content that is nonsensical or harmful, the researchers say.

The researchers, who are based at universities in Germany and Belgium, explored the readability, completeness and accuracy of chatbot answers for queries on the top 50 most frequently prescribed drugs in the US in 2020, using Bing copilot, a search engine with AI-powered chatbot features. They reviewed research databases and consulted doctors with expertise in pharmacology to identify the medication questions that patients most frequently ask health care professionals.

The chatbot was asked 10 questions for each of the 50 drugs, generating 500 answers. The questions covered what the drug was used for, how it worked, instructions for use, common side effects and contraindications.

Experts assessed likelihood of harm

Readability of the answers provided by the chatbot was assessed by calculating the Flesch Reading Ease Score, which estimates the educational level required to understand a particular text. This scale ranges between 0 to 100, where the lower scores are considered difficult to read, and scores of 91 or above are considered easy.

To assess the completeness and accuracy of chatbot answers, the researchers compared the responses with the drug information provided by drugs.com, a peer-reviewed and up-to-date drug information website for both health care professionals and patients.

Seven medication experts then assessed the likelihood of possible harm if the patient followed the chatbot’s recommendations, using a subset of 20 chatbot answers displaying low accuracy or completeness, or a potential risk to patient safety.

They used harm scales produced by the Agency for Healthcare Research and Quality (AHRQ) to rate patient safety events. The likelihood of possible harm was estimated by the experts in accordance with a validated framework.

The average Flesch Reading Ease Score was just over 37, which meant that the answers would be difficult to understand for many readers.

Of the 10 questions asked for each medication, on average five were answered with the highest completeness, while question 3 (What do I have to consider when taking the drug?) was answered with the lowest average completeness of only 23%.

Chatbot statements didn’t match the reference data in 26% of answers, and were fully inconsistent in 3%.

Some answers don’t align with scientific consensus

Evaluation of the subset of 20 answers revealed that only 54% were rated as aligning with scientific consensus. More than a third (39%) contradicted the scientific consensus, while there was no established scientific consensus for the remaining 6%.

Possible harm resulting from a patient following the chatbot’s advice was rated as highly likely in 3% and moderately likely in 29% of these answers, while a third were judged as either unlikely or not at all likely to result in harm, if followed.

“In this cross-sectional study, we observed that search engines with an AI-powered chatbot produced overall complete and accurate answers to patient questions,” the researchers write.

“However, chatbot answers were largely difficult to read and answers repeatedly lacked information or showed inaccuracies, possibly threatening patient and medication safety,” they add.

A major drawback was the chatbot’s inability to understand the underlying intention behind a patient question, they argue, adding: “Despite their potential, it is still crucial for patients to consult their health care professionals, as chatbots may not always generate error-free information. Caution is advised in recommending AI-powered search engines until citation engines with higher accuracy rates are available.”

FCC Insight

Artificial intelligence (AI) has huge potential to bring about improvements in health care, whether that is in managing waiting lists more effectively, speeding up diagnosis or identifying the best treatment. But the use of AI-driven chatbots to serve up health information in response to online searches is a different matter, and it is worrying that researchers have found that so many of the results thrown up by search queries are either difficult to understand or inaccurate – and in some cases downright harmful. Patients need to be alert to the problems involved in consulting search engines for health-related information, and to understand that it is still better to talk to health professionals about any questions they have relating to medication.

Reliance on AI search engines for medication information creates risk

Experts assessed likelihood of harm

Some answers don’t align with scientific consensus

Share: