Can AI-Driven Chatbots Give Reliable Advice About Cancer?

TON - December 2023 Vol 16, No 6

Artificial intelligence (AI) chatbots showed mixed results in providing treatment strategies and direct-to-patient advice for a variety of malignancies, according to 2 studies recently published in JAMA Oncology.

The results from the first study, which assessed cancer treatment recommendations, showed that AI chatbots overall missed the mark on providing recommendations for breast, prostate, and lung cancers in accordance with national treatment guidelines.1

The findings from the second study, which evaluated responses to common cancer-related Google searches, were more positive, with the researchers reporting that the chatbots generally provided accurate information to consumers, although they noted the usefulness of the information may be limited by its complexity.2

Cancer Treatment Recommendations Study

For the first study, Danielle S. Bitterman, MD, with the Artificial Intelligence in Medicine Program, Mass General Brigham, Harvard Medical School, Boston, MA, and colleagues created 4 prompt templates covering 26 diagnosis descriptions of breast, prostate, and lung cancer (for a total of 104 prompts) and investigated the validity of ChatGPT-3.5 treatment recommendations for these cancers against the 2021 National Comprehensive Cancer Network (NCCN) Guidelines®. Several oncologists then rated the level of concordance between the chatbot responses and these guidelines. The researchers noted that, in accordance with the Common Rule, institutional review board approval was not required because no human participants were involved.
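The prompt-expansion step is simple to picture in code. The short Python sketch below is a hypothetical illustration only: the placeholder templates and diagnosis descriptions are assumptions, not the study's actual materials, but it shows how crossing 4 templates with 26 diagnosis descriptions yields the 104 prompts.

```python
# Hypothetical illustration of the prompt-expansion step described above.
# The templates and diagnosis descriptions are placeholders, not the
# actual study materials.

TEMPLATES = [
    "What is the treatment for {dx}?",
    "What is the recommended treatment for {dx}?",
    "How should {dx} be treated?",
    "What does NCCN recommend for treating {dx}?",
]

# In the study, 26 diagnosis descriptions spanned breast, prostate,
# and lung cancer; three invented examples stand in for them here.
DIAGNOSES = [
    "stage I hormone receptor-positive breast cancer",
    "metastatic castration-resistant prostate cancer",
    "stage III non-small cell lung cancer",
    # ...23 more descriptions in the actual study
]

prompts = [t.format(dx=dx) for t in TEMPLATES for dx in DIAGNOSES]
# With the full list, 4 templates x 26 diagnoses = 104 prompts.
```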

The chatbot provided at least 1 guideline-concordant treatment recommendation for 98% of prompts. However, for 34.3% of prompts, it also recommended at least 1 nonconcordant treatment.

In addition, the oncologists often disagreed about whether a ChatGPT response was correct. The researchers noted that this disagreement reflects the complexity of the NCCN Guidelines and the fact that ChatGPT responses can be unclear or difficult to interpret.

Notably, 13% of responses included treatment strategies that did not appear anywhere in the guidelines or were nonsensical, outputs that AI researchers refer to as “hallucinations.”

These hallucinations were primarily recommendations for localized treatment of advanced disease, targeted therapy, or immunotherapy.

Based on the findings, the investigators recommended that clinicians advise patients that AI chatbots are not a reliable source of cancer treatment information.

“The chatbot did not purport to be a medical device, and need not be held to such standards. However, patients will likely use such technologies in their self-education, which may affect shared decision making and the patient-clinician relationship. Developers should have some responsibility to distribute technologies that do not cause harm, and patients and clinicians need to be aware of these technologies’ limitations,” Dr Bitterman and colleagues concluded.

Consumer Health Information Study

For the second study, Abdo E. Kabarriti, MD, with the State University of New York Downstate Health Sciences University, Brooklyn, and colleagues analyzed the quality of responses to the top 5 most searched questions on skin, lung, breast, colorectal, and prostate cancer provided by 4 AI chatbots: ChatGPT-3.5, Perplexity (Perplexity.AI), Chatsonic (Writesonic), and Bing AI (Microsoft).

Outcomes included the quality of consumer health information, rated with the DISCERN instrument (a scale of 1-5, with 1 representing low quality), and the understandability and actionability of this information, rated with domains of the Patient Education Materials Assessment Tool (PEMAT) on a scale of 0% to 100%, with higher scores indicating greater understandability and actionability.
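To make the scoring concrete, the Python sketch below shows how per-response ratings on these instruments roll up into the medians reported next; the sample ratings and field names are invented for illustration and are not data from the study.

```python
from statistics import median

# Invented per-response ratings, for illustration only.
# DISCERN is a 1-5 quality scale; the PEMAT understandability and
# actionability domains are scored from 0% to 100%.
ratings = [
    {"discern": 5, "understandability": 66.7, "actionability": 20.0},
    {"discern": 4, "understandability": 75.0, "actionability": 40.0},
    {"discern": 5, "understandability": 60.0, "actionability": 0.0},
]

# Compute the median of each measure across all rated responses.
summary = {
    key: median(r[key] for r in ratings)
    for key in ("discern", "understandability", "actionability")
}
print(summary)
# {'discern': 5, 'understandability': 66.7, 'actionability': 20.0}
```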

The quality of text responses generated by the 4 chatbots was good (median DISCERN score of 5, with no misinformation identified). Understandability was moderate (median PEMAT understandability score, 66.7%), but actionability was poor (median PEMAT actionability score, 20%). Three of the 4 chatbots cited reputable sources, such as the American Cancer Society, Mayo Clinic, and Centers for Disease Control and Prevention, which the researchers described as “reassuring.”

However, the researchers found that the usefulness of the information was “limited” because responses were often written at a college reading level. Another limitation was that the AI chatbots provided concise answers with no visual aids, which may not be sufficient to explain more complex ideas to consumers.

“The findings of this cross-sectional study suggest that AI chatbots generally produce accurate information for the top cancer-related search queries, but the responses are not readily actionable and are written at a college reading level. These limitations suggest that AI chatbots should be used supplementarily and not as a primary source for medical information,” Dr Kabarriti and colleagues concluded.

Expert Perspective

In an accompanying editorial,3 Atul Butte, MD, PhD, who heads the Bakar Computational Health Sciences Institute, University of California, San Francisco, highlighted some caveats regarding the 2 studies. Both research teams evaluated “off the shelf” chatbots, which presumably had not been trained on selected medical information, and the prompts designed in both studies were very basic, which may have limited the specificity and actionability of the responses. However, he noted that newer large language models with specific healthcare training are being released.

Despite the mixed study results, Dr Butte asserted that he is optimistic about the future of AI in medicine.

“Today, the reality is that the highest-quality care is concentrated within a few premier medical systems like the NCI Comprehensive Cancer Centers, accessible only to a small fraction of the global population. However, AI has the potential to change this,” Dr Butte wrote.

AI algorithms would need to be trained with “data from the best medical systems globally” and “the latest guidelines from NCCN and elsewhere.” Digital health platforms powered by AI could then be designed to provide resources and advice to patients around the globe, he added.

References

  1. Chen S, Kann BH, Foote MB, et al. Use of artificial intelligence chatbots for cancer treatment information. JAMA Oncol. 2023;9:1459-1462.
  2. Pan A, Musheyev D, Bockelman D, et al. Assessment of artificial intelligence chatbot responses to top searched queries about cancer. JAMA Oncol. 2023;9:1437-1440.
  3. Butte AJ. Artificial intelligence-from starting pilots to scalable privilege. JAMA Oncol. 2023;9:1341-1342.
