Comparison of large language models in generating patient handouts for the dermatology clinic: A blinded study

February 2024 in “JAAD International”

Crystal Chang, Iesha L Ticknor, Jacob-Anthony Spinelli, Bhavnit K. Bhatia, Sangeeta Marwaha, Paradi Mirmirani, Anne M. Seidler, Jeremy R Man, Patrick E. McCleskey

TLDR: Dermatologists preferred ChatGPT for creating dermatology patient handouts, but all models can be useful with oversight.

This blinded study evaluated how well large language models (LLMs) such as ChatGPT, Bard, and BingAI generate patient handouts on dermatology topics, including dermatitis, alopecia, and dyspigmentation. Dermatologists preferred ChatGPT most often, ranking it first in 46.3% of cases, particularly for dermatitis and alopecia, while BingAI was slightly favored for dyspigmentation. ChatGPT and BingAI outperformed Bard in understandability as scored with the Patient Education Materials Assessment Tool (PEMAT). BingAI's handouts came closest to a sixth-grade reading level as measured by the Simple Measure of Gobbledygook (SMOG). Despite limitations such as the small number of reviewers, the study suggests that LLMs can produce understandable and accurate handouts when their output is overseen, highlighting the potential of these models in clinical settings.
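
For readers unfamiliar with SMOG, the grade estimate can be sketched in a few lines of Python. This is only an illustration of the standard published formula, not the authors' scoring procedure: the function names `count_syllables` and `smog_grade` are hypothetical, and the syllable counter is a crude vowel-group heuristic assumed here for brevity, so its results will differ from those of validated readability tools.

```python
import math
import re


def count_syllables(word: str) -> int:
    """Approximate syllables as runs of consecutive vowels (a rough assumption)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def smog_grade(text: str) -> float:
    """Estimate the SMOG grade of a text.

    Standard formula: 1.0430 * sqrt(polysyllables * 30 / sentences) + 3.1291,
    where a polysyllable is a word of three or more syllables.
    """
    # Split on sentence-ending punctuation and ignore empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        raise ValueError("text contains no sentences")
    words = re.findall(r"[A-Za-z]+", text)
    polysyllables = sum(1 for w in words if count_syllables(w) >= 3)
    return 1.0430 * math.sqrt(polysyllables * (30 / len(sentences))) + 3.1291


if __name__ == "__main__":
    simple = "The cat sat. The dog ran."
    clinical = "Dermatitis causes inflammation."
    print(round(smog_grade(simple), 2), round(smog_grade(clinical), 2))
```

SMOG was designed for samples of about 30 sentences, so estimates for very short passages like those above are unstable; a handout aimed at a sixth-grade level would need a score near 6 on a full-length sample.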
