A Comparative Analysis of Large Language Model Accuracy for Image-Based Hair Disease Identification in Diverse Skin Tones

August 2024 in “Journal of the National Medical Association”

Willow Pastard, Zane Sejdiu, Alexis Arza, James Cross, Razmig Garabet, Anna Chacon, Ellen Pritchett

alopecia areata, androgenetic alopecia, traction alopecia, central centrifugal cicatricial alopecia, Monk Skin Tone Scale

TLDR ChatGPT is more accurate at diagnosing hair disorders in lighter skin tones than darker ones.

This study evaluates how accurately the large language model ChatGPT identifies four hair disorders from images (alopecia areata, androgenetic alopecia, traction alopecia, and central centrifugal cicatricial alopecia) across skin tones classified with the Monk Skin Tone Scale. ChatGPT was more accurate on lighter skin tones, and the difference was statistically significant for alopecia areata (p<.001) and androgenetic alopecia (p=.003). In darker skin, it misidentified 24.48% of hair conditions as traction alopecia. The study underscores a limitation of AI models trained on dermatologic databases that lack diverse representation: their diagnostic performance varies with skin tone.
