A Comparative Analysis of Large Language Model Accuracy for Image-Based Hair Disease Identification in Diverse Skin Tones
August 2024 in “Journal of the National Medical Association”
TLDR ChatGPT is more accurate at diagnosing hair disorders in lighter skin tones than darker ones.
This study evaluates the accuracy of the large language model ChatGPT in diagnosing hair disorders from images, including alopecia areata, androgenetic alopecia, traction alopecia, and central centrifugal cicatricial alopecia, across skin tones categorized with the Monk Skin Tone Scale. ChatGPT was more accurate in lighter skin tones, with statistically significant differences for alopecia areata (p<.001) and androgenetic alopecia (p=.003). In darker skin, it misidentified 24.48% of hair conditions as traction alopecia. The study underscores the limitations of AI models trained on dermatologic databases that lack diverse representation, which impairs their diagnostic performance across varied skin tones.