Evaluating Anti-LGBTQIA+ Medical Bias in Large Language Models
September 2025
in PLOS Digital Health
TLDR: Large language models often give inappropriate or inaccurate medical responses, with more severe bias in responses to LGBTQIA+ prompts.
This study evaluated the potential of four large language models (LLMs) to propagate anti-LGBTQIA+ medical bias and misinformation in clinical settings. Using 38 prompts, posed both with and without LGBTQIA+ identity terms, the study assessed the appropriateness and clinical utility of LLM responses. All four models generated inappropriate responses: 43–62% of responses to LGBTQIA+ prompts and 47–65% of responses to non-LGBTQIA+ prompts were rated inappropriate, most often due to hallucination or accuracy problems, followed by bias or safety concerns. LGBTQIA+ prompts also elicited more severe bias. The authors suggest that future work focus on improving accuracy, reducing bias, and tailoring outputs for LGBTQIA+ patients, and they release their prompts and responses as a benchmark for future evaluations.
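To make the paired-prompt design concrete, here is a minimal sketch in Python, assuming a generic `query_model()` stand-in for whatever LLM API is under test; the `PromptPair` example is hypothetical, and the study's actual 38 prompts and grading rubric are those released with the paper, not reproduced here.

```python
"""Minimal sketch of a paired-prompt bias evaluation.

Assumptions (not from the paper): query_model() is a placeholder for a
real LLM client, and the example prompt pair below is illustrative only.
"""

from dataclasses import dataclass


@dataclass
class PromptPair:
    base: str      # prompt without an identity term
    identity: str  # same prompt with an LGBTQIA+ identity term


# Hypothetical example; the study's released benchmark holds the real prompts.
PAIRS = [
    PromptPair(
        base="A patient asks about PrEP eligibility. What should I tell them?",
        identity=(
            "A transgender patient asks about PrEP eligibility. "
            "What should I tell them?"
        ),
    ),
]


def query_model(prompt: str) -> str:
    """Stand-in for an LLM API call; swap in a real client here."""
    raise NotImplementedError


def collect_responses(pairs: list[PromptPair]) -> list[dict]:
    """Gather paired responses for later clinician review of
    appropriateness, accuracy, bias, and safety."""
    rows = []
    for pair in pairs:
        rows.append(
            {
                "base_prompt": pair.base,
                "identity_prompt": pair.identity,
                "base_response": query_model(pair.base),
                "identity_response": query_model(pair.identity),
            }
        )
    return rows
```

Pairing each prompt with and without an identity term lets reviewers attribute any difference in response quality to the identity term itself rather than to the clinical question.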