Automated Monkeypox Identification from Electronic Medical Records Using Large Language Models — Shenzhen City, Guangdong Province, China, 2023–2025

January 2026 in “China CDC Weekly”

Diyang Xue, Ye Ye, Yu Wu, Zhen Zhang, J H Lu, Xuan Zou, Qiuying Lv

TLDR Large language models can accurately identify monkeypox from medical records.

The study explored the use of large language models (LLMs) for automated identification of monkeypox (mpox) from electronic medical records (EMRs) in Shenzhen, China. It involved 239 individuals (126 mpox cases and 113 controls). The DeepSeek-R1-14B model was used to extract clinical features from free-text records; it outperformed traditional methods and identified symptoms such as fever and rash with 96.1% accuracy. A logistic regression model built on the extracted features performed best at classifying cases, with an AUROC of 0.927 and an accuracy of 87.5%. The study concludes that LLM-based extraction of clinical features from EMRs is a promising approach for early mpox case identification, supporting intelligent surveillance and early warning systems.
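For readers less familiar with the two classification metrics quoted above, the following is a minimal sketch of how accuracy and AUROC can be computed from a classifier's predicted probabilities. It is not the study's code. The function names, the 0.5 decision threshold, and the six-record toy dataset are assumptions made purely for illustration and have no connection to the study's data.

```python
from typing import Sequence


def accuracy(labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of records classified correctly at a given probability threshold.

    The 0.5 default is an illustrative assumption, not a value from the study.
    """
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation.

    AUROC equals the probability that a randomly chosen case receives a
    higher score than a randomly chosen control, counting ties as one half.
    """
    case_scores = [s for y, s in zip(labels, scores) if y == 1]
    control_scores = [s for y, s in zip(labels, scores) if y == 0]
    if not case_scores or not control_scores:
        raise ValueError("AUROC needs at least one case and one control")

    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical toy data: 1 = case, 0 = control, with made-up model scores.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]

print(f"accuracy = {accuracy(toy_labels, toy_scores):.3f}")
print(f"AUROC    = {auroc(toy_labels, toy_scores):.3f}")
```

Accuracy depends on the chosen threshold, whereas AUROC summarizes ranking quality across all thresholds, which is why a model can report both a high AUROC and a somewhat lower accuracy.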

View this study on weekly.chinacdc.cn →
