Medical Reasoning with Large Language Models: A Survey and MR-Bench

    March 2026 · ArXiv.org
    Xiaohan Ren, Chenxiao Fan, Wenyin Ma, Hongliang He, Chongming Gao, Xiaoyan Zhao, Fuli Feng
    This survey reviews the use of large language models (LLMs) in medical reasoning, emphasizing the need for robust reasoning beyond factual recall in clinical settings. It conceptualizes medical reasoning as involving abduction, deduction, and induction, and categorizes existing methods into seven technical routes. A cross-benchmark evaluation of medical reasoning models reveals a significant gap between exam-level performance and real-world clinical decision-making accuracy, a discrepancy further highlighted by the introduction of MR-Bench, a benchmark built from real hospital data. The survey offers a comprehensive overview of current methods, benchmarks, and evaluation practices, identifying key gaps in LLM performance for clinical reasoning.