Medical Reasoning with Large Language Models: A Survey and MR-Bench
March 2026
ArXiv.org
This survey reviews the use of large language models (LLMs) in medical reasoning, emphasizing the need for robust reasoning beyond factual recall in clinical settings. It conceptualizes medical reasoning as comprising abduction, deduction, and induction, and categorizes existing methods into seven technical routes. A cross-benchmark evaluation of medical reasoning models reveals a significant gap between exam-level performance and real-world clinical decision-making accuracy. The introduction of MR-Bench, a benchmark built from real hospital data, further highlights this discrepancy. The survey provides a comprehensive overview of current methods, benchmarks, and evaluation practices, identifying key gaps in LLM performance on clinical reasoning.