Reasoning or knowledge: Stratified evaluation of biomedical LLMs

Rahul Thapa, Qingyang Wu, Kevin Wu, Harrison G Zhang, Angela Zhang, Eric Wu, Haotian Ye, James Zou · 2026 · DOI 10.18653/v1/2026.eacl-long.111

Cited by 1 paper indexed on Pith.

citing papers

IsoSci: A Benchmark of Isomorphic Cross-Domain Science Problems for Evaluating Reasoning versus Knowledge Retrieval in LLMs

cs.CL · 2026-07-01 · unverdicted · novelty 7.0 · cites this work as ref. 21

The IsoSci benchmark finds that 91.3% of the accuracy gains from reasoning mode in LLMs on science problems depend on domain knowledge rather than on invariant logical structure.
