Neuro-symbolic pipeline using multi-agent translation and SAT solving detects conflicts in multimorbidity guidelines with 0.861 F1, finding 90.6% are local conflicts on 12 SGLT2 guidelines.
2 Pith papers cite this work.
fields: cs.CL (2)
years: 2026 (2)
verdicts: UNVERDICTED (2)
representative citing papers
CLExEval introduces a human-annotated evaluation framework on 40 rare cases that identifies verbosity bias, hidden knowledge paradox, and 68.6% reasoning-to-output mismatch in LLMs while showing LLM-as-a-Judge overestimates reliability.
citing papers explorer
- Neuro-Symbolic Resolution of Recommendation Conflicts in Multimorbidity Clinical Guidelines
  Neuro-symbolic pipeline using multi-agent translation and SAT solving detects conflicts in multimorbidity guidelines with 0.861 F1, finding 90.6% are local conflicts on 12 SGLT2 guidelines.
- CLExEval: A Human-in-the-Loop Framework for Qualitative Evaluation of LLM Clinical Reasoning
  CLExEval introduces a human-annotated evaluation framework on 40 rare cases that identifies verbosity bias, hidden knowledge paradox, and 68.6% reasoning-to-output mismatch in LLMs while showing LLM-as-a-Judge overestimates reliability.