{"paper":{"title":"RadAgent: A tool-using AI agent for stepwise interpretation of chest computed tomography","license":"http://creativecommons.org/licenses/by/4.0/","headline":"A tool-using AI agent generates more accurate, robust, and faithful chest CT reports by producing explicit stepwise reasoning traces.","cross_cats":[],"primary_cat":"cs.AI","authors_text":"Benjamin Gundersen, Bjoern Menze, Christian Bluethgen, Farhad Nooralahzadeh, Jean-Benoit Delbrouck, Jiwoong Sohn, Julia E. Vogt, Kenneth Styppa, Mélanie Roschewitz, Michael Krauthammer, Michael Moor, Nicolas Deperrois, Yitian Tao","submitted_at":"2026-04-16T17:09:30Z","abstract_excerpt":"Vision-language models (VLM) have markedly advanced AI-driven interpretation and reporting of complex medical imaging, such as computed tomography (CT). Yet, existing methods largely relegate clinicians to passive observers of final outputs, offering no interpretable reasoning trace for them to inspect, validate, or refine. To address this, we introduce RadAgent, a tool-using AI agent that generates CT reports through a stepwise and interpretable process. Each resulting report is accompanied by a fully inspectable trace of intermediate decisions and tool interactions, allowing clinicians to ex"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"RadAgent improves Chest CT report generation over its 3D VLM counterpart, CT-Chat, across three dimensions. Clinical accuracy improves by 6.0 points (36.4% relative) in macro-F1 and 5.4 points (19.6% relative) in micro-F1. Robustness under adversarial conditions improves by 24.7 points (41.9% relative). 
Furthermore, RadAgent achieves 37.0% in faithfulness, a new capability entirely absent in its 3D VLM counterpart.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That the observed gains in accuracy, robustness, and the new faithfulness metric are attributable to the tool-using stepwise agent architecture rather than differences in training data, model size, or evaluation choices.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"RadAgent generates stepwise, tool-augmented chest CT reports with traceable decisions, improving accuracy, robustness, and adding a 37% faithfulness score absent in standard 3D VLMs.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"A tool-using AI agent generates more accurate, robust, and faithful chest CT reports by producing explicit stepwise reasoning traces.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"25bc2fb8ca2ef08e59caeb1aed850ac8b31484d4f5abbcc49704de0f5e535e70"},"source":{"id":"2604.15231","kind":"arxiv","version":2},"verdict":{"id":"06dad0bb-adf7-480b-a88c-5927a3a27dea","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-10T10:50:55.404179Z","strongest_claim":"RadAgent improves Chest CT report generation over its 3D VLM counterpart, CT-Chat, across three dimensions. Clinical accuracy improves by 6.0 points (36.4% relative) in macro-F1 and 5.4 points (19.6% relative) in micro-F1. Robustness under adversarial conditions improves by 24.7 points (41.9% relative). 
Furthermore, RadAgent achieves 37.0% in faithfulness, a new capability entirely absent in its 3D VLM counterpart.","one_line_summary":"RadAgent generates stepwise, tool-augmented chest CT reports with traceable decisions, improving accuracy, robustness, and adding a 37% faithfulness score absent in standard 3D VLMs.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That the observed gains in accuracy, robustness, and the new faithfulness metric are attributable to the tool-using stepwise agent architecture rather than differences in training data, model size, or evaluation choices.","pith_extraction_headline":"A tool-using AI agent generates more accurate, robust, and faithful chest CT reports by producing explicit stepwise reasoning traces."},"integrity":{"clean":true,"summary":{"advisory":0,"critical":0,"by_detector":{},"informational":0},"endpoint":"/pith/2604.15231/integrity.json","findings":[],"available":true,"detectors_run":[],"snapshot_sha256":"c28c3603d3b5d939e8dc4c7e95fa8dfce3d595e45f758748cecf8e644a296938"},"references":{"count":0,"sample":[],"resolved_work":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","internal_anchors":0},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}