pith. machine review for the scientific record.

arxiv: 2604.02695 · v1 · submitted 2026-04-03 · 💻 cs.CV

Recognition: 2 theorem links

· Lean Theorem

XrayClaw: Cooperative-Competitive Multi-Agent Alignment for Trustworthy Chest X-ray Diagnosis

Authors on Pith: no claims yet

Pith reviewed 2026-05-13 20:56 UTC · model grok-4.3

classification 💻 cs.CV
keywords: multi-agent systems · chest X-ray diagnosis · hallucination mitigation · medical imaging AI · cooperative-competitive alignment · preference optimization · zero-shot generalization · clinical reasoning

The pith

XrayClaw uses four cooperative agents and one competitive auditor, reconciled by Competitive Preference Optimization, to reach state-of-the-art accuracy with fewer hallucinations in chest X-ray diagnosis.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces XrayClaw to fix logical inconsistencies and diagnostic hallucinations that arise in single-model AI systems for chest X-ray interpretation. It deploys four specialized cooperative agents to follow a clinical workflow and adds a separate competitive agent that audits the outputs. Competitive Preference Optimization then forces mutual verification between the different reasoning paths to penalize illogical steps. On the MS-CXR-T, MIMIC-CXR, and CheXbench benchmarks the system records top scores for diagnostic accuracy, reasoning fidelity, and zero-shot generalization to new domains while cutting cumulative hallucinations. A sympathetic reader would care because more reliable automated reads could support clinicians without adding new layers of AI error.

Core claim

XrayClaw operationalizes multi-agent alignment through a cooperative-competitive architecture: four specialized cooperative agents simulate a systematic clinical workflow, while a competitive agent serves as an independent auditor. Competitive Preference Optimization reconciles the pathways by penalizing illogical reasoning through enforced mutual verification between analytical and holistic interpretations. The result is state-of-the-art diagnostic accuracy, clinical reasoning fidelity, and zero-shot domain generalization on MS-CXR-T, MIMIC-CXR, and CheXbench, with fewer cumulative hallucinations.

What carries the argument

Cooperative-competitive architecture of four workflow agents plus one auditor agent, aligned by Competitive Preference Optimization that enforces mutual verification between distinct diagnostic pathways.
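The paper describes this architecture only in prose. A minimal orchestration sketch of the cooperative pipeline plus auditor, with every function name and agent stub hypothetical (the paper's agents are LLM-backed, not rule-based), might look like:

```python
# Hypothetical sketch of the cooperative-competitive pattern described above.
# Stage names follow the Figure 1 caption; agent internals are stand-ins.
from dataclasses import dataclass

@dataclass
class Finding:
    stage: str
    note: str

def run_cooperative_pipeline(image_id, agents):
    """Run the four workflow agents sequentially, each seeing prior findings."""
    findings = []
    for stage, agent in agents:
        findings.append(Finding(stage, agent(image_id, findings)))
    return findings

def audit(findings, auditor):
    """The competitive auditor reviews the chain and flags steps it rejects."""
    return [f for f in findings if not auditor(f)]

# Toy instantiation: each "agent" is a stub returning a canned note.
agents = [
    ("systematic scanning", lambda img, fs: f"scanned {img}"),
    ("targeted lesion analysis", lambda img, fs: "opacity in left lower lobe"),
    ("differential reasoning", lambda img, fs: "consistent with pneumonia"),
    ("structured report synthesis", lambda img, fs: "report: probable pneumonia"),
]
auditor = lambda f: any(k in f.note for k in ("scanned", "opacity", "pneumonia"))

findings = run_cooperative_pipeline("cxr_001", agents)
flagged = audit(findings, auditor)  # empty list: the auditor rejects nothing here
```

In the paper the auditor's disagreements feed the preference objective rather than a simple filter; this sketch only shows the control flow.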

If this is right

  • Multi-agent systems can simulate collaborative clinical consultation more effectively than monolithic models for chest X-ray tasks.
  • Enforcing competition between analytical and holistic interpretations reduces consensus-based diagnostic errors.
  • The same alignment objective improves zero-shot performance when models encounter new imaging domains or equipment.
  • Cumulative hallucinations decline because each pathway must verify the other before final output.
  • The framework offers a scalable route to more trustworthy automated medical imaging analysis.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same cooperative-competitive pattern could transfer to other imaging modalities such as CT or MRI to improve reliability.
  • Competitive Preference Optimization might generalize beyond medicine to reduce hallucinations in large language models used for factual reasoning.
  • Internal auditing agents could eventually integrate with human radiologists to create hybrid review loops that flag disagreements automatically.
  • If the approach scales, it suggests that multi-agent alignment can serve as a practical substitute for extensive human-labeled data in safety-critical domains.

Load-bearing premise

The cooperative-competitive architecture and Competitive Preference Optimization can reliably reconcile distinct diagnostic pathways and reduce hallucinations without introducing new systematic errors or biases on real clinical data.

What would settle it

A controlled evaluation on a large held-out set of real clinical chest X-rays in which XrayClaw produces more hallucinations or lower accuracy than strong single-model baselines would falsify the central claim.

Figures

Figures reproduced from arXiv: 2604.02695 by Lijian Xu, Shawn Young.

Figure 1. The overarching architecture of XrayClaw. The framework orchestrates a cooperative-competitive alignment process for trustworthy chest X-ray diagnosis. The Multi Specialized Agents Cooperation pipeline (top) decomposes the diagnostic task into four sequential stages: systematic scanning, targeted lesion analysis, differential reasoning, and structured report synthesis. In parallel, the Omni-Radiologist Age…
Original abstract

Chest X-ray (CXR) interpretation is a fundamental yet complex clinical task that increasingly relies on artificial intelligence for automation. However, traditional monolithic models often lack the nuanced reasoning required for trustworthy diagnosis, frequently leading to logical inconsistencies and diagnostic hallucinations. While multi-agent systems offer a potential solution by simulating collaborative consultations, existing frameworks remain susceptible to consensus-based errors when instantiated by a single underlying model. This paper introduces XrayClaw, a novel framework that operationalizes multi-agent alignment through a sophisticated cooperative-competitive architecture. XrayClaw integrates four specialized cooperative agents to simulate a systematic clinical workflow, alongside a competitive agent that serves as an independent auditor. To reconcile these distinct diagnostic pathways, we propose Competitive Preference Optimization, a learning objective that penalizes illogical reasoning by enforcing mutual verification between analytical and holistic interpretations. Extensive empirical evaluations on the MS-CXR-T, MIMIC-CXR, and CheXbench benchmarks demonstrate that XrayClaw achieves state-of-the-art performance in diagnostic accuracy, clinical reasoning fidelity, and zero-shot domain generalization. Our results indicate that XrayClaw effectively mitigates cumulative hallucinations and enhances the overall reliability of automated CXR diagnosis, establishing a new paradigm for trustworthy medical imaging analysis.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper introduces XrayClaw, a multi-agent framework for chest X-ray diagnosis that deploys four cooperative agents to simulate a clinical workflow alongside one competitive auditor agent, reconciled through a proposed Competitive Preference Optimization objective that penalizes illogical reasoning via mutual verification. Extensive evaluations on MS-CXR-T, MIMIC-CXR, and CheXbench are reported to yield state-of-the-art diagnostic accuracy, clinical reasoning fidelity, and zero-shot generalization while reducing cumulative hallucinations.

Significance. If the central claims hold, the work would represent a meaningful advance in trustworthy medical AI by addressing consensus failures in single-model multi-agent systems and providing a concrete mechanism for reconciling analytical and holistic diagnostic pathways, with potential implications for reducing diagnostic errors in clinical imaging pipelines.

major comments (3)
  1. [Abstract] Abstract: The claim that the competitive agent functions as an 'independent auditor' is load-bearing for all SOTA and hallucination-mitigation results, yet the architecture description indicates all agents (cooperative and competitive) are instantiated from the same base LLM; without explicit evidence that Competitive Preference Optimization breaks correlation in reasoning biases and hallucination modes, the mutual-verification mechanism risks circularity rather than genuine independence.
  2. [Method] Method (Competitive Preference Optimization): The learning objective is introduced as penalizing illogical reasoning through enforcement of mutual verification, but no explicit loss function, derivation, or hyperparameter schedule is provided; this prevents verification that reported gains on diagnostic accuracy and zero-shot generalization are independent of the fitting process itself rather than artifacts of post-hoc tuning.
  3. [Experiments] Experiments: The SOTA claims on MS-CXR-T, MIMIC-CXR, and CheXbench rest on the assumption that the cooperative-competitive setup reliably reduces hallucinations without introducing new systematic biases, yet no ablation studies isolating the competitive auditor, single-model vs. multi-model instantiations, or error analysis of residual hallucinations are referenced; this undermines the ability to attribute performance gains to the proposed architecture.
minor comments (1)
  1. [Abstract] Abstract: The phrase 'cumulative hallucinations' is introduced without a concise definition or reference to prior usage, which reduces immediate clarity for readers.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their detailed and constructive feedback on our manuscript. We address each major comment point-by-point below, providing clarifications where possible and committing to revisions that strengthen the paper without misrepresenting our contributions.

Point-by-point responses
  1. Referee: [Abstract] The claim that the competitive agent functions as an 'independent auditor' is load-bearing for all SOTA and hallucination-mitigation results, yet the architecture description indicates all agents (cooperative and competitive) are instantiated from the same base LLM; without explicit evidence that Competitive Preference Optimization breaks correlation in reasoning biases and hallucination modes, the mutual-verification mechanism risks circularity rather than genuine independence.

    Authors: We agree that shared base LLM instantiation raises a valid concern about potential bias correlation. However, the distinct role-specific system prompts combined with the Competitive Preference Optimization objective are designed to enforce divergence: the competitive agent is explicitly optimized to penalize agreement on illogical steps identified by the cooperative agents. This creates behavioral independence even from a common model. We will add a dedicated subsection in Methods explaining the decorrelation mechanism and include new analysis in Experiments quantifying reduced error-pattern correlation between agent types. revision: yes

  2. Referee: [Method] The learning objective is introduced as penalizing illogical reasoning through enforcement of mutual verification, but no explicit loss function, derivation, or hyperparameter schedule is provided; this prevents verification that reported gains on diagnostic accuracy and zero-shot generalization are independent of the fitting process itself rather than artifacts of post-hoc tuning.

    Authors: We apologize for this omission in the original submission. The Competitive Preference Optimization loss is L_CPO = -E[log σ(r_coop - r_comp)], where r denotes the mutual verification reward score derived from direct preference optimization adapted to the competitive setting. The penalty coefficient β is scheduled from 0.1 to 0.5 over 3 epochs with a fixed learning rate of 1e-5. We will insert the full mathematical derivation, pseudocode, and complete hyperparameter table into the revised Methods section to ensure reproducibility and allow independent verification of the gains. revision: yes
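Taking the rebuttal's stated form at face value, the loss is a DPO-style negative log-sigmoid of a reward margin. A minimal sketch with the reward scores as plain floats (in the paper they would come from the mutual-verification scoring); placing β inside the sigmoid and the linear schedule shape are assumptions, since the rebuttal gives only the sigmoid form, the schedule endpoints, and the epoch count:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def cpo_loss(r_coop, r_comp, beta):
    """L_CPO = -log sigma(beta * (r_coop - r_comp)): small when the
    cooperative pathway's verification reward exceeds the competitive
    auditor's, large when the auditor wins the comparison."""
    return -math.log(sigmoid(beta * (r_coop - r_comp)))

def beta_at(epoch, total_epochs=3, beta_start=0.1, beta_end=0.5):
    """Linear schedule from 0.1 to 0.5 over 3 epochs, per the rebuttal."""
    t = min(epoch / max(total_epochs - 1, 1), 1.0)
    return beta_start + t * (beta_end - beta_start)

# A wider margin in favor of the cooperative pathway lowers the loss,
# and later epochs (larger beta) sharpen the penalty on a losing margin.
assert cpo_loss(5.0, 1.0, beta_at(0)) < cpo_loss(1.0, 1.0, beta_at(0))
assert cpo_loss(1.0, 5.0, beta_at(2)) > cpo_loss(1.0, 5.0, beta_at(0))
```

This is only a shape check on the stated objective, not the authors' implementation.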

  3. Referee: [Experiments] The SOTA claims on MS-CXR-T, MIMIC-CXR, and CheXbench rest on the assumption that the cooperative-competitive setup reliably reduces hallucinations without introducing new systematic biases, yet no ablation studies isolating the competitive auditor, single-model vs. multi-model instantiations, or error analysis of residual hallucinations are referenced; this undermines the ability to attribute performance gains to the proposed architecture.

    Authors: We acknowledge that explicit ablations isolating the competitive auditor would strengthen attribution. While the manuscript already includes comparisons to single-agent and non-competitive multi-agent baselines, we did not report a dedicated removal of only the auditor or a full single-vs-multi-model breakdown. We will add these ablations (including error categorization of residual hallucinations) to the revised Experiments section. Where new runs are required, we will report them as additional results; partial coverage will be noted if compute limits prevent exhaustive multi-model variants. revision: partial

Circularity Check

0 steps flagged

No significant circularity in derivation chain

Full rationale

The abstract introduces Competitive Preference Optimization as a proposed learning objective for reconciling pathways and penalizing illogical reasoning, but provides no equations, loss function, or derivation that reduces to fitted inputs or self-citations. No load-bearing step is shown to be equivalent to its own inputs by construction. The central claims rest on empirical results across benchmarks rather than a self-referential derivation. This is the most common honest finding for papers whose core contribution is an architectural proposal evaluated externally.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the unverified premise that multi-agent cooperation plus an independent auditor plus a new preference objective will reduce hallucinations more effectively than prior single-model or simple ensemble methods; no independent evidence for this premise is supplied in the abstract.

axioms (1)
  • domain assumption Multi-agent systems instantiated from a single underlying model can still produce trustworthy consensus when augmented with a competitive auditor
    Invoked to justify the overall architecture.
invented entities (1)
  • Competitive Preference Optimization (no independent evidence)
    purpose: Reconcile cooperative and competitive diagnostic pathways by penalizing illogical reasoning
    Newly proposed learning objective whose functional form is not shown

pith-pipeline@v0.9.0 · 5513 in / 1162 out tokens · 37970 ms · 2026-05-13T20:56:54.171982+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Autonomous Drift Learning in Data Streams: A Unified Perspective

    cs.LG · 2026-05 · unverdicted · novelty 7.0

    A survey proposes a novel 3D taxonomy classifying drifts into time stream, data stream, and model stream categories to unify research on non-stationary autonomous learning.

Reference graph

Works this paper leans on

46 extracted references · 46 canonical work pages · cited by 1 Pith paper · 3 internal anchors

  1. [1]

    Qwen3-VL Technical Report

    Bai, S., Cai, Y., Chen, R., Chen, K., Chen, X., Cheng, Z., Deng, L., Ding, W., Gao, C., Ge, C., et al.: Qwen3-vl technical report. arXiv preprint arXiv:2511.21631 (2025)

  2. [2]

    In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

    Bannur, S., Hyland, S., Liu, Q., Perez-Garcia, F., Ilse, M., Castro, D.C., Boecking, B., Sharma, H., Bouzid, K., Thieme, A., et al.: Learning to exploit temporal structure for biomedical vision-language processing. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 15016–15027 (2023)

  3. [3]

    In: European conference on computer vision

    Boecking, B., Usuyama, N., Bannur, S., Castro, D.C., Schwaighofer, A., Hyland, S., Wetscherek, M., Naumann, T., Nori, A., Alvarez-Valle, J., et al.: Making the most of text semantics to improve biomedical vision–language processing. In: European Conference on Computer Vision. pp. 1–21. Springer (2022)

  4. [4]

    arXiv preprint arXiv:2506.14142 (2025)

    Chen, W., Dong, Y., Ding, Z., Shi, Y., Zhou, Y., Zeng, F., Luo, Y., Lin, T., Su, Y., Wu, Y., et al.: Radfabric: Agentic ai system with reasoning capability for radiology. arXiv preprint arXiv:2506.14142 (2025)

  5. [5]

    In: International Conference on Medical Image Computing and Computer-Assisted Intervention

    Chen, Y., Xu, S., Sellergren, A., Matias, Y., Hassidim, A., Shetty, S., Golden, D., Yuille, A.L., Yang, L.: Coca-cxr: Contrastive captioners learn strong temporal structures for chest x-ray vision-language understanding. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 78–88. Springer (2025)

  6. [6]

    In: AAAI 2024 Spring Symposium on Clinical Foundation Models (2024)

    Chen, Z., Varma, M., Delbrouck, J.B., Paschali, M., Blankemeier, L., Van Veen, D., Valanarasu, J.M.J., Youssef, A., Cohen, J.P., Reis, E.P., et al.: Chexagent: Towards a foundation model for chest x-ray interpretation. In: AAAI 2024 Spring Symposium on Clinical Foundation Models (2024)

  7. [7]

    arXiv preprint arXiv:2603.01143 (2026)

    Chen, Z., Young, S., Xu, L.: Tc-ssa: Token compression via semantic slot aggregation for gigapixel pathology reasoning. arXiv preprint arXiv:2603.01143 (2026)

  8. [8]

    Journal of Computing Science and Engineering 6(2), 168–177 (2012)

    Demner-Fushman, D., Antani, S., Simpson, M., Thoma, G.R.: Design and development of a multimodal biomedical information retrieval system. Journal of Computing Science and Engineering 6(2), 168–177 (2012)

  9. [9]

    PMLR (2025)

    Fallahpour, A., Ma, J., Munim, A., Lyu, H., Wang, B.: Medrax: Medical reasoning agent for chest x-ray. In: International Conference on Machine Learning. pp. 15661–15676. PMLR (2025)

  10. [10]

    arXiv preprint arXiv:2603.07113 (2026)

    Feng, W., Young, S., Xu, L.: Efficient chest x-ray representation learning via semantic-partitioned contrastive learning. arXiv preprint arXiv:2603.07113 (2026)

  11. [11]

    arXiv preprint arXiv:2603.07135 (2026)

    He, L., Yang, X., Xu, L.: The model knows which tokens matter: automatic token selection via noise gating. arXiv preprint arXiv:2603.07135 (2026)

  12. [12]

    GPT-4o System Card

    Hurst, A., Lerer, A., Goucher, A.P., Perelman, A., Ramesh, A., Clark, A., Ostrow, A., Welihinda, A., Hayes, A., Radford, A., et al.: Gpt-4o system card. arXiv preprint arXiv:2410.21276 (2024)

  13. [13]

    In: Proceedings of the AAAI Conference on Artificial Intelligence

    Jin, H., Che, H., Lin, Y., Chen, H.: Promptmrg: Diagnosis-driven prompts for medical report generation. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 38, pp. 2607–2615 (2024)

  14. [14]

    Scientific Data 6(1), 317 (2019)

    Johnson, A.E., Pollard, T.J., Berkowitz, S.J., Greenbaum, N.R., Lungren, M.P., Deng, C.y., Mark, R.G., Horng, S.: Mimic-cxr, a de-identified publicly available database of chest radiographs with free-text reports. Scientific Data 6(1), 317 (2019)

  15. [15]

    In: International Conference on Medical Image Computing and Computer-Assisted Intervention

    Karwande, G., Mbakwe, A.B., Wu, J.T., Celi, L.A., Moradi, M., Lourentzou, I.: Chexrelnet: An anatomy-aware model for tracking longitudinal relationships between chest x-rays. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 581–591. Springer (2022)

  16. [16]

    Advances in Neural Information Processing Systems 36, 28541–28564 (2023)

    Li, C., Wong, C., Zhang, S., Usuyama, N., Liu, H., Yang, J., Naumann, T., Poon, H., Gao, J.: Llava-med: Training a large language-and-vision assistant for biomedicine in one day. Advances in Neural Information Processing Systems 36, 28541–28564 (2023)

  17. [17]

    Li, M., Lin, B., Chen, Z., Lin, H., Liang, X., Chang, X.: Dynamic graph enhanced contrastive learning for chest x-ray report generation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 3334–3343 (2023)

  18. [18]

    Liu, B., Zhan, L.M., Xu, L., Ma, L., Yang, Y., Wu, X.M.: Slake: A semantically-labeled knowledge-enhanced dataset for medical visual question answering. In: 2021 IEEE 18th International Symposium on Biomedical Imaging (ISBI). pp. 1650–1654. IEEE (2021)

  19. [19]

    In: Proceedings of the AAAI Conference on Artificial Intelligence

    Liu, C., Tian, Y., Chen, W., Song, Y., Zhang, Y.: Bootstrapping large language models for radiology report generation. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 38, pp. 18635–18643 (2024)

  20. [20]

    In: International conference on medical image computing and computer-assisted intervention

    Pellegrini, C., Keicher, M., Özsoy, E., Navab, N.: Rad-restruct: A novel vqa benchmark and method for structured radiology reporting. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 409–419. Springer (2023)

  21. [21]

    Advances in Neural Information Processing Systems 36, 53728–53741 (2023)

    Rafailov, R., Sharma, A., Mitchell, E., Manning, C.D., Ermon, S., Finn, C.: Direct preference optimization: Your language model is secretly a reward model. Advances in Neural Information Processing Systems 36, 53728–53741 (2023)

  22. [22]

    In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

    Wang, X., Wang, F., Li, Y., Ma, Q., Wang, S., Jiang, B., Tang, J.: Cxpmrg-bench: Pre-training and benchmarking for x-ray medical report generation on chexpert plus dataset. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 5123–5133 (June 2025)

  23. [23]

    In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

    Wang, Z., Liu, L., Wang, L., Zhou, L.: Metransformer: Radiology report generation by transformer with multiple learnable expert tokens. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 11558–11567 (2023)

  24. [24]

    Meta-Radiology 1(3), 100033 (2023)

    Wang, Z., Liu, L., Wang, L., Zhou, L.: R2gengpt: Radiology report generation with frozen llms. Meta-Radiology 1(3), 100033 (2023)

  25. [25]

    IEEE Transactions on Pattern Analysis and Machine Intelligence 47(8), 6585–6598 (Aug 2025)

    Wang, Z., Wang, L., Li, X., Zhou, L.: Diagnostic Captioning by Cooperative Task Interactions and Sample-Graph Consistency. IEEE Transactions on Pattern Analysis and Machine Intelligence 47(8), 6585–6598 (Aug 2025)

  26. [26]

    In: European Conference on Computer Vision

    Wei, H., Qiu, J., Yu, H., Yuan, W.: Medco: Medical education copilots based on a multi-agent framework. In: European Conference on Computer Vision. pp. 119–

  27. [27]

    arXiv preprint arXiv:2311.01092 (2023)

    Xu, L., Ni, Z., Liu, X., Wang, X., Li, H., Zhang, S.: Learning a multi-task transformer via unified and customized instruction tuning for chest radiograph interpretation. arXiv preprint arXiv:2311.01092 (2023)

  28. [28]

    arXiv preprint arXiv:2410.08861 (2024)

    Xu, L., Ni, Z., Sun, H., Li, H., Zhang, S.: A foundation model for generalizable disease diagnosis in chest x-ray images. arXiv preprint arXiv:2410.08861 (2024)

  29. [29]

    arXiv preprint arXiv:2409.19684 (2024)

    Xu, L., Sun, H., Ni, Z., Li, H., Zhang, S.: Medvilam: A multimodal large language model with advanced generalizability and explainability for medical data understanding and generation. arXiv preprint arXiv:2409.19684 (2024)

  30. [30]

    In: Proceedings of the AAAI Conference on Artificial Intelligence

    Yan, B., Pei, M.: Clinical-bert: Vision-language pre-training for radiograph diag- nosis and reports generation. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 36, pp. 2982–2990 (2022)

  31. [31]

    In: Forty-first International Conference on Machine Learning (2024)

    Yang, J., Su, B., Zhao, X., Wen, J.R.: Unlocking the power of spatial and temporal information in medical multimodal pre-training. In: Forty-first International Conference on Machine Learning (2024)

  32. [32]

    Medical Image Analysis 80, 102510 (2022)

    Yang, S., Wu, X., Ge, S., Zhou, S.K., Xiao, L.: Knowledge matters: Chest radiology report generation with general and specific knowledge. Medical Image Analysis 80, 102510 (2022)

  33. [33]

    Applied Intelligence 52(8), 8746–8756 (2022)

    Yang, X., Chen, Y., Yue, X., Ma, C., Yang, P.: Local linear embedding based interpolation neural network in pancreatic tumor segmentation. Applied Intelligence 52(8), 8746–8756 (2022)

  34. [34]

    In: Proceedings of the AAAI Conference on Artificial Intelligence

    Yang, X., Chen, Y., Yue, X., Xu, S., Ma, C.: T-distributed spherical feature representation for imbalanced classification. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 37, pp. 10825–10833 (2023)

  35. [35]

    In: The Thirteenth International Conference on Learning Representations (2025)

    Yang, X., Lu, J., Yu, E.: Adapting multi-modal large language model to concept drift from pre-training onwards. In: The Thirteenth International Conference on Learning Representations (2025)

  36. [36]

    Turning Drift into Constraint: Robust Reasoning Alignment in Non-Stationary Multi-Stream Environments

    Yang, X., Lu, J., Yu, E.: Learning from all: Concept alignment for autonomous distillation from multiple drifting mllms. arXiv preprint arXiv:2510.04142 (2025)

  37. [37]

    In: The Thirty-ninth Annual Conference on Neural Information Processing Systems (2025)

    Yang, X., Lu, J., Yu, E.: Walking the tightrope: Autonomous disentangling beneficial and detrimental drifts in non-stationary custom-tuning. In: The Thirty-ninth Annual Conference on Neural Information Processing Systems (2025)

  38. [38]

    arXiv preprint arXiv:2502.07620 (2025)

    Yang, X., Lu, J., Yu, E., Duan, W.: Resilient contrastive pre-training under non-stationary drift. arXiv preprint arXiv:2502.07620 (2025)

  39. [39]

    In: Forty-second International Conference on Machine Learning (2025)

    Yang, X., Xu, L., Li, H., Zhang, S.: One leaf reveals the season: Occlusion-based contrastive learning with semantic-aware views for efficient visual representation. In: Forty-second International Conference on Machine Learning (2025)

  40. [40]

    IEEE Transactions on Medical Imaging 44(1), 259–269 (2024)

    Yang, X., Xu, L., Yu, S., Xia, Q., Li, H., Zhang, S.: Segmentation and vascular vectorization for coronary artery by geometry-based cascaded neural network. IEEE Transactions on Medical Imaging 44(1), 259–269 (2024)

  41. [41]

    Pattern Recognition p. 113203 (2026)

    Yang, X., Xu, L., Zeng, X., Wang, X., Li, H., Zhang, S.: Scalar: Spatial-concept alignment for robust vision in harsh open world. Pattern Recognition p. 113203 (2026)

  42. [42]

    IEEE Transactions on Medical Imaging (2025)

    Yang, Y., You, X., Zhang, K., Fu, Z., Wang, X., Ding, J., Sun, J., Yu, Z., Huang, Q., Han, W., et al.: Spatio-temporal and retrieval-augmented modelling for chest x-ray report generation. IEEE Transactions on Medical Imaging (2025)

  43. [43]

    In: 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)

    Yang, Z., Shen, L.: Tempa-vlp: Temporal-aware vision-language pretraining for longitudinal exploration in chest x-ray image. In: 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). pp. 4625–4634 (2025)

  44. [44]

    Young, S., Zeng, X., Xu, L.: Fewer tokens, greater scaling: Self-adaptive visual bases for efficient and expansive representation learning. arXiv preprint arXiv:2511.19515 (2025)

  45. [45]

    In: Proceedings of the AAAI Conference on Artificial Intelligence

    Yu, E., Lu, J., Wang, K., Yang, X., Zhang, G.: Drift-aware collaborative assistance mixture of experts for heterogeneous multistream learning. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 40, pp. 16199–16207 (2026)

  46. [46]

    In: Findings of the Association for Computational Linguistics: ACL 2025

    Zhou, Y., Song, L., Shen, J.: Mam: Modular multi-agent framework for multi-modal medical diagnosis via role-specialized collaboration. In: Findings of the Association for Computational Linguistics: ACL 2025. pp. 25319–25333 (2025)