arxiv: 2605.14403 · v1 · submitted 2026-05-14 · 💻 cs.CV

Recognition: no theorem link

DermAgent: A Self-Reflective Agentic System for Dermatological Image Analysis with Multi-Tool Reasoning and Traceable Decision-Making

Yize Liu , Siyuan Yan , Ming Hu , Lie Ju , Xieji Li , Feilong Tang , Wei Feng , Zongyuan Ge

Authors on Pith no claims yet

Pith reviewed 2026-05-15 02:52 UTC · model grok-4.3

classification 💻 cs.CV

keywords dermatological image analysismulti-tool agentself-reflective reasoningretrieval augmentationhallucination mitigationzero-shot diagnosismedical vision-language modeltraceable decision making

0 comments

The pith

DermAgent anchors each skin image prediction in retrieved cases and guidelines then self-corrects via critic gates to raise diagnostic accuracy above standard models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents DermAgent as a multi-tool agent that follows a Plan-Execute-Reflect cycle to analyze dermatological images through specialized vision and language modules. It retrieves supporting evidence by cross-referencing a large collection of diagnosed cases with clinical guideline chunks for every step of reasoning. A separate critic module then applies fixed gates on confidence, coverage, and source conflicts to catch and fix errors before final output. The design targets the common failures of insufficient medical grounding and unchecked hallucinations in current multimodal models. Experiments across five benchmarks show gains in zero-shot disease diagnosis, concept annotation, and clinical captioning.

Core claim

DermAgent orchestrates seven vision and language tools inside a Plan-Execute-Reflect framework, anchors every prediction through dual-modality retrieval from 413210 diagnosed image cases and 3199 guideline chunks, and applies a deterministic critic with confidence-coverage-conflict gates to detect disagreements and trigger self-correction, yielding higher zero-shot performance on fine-grained dermatology tasks than existing multimodal models.

What carries the argument

Dual-modality retrieval module that cross-references image cases and guideline chunks, combined with the critic module's three deterministic gates inside the Plan-Execute-Reflect cycle.

If this is right

Produces step-by-step traceable reasoning paths suitable for clinical review.
Raises zero-shot fine-grained disease diagnosis accuracy above current multimodal baselines.
Improves concept annotation and clinical captioning quality on dermatology benchmarks.
Reduces hallucinations by enforcing post-hoc checks across visual and textual sources.
Operates without task-specific fine-tuning on the tested benchmarks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The explicit retrieval and auditing steps could ease regulatory review for medical decision support tools.
Similar retrieval-plus-critic structures may transfer to other narrow medical imaging domains such as pathology slides.
Continuous addition of new diagnosed cases to the retrieval store would likely extend coverage to emerging conditions.
The traceable outputs could support joint human-AI workflows where a clinician inspects the cited evidence and corrections.

Load-bearing premise

The retrieval database supplies complete and unbiased evidence for any input image while the critic gates detect real errors without rejecting correct answers or introducing new ones.

What would settle it

Run the system on a fresh collection of images from rare skin conditions absent from the 413210-case database and measure whether diagnostic accuracy falls to or below the level of unaided multimodal models.

Figures

Figures reproduced from arXiv: 2605.14403 by Feilong Tang, Lie Ju, Ming Hu, Siyuan Yan, Wei Feng, Xieji Li, Yize Liu, Zongyuan Ge.

**Figure 1.** Figure 1: Overview of the proposed DermAgent framework. An LLM controller orchestrates specialized visual perception and knowledge retrieval tools via an iterative Plan– Execute–Reflect loop. A deterministic Critic module further audits the accumulated evidence chain to trigger targeted self-correction. sign a deterministic Critic module. This module performs post-hoc auditing of the assembled evidence chain and di… view at source ↗

**Figure 2.** Figure 2: Qualitative comparison on a representative case in the captioning task. Green highlights correct descriptions; red highlights hallucinated diagnoses. ward common classes through critic-driven self-correction. Initially, DermoGPT accurately describes the morphological features of the lesion but hallucinates a diagnosis of eczema. PanDerm similarly predicts eczema. However, the Case RAG module retrieves four… view at source ↗

read the original abstract

Dermatological diagnosis requires integrating fine-grained visual perception with expert clinical knowledge. Although Multimodal Large Language Models (MLLMs) facilitate interactive medical image analysis, their application in dermatology is hindered by insufficient domain-specific grounding and hallucinations. To address these issues, we propose DermAgent, a collaborative multi-tool agent that orchestrates seven specialized vision and language modules within a Plan-Execute-Reflect framework. DermAgent delivers stepwise, traceable diagnostic reasoning through three core components. First, it employs complementary visual perception tools for comprehensive morphological description, dermoscopic concept annotation, and disease diagnosis. Second, to overcome the lack of domain prior, a dual-modality retrieval module anchors every prediction in external evidence by cross-referencing 413,210 diagnosed image cases and 3,199 clinical guideline chunks. To further mitigate hallucinations, a deterministic critic module conducts strict post-hoc auditing via confidence, coverage, and conflict gates, automatically detecting inter-source disagreements to trigger targeted self-correction. Extensive experiments on five dermatology benchmarks demonstrate that DermAgent consistently outperforms state-of-the-art MLLMs and medical agent baselines across zero-shot fine-grained disease diagnosis, concept annotation, and clinical captioning tasks, exceeding GPT-4o by 17.6% in skin disease diagnostic accuracy and 3.15% in captioning ROUGE-L. Our code is available at https://github.com/YizeezLiu/DermAgent.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

DermAgent wires seven vision tools, large-scale dual-modality retrieval, and a gated critic into a Plan-Execute-Reflect loop, delivering measurable gains over GPT-4o on derm tasks, but the abstract supplies no experimental controls so the source of those gains stays unclear.

read the letter

DermAgent combines seven specialized vision and language modules, retrieval over 413k diagnosed cases plus guideline chunks, and a deterministic critic that checks confidence, coverage, and conflicts inside a Plan-Execute-Reflect loop. The traceable output and external anchoring are the concrete additions that prior MLLM or agent papers do not combine in this way for dermatology.

Referee Report

3 major / 1 minor

Summary. The manuscript introduces DermAgent, a collaborative multi-tool agent for dermatological image analysis that orchestrates seven specialized vision and language modules within a Plan-Execute-Reflect framework. Core components include complementary visual perception tools, a dual-modality retrieval module anchoring predictions in 413,210 diagnosed image cases and 3,199 clinical guideline chunks, and a deterministic critic module using confidence-coverage-conflict gates for post-hoc auditing and self-correction. Experiments on five dermatology benchmarks claim consistent outperformance over state-of-the-art MLLMs and medical agent baselines, exceeding GPT-4o by 17.6% in skin disease diagnostic accuracy and 3.15% in captioning ROUGE-L, with code released at the provided GitHub link.

Significance. If the performance gains prove robust after verification of experimental protocols and absence of retrieval-test overlap, the work would advance agentic systems in medical imaging by demonstrating traceable, externally grounded reasoning that mitigates hallucinations in fine-grained dermatological tasks.

major comments (3)

[Abstract] Abstract: the central performance claims report consistent outperformance on five benchmarks and specific lifts over GPT-4o but supply no details on experimental protocols, baseline implementations, statistical testing, data splits, or evaluation metrics; these omissions render the claims unverifiable from the provided text.
[Abstract] Abstract and Experiments section: the dual-modality retrieval from 413,210 diagnosed cases is asserted to supply unbiased external evidence, yet no analysis demonstrates that the corpus does not overlap with any of the five benchmark test sets; overlap would allow direct case retrieval to explain the 17.6% accuracy and 3.15% ROUGE-L gains rather than the Plan-Execute-Reflect plus critic pipeline.
[Abstract] Abstract: the critic module's confidence-coverage-conflict gates are claimed to detect inter-source disagreements and trigger effective self-correction, but no quantitative breakdown of correction success rate versus introduced errors is supplied, leaving the net contribution of the self-reflective loop unverified.

minor comments (1)

[Abstract] Abstract: the description states seven specialized modules but enumerates visual perception tools, retrieval, and critic without an explicit breakdown of the seven; add a clarifying list or diagram reference.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the thorough and constructive review. The comments highlight important aspects of verifiability and experimental rigor that we will address in the revision. Below we respond to each major comment point by point, indicating where revisions will be made.

read point-by-point responses

Referee: [Abstract] Abstract: the central performance claims report consistent outperformance on five benchmarks and specific lifts over GPT-4o but supply no details on experimental protocols, baseline implementations, statistical testing, data splits, or evaluation metrics; these omissions render the claims unverifiable from the provided text.

Authors: We agree that the abstract, constrained by length, omits key experimental details. The Experiments section already specifies the five benchmarks (with sources and splits), zero-shot protocol, baseline re-implementations, metrics (accuracy, ROUGE-L, etc.), and statistical testing via paired t-tests. In the revised manuscript we will expand the abstract with a concise sentence summarizing the evaluation setup and metrics to improve immediate verifiability without exceeding typical abstract limits. revision: partial
Referee: [Abstract] Abstract and Experiments section: the dual-modality retrieval from 413,210 diagnosed cases is asserted to supply unbiased external evidence, yet no analysis demonstrates that the corpus does not overlap with any of the five benchmark test sets; overlap would allow direct case retrieval to explain the 17.6% accuracy and 3.15% ROUGE-L gains rather than the Plan-Execute-Reflect plus critic pipeline.

Authors: This concern is valid and we have addressed it internally by sourcing the 413k retrieval cases exclusively from datasets and collections distinct from the test splits of the five benchmarks (ISIC, HAM10000, Derm7pt, etc.), with explicit deduplication steps applied. We will add a new subsection in the revised Experiments section that details corpus construction, lists the exact sources, and reports the overlap verification procedure (including hash-based and metadata checks) to rule out leakage as the source of gains. revision: yes
Referee: [Abstract] Abstract: the critic module's confidence-coverage-conflict gates are claimed to detect inter-source disagreements and trigger effective self-correction, but no quantitative breakdown of correction success rate versus introduced errors is supplied, leaving the net contribution of the self-reflective loop unverified.

Authors: We acknowledge the value of quantitative evidence for the critic. The manuscript currently provides only qualitative examples in the appendix. In the revision we will insert a new table reporting aggregate statistics: number of triggered corrections, success rate (accuracy improvement post-correction), rate of introduced errors, and an ablation comparing full DermAgent against the version without the critic module. This will directly quantify the net contribution of the self-reflective loop. revision: yes

Circularity Check

0 steps flagged

No significant circularity; performance claims rest on external retrieval and empirical benchmarks rather than self-defined quantities

full rationale

The paper presents DermAgent as an agentic architecture that orchestrates external tools (visual perception modules, dual-modality retrieval over 413210 diagnosed cases plus 3199 guideline chunks, and a deterministic critic with confidence-coverage-conflict gates) inside a Plan-Execute-Reflect loop. No equations, fitted parameters, or self-citations are shown that reduce the reported accuracy or ROUGE-L gains to quantities defined by the system's own inputs. The 17.6% and 3.15% improvements are stated as outcomes of experiments on five dermatology benchmarks; the retrieval corpus and critic are described as external anchors rather than internal redefinitions of the target metrics. Because the central claims do not collapse by construction to fitted constants or self-referential definitions, the derivation chain remains non-circular.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entities

The central claim depends on the representativeness of the 413k-case retrieval corpus and the effectiveness of the critic gates; both are domain assumptions introduced to solve hallucination without independent external validation beyond the reported benchmarks.

axioms (2)

domain assumption The 413,210 diagnosed image cases and 3,199 guideline chunks form an unbiased and sufficiently comprehensive knowledge base for anchoring all dermatological predictions.
Invoked to ground every prediction; no discussion of selection bias or coverage gaps appears in the abstract.
ad hoc to paper The critic module's confidence, coverage, and conflict gates can detect inter-source disagreements and trigger effective self-correction.
Presented as the mechanism to mitigate hallucinations; no formal characterization or ablation of gate behavior is supplied.

invented entities (1)

DermAgent collaborative multi-tool agent no independent evidence
purpose: Orchestrate seven specialized vision and language modules with traceable reasoning
New system proposed to address domain grounding and hallucination; independent evidence limited to the five-benchmark experiments described.

pith-pipeline@v0.9.0 · 5583 in / 1501 out tokens · 34786 ms · 2026-05-15T02:52:42.920869+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

33 extracted references · 33 canonical work pages · 4 internal anchors

[1]

https://dermnetnz.org/

DermNet. https://dermnetnz.org/

work page
[2]

https://www.mayoclinic.org/diseases-conditions

Mayo Clinic - Medical Diseases & Conditions. https://www.mayoclinic.org/diseases-conditions

work page
[3]

Qwen3-VL Technical Report

Bai, S., Cai, Y., Chen, R., Chen, K., Chen, X. et al.: Qwen3-VL Technical Report (Nov 2025). https://doi.org/10.48550/arXiv.2511.21631

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2511.21631 2025
[4]

et al.: HuatuoGPT-Vision, To- wards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale (Sep 2024)

Chen, J., Gui, C., Ouyang, R., Gao, A., Chen, S. et al.: HuatuoGPT-Vision, To- wards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale (Sep 2024). https://doi.org/10.48550/arXiv.2406.19280

work page doi:10.48550/arxiv.2406.19280 2024
[5]

https://doi.org/10.48550/arXiv.2302.00785

Daneshjou, R., Yuksekgonul, M., Cai, Z.R., Novoa, R., Zou, J.: SkinCon: A skin disease dataset densely annotated by domain experts for fine-grained model de- bugging and analysis (Feb 2023). https://doi.org/10.48550/arXiv.2302.00785

work page doi:10.48550/arxiv.2302.00785 2023
[6]

et al.: Dermatologist- level classification of skin cancer with deep neural networks

Esteva, A., Kuprel, B., Novoa, R.A., Ko, J., Swetter, S.M. et al.: Dermatologist- level classification of skin cancer with deep neural networks. Nature542(7639), 115–118 (Feb 2017). https://doi.org/10.1038/nature21056

work page doi:10.1038/nature21056 2017
[7]

et al.: De- velopment and validation of an autonomous artificial intelligence agent for clinical decision-making in oncology

Ferber, D., El Nahhas, O.S., Wölflein, G., Wiest, I.C., Clusmann, J. et al.: De- velopment and validation of an autonomous artificial intelligence agent for clinical decision-making in oncology. Nature cancer pp. 1–13 (2025)

work page 2025
[8]

et al.: Man against machine: Diagnostic performance of a deep learning convo- lutional neural network for dermoscopic melanoma recognition in compari- son to 58 dermatologists

Haenssle, H.A., Fink, C., Schneiderbauer, R., Toberer, F., Buhl, T. et al.: Man against machine: Diagnostic performance of a deep learning convo- lutional neural network for dermoscopic melanoma recognition in compari- son to 58 dermatologists. Annals of Oncology29(8), 1836–1842 (Aug 2018). https://doi.org/10.1093/annonc/mdy166

work page doi:10.1093/annonc/mdy166 2018
[9]

et al.: Evaluation and mitigation of the limitations of large language models in clinical decision-making

Hager, P., Jungmann, F., Holland, R., Bhagat, K., Hubrecht, I. et al.: Evaluation and mitigation of the limitations of large language models in clinical decision-making. Nature Medicine30(9), 2613–2622 (Sep 2024). https://doi.org/10.1038/s41591-024-03097-1

work page doi:10.1038/s41591-024-03097-1 2024
[10]

et al.: Patients’ and dermatologists’ preferences in artificial intelligence- driven skin cancer diagnostics: A prospective multicentric survey study

Haggenmüller, S., Maron, R.C., Hekler, A., Krieghoff-Henning, E., Utikal, J.S. et al.: Patients’ and dermatologists’ preferences in artificial intelligence- driven skin cancer diagnostics: A prospective multicentric survey study. Jour- nal of the American Academy of Dermatology91(2), 366–370 (Aug 2024). https://doi.org/10.1016/j.jaad.2024.04.033

work page doi:10.1016/j.jaad.2024.04.033 2024
[11]

https://doi.org/10.6084/m9.figshare.6454973.v12

Han, S.S.: SNU dataset + Quiz (3 2019). https://doi.org/10.6084/m9.figshare.6454973.v12

work page doi:10.6084/m9.figshare.6454973.v12 2019
[12]

arXiv preprint arXiv:2510.08668 (2025)

Jiang, S., Wang, Y., Song, S., Hu, T., Zhou, C. et al.: Hulu-med: A transparent generalist model towards holistic medical vision-language understanding (2025), https://arxiv.org/abs/2510.08668

work page arXiv 2025
[13]

IEEE Journal of Biomedical and Health Informatics23(2), 538–546 (Mar 2019)

Kawahara, J., Daneshvar, S., Argenziano, G., Hamarneh, G.: Seven-Point Check- list and Skin Lesion Classification Using Multitask Multimodal Neural Nets. IEEE Journal of Biomedical and Health Informatics23(2), 538–546 (Mar 2019). https://doi.org/10.1109/JBHI.2018.2824327

work page doi:10.1109/jbhi.2018.2824327 2019
[14]

et al.: Trans- parent medical image AI via an image–text foundation model grounded in medical literature

Kim, C., Gadgil, S.U., DeGrave, A.J., Omiye, J.A., Cai, Z.R. et al.: Trans- parent medical image AI via an image–text foundation model grounded in medical literature. Nature Medicine30(4), 1154–1165 (Apr 2024). https://doi.org/10.1038/s41591-024-02887-x 10 Y. Liu et al

work page doi:10.1038/s41591-024-02887-x 2024
[15]

et al.: MDAgents: An Adaptive Collaboration of LLMs for Medical Decision-Making (Oct 2024)

Kim, Y., Park, C., Jeong, H., Chan, Y.S., Xu, X. et al.: MDAgents: An Adaptive Collaboration of LLMs for Medical Decision-Making (Oct 2024). https://doi.org/10.48550/arXiv.2404.15155

work page doi:10.48550/arxiv.2404.15155 2024
[16]

Llava-med: Training a large language-and- vision assistant for biomedicine in one day.arXiv preprint arXiv:2306.00890, 2023

Li, C., Wong, C., Zhang, S., Usuyama, N., Liu, H. et al.: LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day (Jun 2023). https://doi.org/10.48550/arXiv.2306.00890

work page doi:10.48550/arxiv.2306.00890 2023
[17]

Dermatology and Therapy12(12), 2637– 2651 (Oct 2022)

Liopyris, K., Gregoriou, S., Dias, J., Stratigos, A.J.: Artificial Intelligence in Der- matology: Challenges and Perspectives. Dermatology and Therapy12(12), 2637– 2651 (Oct 2022). https://doi.org/10.1007/s13555-022-00833-8

work page doi:10.1007/s13555-022-00833-8 2022
[18]

A Survey on Hallucination in Large Vision-Language Models

Liu, H., Xue, W., Chen, Y., Chen, D., Zhao, X. et al.: A Sur- vey on Hallucination in Large Vision-Language Models (May 2024). https://doi.org/10.48550/arXiv.2402.00253

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2402.00253 2024
[19]

et al.: WSI-Agents: A Collabo- rative Multi-Agent System for Multi-Modal Whole Slide Image Analysis

Lyu, X., Liang, Y., Chen, W., Ding, M., Yang, J. et al.: WSI-Agents: A Collabo- rative Multi-Agent System for Multi-Modal Whole Slide Image Analysis

work page
[20]

https://openai.com/index/introducing-gpt-5-2/ (Feb 2026)

OpenAI: Introducing GPT-5.2. https://openai.com/index/introducing-gpt-5-2/ (Feb 2026)

work page 2026
[21]

GPT-4o System Card

OpenAI, Hurst, A., Lerer, A., Goucher, A.P., Perelman, A. et al.: GPT-4o System Card (Oct 2024). https://doi.org/10.48550/arXiv.2410.21276

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2410.21276 2024
[22]

Skin Research and Technology30(7), e13854 (Jul 2024)

Pillai, J., Li, B.: Generative artificial intelligence in dermatology: Recommenda- tions for future studies evaluating the clinical knowledge of models. Skin Research and Technology30(7), e13854 (Jul 2024). https://doi.org/10.1111/srt.13854

work page doi:10.1111/srt.13854 2024
[23]

https://doi.org/10.48550/arXiv.2601.01868

Ru, J., Yan, S., Yin, Y., Zou, Y., Ge, Z.: DermoGPT: Open Weights and Open Data for Morphology-Grounded Dermatological Reasoning MLLMs (Jan 2026). https://doi.org/10.48550/arXiv.2601.01868

work page doi:10.48550/arxiv.2601.01868 2026
[24]

et al.: SkinCaRe: A Multimodal Der- matology Dataset Annotated with Medical Caption and Chain-of-Thought Rea- soning (Nov 2025)

Shen, Y., Sun, L., Xu, Y., Liu, W., Zhang, S. et al.: SkinCaRe: A Multimodal Der- matology Dataset Annotated with Medical Caption and Chain-of-Thought Rea- soning (Nov 2025). https://doi.org/10.48550/arXiv.2405.18004

work page doi:10.48550/arxiv.2405.18004 2025
[25]

Scientific Data5(1), 180161 (Aug 2018)

Tschandl, P., Rosendahl, C., Kittler, H.: The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions. Scientific Data5(1), 180161 (Aug 2018). https://doi.org/10.1038/sdata.2018.161

work page doi:10.1038/sdata.2018.161 2018
[26]

et al.: MedAgent-Pro: Towards Evidence-based Multi-modal Medical Diagnosis via Reasoning Agentic Workflow (Jul 2025)

Wang, Z., Wu, J., Cai, L., Low, C.H., Yang, X. et al.: MedAgent-Pro: Towards Evidence-based Multi-modal Medical Diagnosis via Reasoning Agentic Workflow (Jul 2025). https://doi.org/10.48550/arXiv.2503.18968

work page doi:10.48550/arxiv.2503.18968 2025
[27]

et al.: Derm1M: A Million-scale Vision- Language Dataset Aligned with Clinical Ontology Knowledge for Dermatology (Apr 2025)

Yan, S., Hu, M., Jiang, Y., Li, X., Fei, H. et al.: Derm1M: A Million-scale Vision- Language Dataset Aligned with Clinical Ontology Knowledge for Dermatology (Apr 2025). https://doi.org/10.48550/arXiv.2503.14911

work page doi:10.48550/arxiv.2503.14911 2025
[28]

et al.: MAKE: Multi-Aspect Knowledge- Enhanced Vision-Language Pretraining for Zero-shot Dermatological Assessment (May 2025)

Yan, S., Li, X., Hu, M., Jiang, Y., Yu, Z. et al.: MAKE: Multi-Aspect Knowledge- Enhanced Vision-Language Pretraining for Zero-shot Dermatological Assessment (May 2025). https://doi.org/10.48550/arXiv.2505.09372

work page doi:10.48550/arxiv.2505.09372 2025
[29]

et al.: A multimodal vision foundation model for clinical dermatology

Yan, S., Yu, Z., Primiero, C., Vico-Alonso, C., Wang, Z. et al.: A multimodal vision foundation model for clinical dermatology. Nature Medicine31(8), 2691–2702 (Aug 2025). https://doi.org/10.1038/s41591-025-03747-y

work page doi:10.1038/s41591-025-03747-y 2025
[30]

In: Proceedings of the 33rd ACM International Conference on Multimedia

Zeng, W., Sun, Y., Ma, C., Tan, W., Yan, B.: MM-Skin: Enhancing Dermatology Vision-Language Model with an Image-Text Dataset Derived from Textbooks. In: Proceedings of the 33rd ACM International Conference on Multimedia. pp. 3769–

work page
[31]

https://doi.org/10.1145/3746027.3755187

MM ’25, Association for Computing Machinery, New York, NY, USA (Oct 2025). https://doi.org/10.1145/3746027.3755187

work page doi:10.1145/3746027.3755187 2025
[32]

Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models

Zhang, Y., Li, M., Long, D., Zhang, X., Lin, H. et al.: Qwen3 embedding: Ad- vancing text embedding and reranking through foundation models. arXiv preprint arXiv:2506.05176 (2025) DermAgent: A Collaborative Agent for Dermatological Image Analysis 11

work page internal anchor Pith review Pith/arXiv arXiv 2025
[33]

et al.: An Agentic Sys- tem for Rare Disease Diagnosis with Traceable Reasoning (Aug 2025)

Zhao, W., Wu, C., Fan, Y., Zhang, X., Qiu, P. et al.: An Agentic Sys- tem for Rare Disease Diagnosis with Traceable Reasoning (Aug 2025). https://doi.org/10.48550/arXiv.2506.20430

work page doi:10.48550/arxiv.2506.20430 2025