CPAgents: Agentic Composite Phenotype Generation for Cardiac Disease Association

Bernhard Kainz; Kelly Yu; Mengyun Qiao; Paul M. Matthews; Weitong Zhang; Wenjia Bai; Wenlong Zhao; Zuoou Li

arxiv: 2606.28179 · v1 · pith:LDRB2K4Mnew · submitted 2026-06-26 · 💻 cs.LG · cs.AI

CPAgents: Agentic Composite Phenotype Generation for Cardiac Disease Association

Zuoou Li , Wenlong Zhao , Kelly Yu , Weitong Zhang , Paul M. Matthews , Wenjia Bai , Bernhard Kainz , Mengyun Qiao This is my paper

Pith reviewed 2026-06-29 04:39 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords composite phenotypescardiac imagingPheWASagentic frameworkphenotype generationdisease associationcardiovascular researchmachine learning

0 comments

The pith

An agentic framework automatically generates composite phenotypes from cardiac imaging features that improve disease discrimination over single-variable baselines.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces CPAgents, a system that coordinates three agents to iteratively build and validate composite phenotypes such as polynomials, ratios, and interactions from base cardiac imaging features. Standard PheWAS approaches depend on pre-defined single variables, which cannot capture non-linear effects or cross-feature interactions. The agents handle nomination of transformations, generation of constrained expressions under safety rules, and multi-stage verification to produce interpretable formulas. If the claim holds, this would enable scalable identification of stronger imaging-disease links across population cohorts without relying solely on expert-crafted features.

Core claim

CPAgents coordinates an Analyst that identifies statistical pathologies and nominates candidates, a Proposer that generates medically and statistically motivated expressions, and a Verifier that applies multi-stage criteria to accept phenotypes with evidence trails, yielding composite phenotypes that achieve the top rank in 56 of 72 classifier-disease-metric combinations versus 18 for baselines, with gains across all nine clinical disease categories.

What carries the argument

The three-agent coordination system (Analyst, Proposer, Verifier) that proposes, constrains, and verifies composite phenotype expressions under numerical safety rules.

If this is right

The composite phenotypes achieve top rank in 56 of 72 classifier-disease-metric combinations.
Performance gains appear across all nine clinical disease categories.
The system produces compact, clinically interpretable phenotype formulas.
Transparent evidence trails accompany each accepted phenotype.
The approach enables scalable discovery of phenotype-disease associations beyond expert-driven selection.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The agentic generation process could be adapted to other imaging modalities or non-cardiac disease domains to test broader applicability.
The interpretable formulas with evidence trails might support clinical review and integration into risk stratification tools.
If the composites capture genuine interactions, they could point to new mechanistic hypotheses for further biological investigation.
Widespread use might reduce dependence on manual feature engineering in large-scale population studies.

Load-bearing premise

The Verifier agent's multi-stage criteria and numerical safety rules are sufficient to filter out spurious or overfit composite phenotypes and retain only those with genuine clinical associations.

What would settle it

Applying the same set of discovered composite phenotypes to an independent cardiac imaging cohort and observing no improvement in discrimination metrics over baselines in the majority of the 72 combinations would falsify the performance claim.

Figures

Figures reproduced from arXiv: 2606.28179 by Bernhard Kainz, Kelly Yu, Mengyun Qiao, Paul M. Matthews, Weitong Zhang, Wenjia Bai, Wenlong Zhao, Zuoou Li.

**Figure 1.** Figure 1: Overview of CPAgents, an agentic phenotype composition framework. CPAgents iteratively transforms raw cardiovascular phenotypes into highly predictive, hierarchical composite features (f1...fk). An Analyst first profiles feature statistics to guide a Proposer, which synthesizes candidate features using medical, statistical, and exploratory operations. Next, a Verifier filters candidates via sanity, stabi… view at source ↗

**Figure 2.** Figure 2: Disease–phenotype association heatmaps for expert-defined features (left) [PITH_FULL_IMAGE:figures/full_fig_p008_2.png] view at source ↗

read the original abstract

Identifying robust associations between cardiac imaging phenotypes and clinical diseases is fundamental to population-scale cardiovascular research and reliable risk stratification. However, current phenome-wide association studies rely on pre-defined, single-variable phenotypes or expert-crafted features, which limits their ability to capture clinically meaningful non-linear effects and cross-phenotype interactions. To address this, we propose CPAgents, an iterative phenotype-Composition framework for cardiovascular Phenome-wide association study (PheWAS) that automatically constructs and validates interpretable composite phenotypes (e.g., polynomial, ratio, and interaction forms) from base imaging features. Specifically, our system coordinates three agents: (i) an Analyst that identifies statistical pathologies and nominates candidate transformations; (ii) a Proposer that generates constrained, medically and statistically motivated expressions under numerical safety rules; and (iii) a Verifier that evaluates candidates using multi-stage criteria and produces transparent evidence trails for accepted phenotypes. Evaluated on a population-scale cardiac imaging cohort, the discovered composite phenotypes markedly improve disease discrimination: across 72 classifier-disease-metric combinations, our variants achieve the top rank in 56 cases versus 18 for baselines, with gains observed across all nine clinical disease categories. Our framework yields compact, clinically interpretable phenotype formulas with transparent evidence trails, enabling scalable discovery of stronger phenotype-disease associations beyond expert-driven feature selection.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

CPAgents introduces a three-agent loop for building composite phenotypes from cardiac imaging, but the 56/72 top-rank claim lacks the controls needed to rule out selection artifacts.

read the letter

The punchline is that this paper puts together an Analyst-Proposer-Verifier agent system to generate polynomial, ratio, and interaction phenotypes from base imaging features and then tests them for disease associations. The architecture is new in this specific constrained medical-imaging setting, and the outputs are compact formulas with evidence trails, which is useful for interpretability.

It does a reasonable job showing gains across nine disease categories on a population cohort. The idea of having the Proposer respect numerical safety rules and medical motivation before the Verifier scores them is a practical constraint that earlier symbolic or regression approaches sometimes skipped.

The soft spot is the evaluation. The headline result—top rank in 56 of 72 classifier-disease-metric combinations—depends on the Verifier actually filtering out spurious composites. The abstract mentions multi-stage criteria but supplies no held-out evaluation, permutation baseline, or multiple-testing correction for the many candidates the loop can produce. If those steps are missing or weak, the ranking advantage could come from repeated testing on the same data rather than genuine signal. The stress-test note flags exactly this, and the provided text does not contradict it.

This is for groups already working on automated phenotype engineering or agentic pipelines in medical imaging. A methods reader might pick up the agent coordination pattern, but anyone planning to reuse the phenotypes would need the full methods and supplementary results to check reproducibility.

I would send it for peer review. The core setup is worth a proper look even if the current performance numbers need tighter validation.

Referee Report

2 major / 1 minor

Summary. The paper proposes CPAgents, an agentic framework with Analyst, Proposer, and Verifier agents that iteratively generates and validates composite phenotypes (polynomial, ratio, interaction forms) from cardiac imaging features for PheWAS. It claims these phenotypes improve disease discrimination, achieving top rank in 56 of 72 classifier-disease-metric combinations versus 18 for baselines, across all nine clinical disease categories, while producing compact interpretable formulas with evidence trails.

Significance. If the performance gains hold after proper controls for multiple testing and overfitting, the work would advance scalable, automated discovery of non-linear phenotype-disease associations in cardiovascular imaging beyond expert-defined single features, with potential for broader PheWAS applications.

major comments (2)

[Abstract] Abstract: the top-rank claim (56/72 cases) provides no information on baseline definitions, statistical tests, cross-validation procedures, or multiple-comparisons correction, preventing assessment of whether the reported gains reflect genuine associations rather than selection artifacts from the iterative loop.
[Abstract] The Verifier's multi-stage criteria and numerical safety rules are described only at a high level with no mention of held-out evaluation sets, permutation baselines, or explicit controls for the number of candidate expressions tested; this directly bears on whether the 56 superior rankings are robust or spurious.

minor comments (1)

[Abstract] The abstract refers to 'transparent evidence trails' without indicating how these are presented or archived for reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the abstract. We agree that greater specificity on evaluation procedures would strengthen the summary and will revise the abstract accordingly. Point-by-point responses to the major comments are provided below.

read point-by-point responses

Referee: [Abstract] Abstract: the top-rank claim (56/72 cases) provides no information on baseline definitions, statistical tests, cross-validation procedures, or multiple-comparisons correction, preventing assessment of whether the reported gains reflect genuine associations rather than selection artifacts from the iterative loop.

Authors: We acknowledge the abstract's brevity limits detail on these elements. The full manuscript (Methods) defines baselines as the original cardiac imaging features plus expert single-variable phenotypes, employs stratified k-fold cross-validation for all classifiers, and applies FDR correction across the 72 combinations. We will revise the abstract to include a concise clause summarizing the evaluation protocol and statistical controls to allow immediate assessment of the ranking results. revision: yes
Referee: [Abstract] The Verifier's multi-stage criteria and numerical safety rules are described only at a high level with no mention of held-out evaluation sets, permutation baselines, or explicit controls for the number of candidate expressions tested; this directly bears on whether the 56 superior rankings are robust or spurious.

Authors: The abstract condenses the Verifier description; the manuscript (Section 3.2) specifies the multi-stage criteria, numerical safety rules, and constraints on expression complexity. The current evaluation uses internal cohort validation rather than separate held-out sets or permutation baselines for the agent loop. We will update the abstract to reference these controls and the bounded search space. Additional external validation experiments are outside the current scope but could be noted as future work if required. revision: partial

Circularity Check

0 steps flagged

No circularity: empirical agent framework with no derivations or self-referential fits

full rationale

The manuscript describes an iterative Analyst-Proposer-Verifier agent system for constructing composite phenotypes from imaging features and evaluates them empirically on a cardiac cohort. No equations, parameter fits, uniqueness theorems, or derivation chains appear in the provided text. Performance rankings (56/72 top ranks) are presented as direct empirical outcomes rather than predictions derived from fitted inputs. No self-citations are invoked as load-bearing premises, and the Verifier criteria are described as external multi-stage rules rather than self-defining the acceptance metric. The central claim therefore remains independent of its own outputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no equations or methods sections, so free parameters, axioms, and invented entities cannot be identified.

pith-pipeline@v0.9.1-grok · 5792 in / 1033 out tokens · 43066 ms · 2026-06-29T04:39:00.199245+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

20 extracted references · 2 linked inside Pith

[1]

Assessing bias: the importance of considering confounding

Skelly AC, Dettori JR, Brodt ED. Assessing bias: the importance of considering confounding. Evidence-based spine-care journal. 2012;3(01):9-12

2012
[2]

Fully automated, quality-controlled cardiac analysis from CMR: validation and large-scale application to characterize cardiac function

Ruijsink B, Puyol-Antón E, Oksuz I, Sinclair M, Bai W, Schnabel JA, et al. Fully automated, quality-controlled cardiac analysis from CMR: validation and large-scale application to characterize cardiac function. Cardiovascular Imaging. 2020;13(3):684-95

2020
[3]

Explainable artificial intelligence in radiolog- ical cardiovascular imaging—A systematic review

Haupt M, Maurer MH, Thomas RP. Explainable artificial intelligence in radiolog- ical cardiovascular imaging—A systematic review. Diagnostics. 2025;15(11):1399

2025
[4]

From compressed-sensing to artifi- cial intelligence-based cardiac MRI reconstruction

Bustin A, Fuin N, Botnar RM, Prieto C. From compressed-sensing to artifi- cial intelligence-based cardiac MRI reconstruction. Frontiers in cardiovascular medicine. 2020;7:17

2020
[5]

Cardiac imaging in coronary artery disease: differing modalities

Schuijf JD, Shaw LJ, Wijns W, Lamb HJ, Poldermans D, de Roos A, et al. Cardiac imaging in coronary artery disease: differing modalities. Heart. 2005;91(8):1110-7

2005
[6]

Artificial intel- ligence in cardiovascular medicine: clinical applications

Lüscher TF, Wenzl FA, D’Ascenzo F, Friedman PA, Antoniades C. Artificial intel- ligence in cardiovascular medicine: clinical applications. European heart journal. 2024;45(40):4291-304. 10 Z. Li et al

2024
[7]

PheWAS: demonstrating the feasibility of a phenome-wide scan to discover gene–disease associations

Denny JC, Ritchie MD, Basford MA, Pulley JM, Bastarache L, Brown-Gentry K, et al. PheWAS: demonstrating the feasibility of a phenome-wide scan to discover gene–disease associations. Bioinformatics. 2010;26(9):1205-10

2010
[8]

A population- based phenome-wide association study of cardiac and aortic structure and function

Bai W, Suzuki H, Huang J, Francis C, Wang S, Tarroni G, et al. A population- based phenome-wide association study of cardiac and aortic structure and function. Nature Medicine. 2020;26(10):1654-62

2020
[9]

Prospective study design and data analysis in UK Biobank

Allen NE, Lacey B, Lawlor DA, Pell JP, Gallacher J, Smeeth L, et al. Prospective study design and data analysis in UK Biobank. Science translational medicine. 2024;16(729):eadf4428

2024
[10]

A machine learning model for identifying patients at risk for wild-type transthyretin amyloid cardiomyopathy

Huda A, Castaño A, Niyogi A, Schumacher J, Stewart M, Bruno M, et al. A machine learning model for identifying patients at risk for wild-type transthyretin amyloid cardiomyopathy. Nature communications. 2021;12(1):2725

2021
[11]

Confounding factors need to be accounted for in assessing bias by machine learning algorithms

Mukherjee P, Shen TC, Liu J, Mathai T, Shafaat O, Summers RM. Confounding factors need to be accounted for in assessing bias by machine learning algorithms. Nature Medicine. 2022;28(6):1159-60

2022
[12]

Multi- agent reasoning for cardiovascular imaging phenotype analysis

Zhang W, Qiao M, Zang C, Niederer S, Matthews PM, Bai W, et al. Multi- agent reasoning for cardiovascular imaging phenotype analysis. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer; 2025. p. 429-39

2025
[13]

AutoQual: An LLM Agent for Automated DiscoveryofInterpretableFeaturesforReviewQualityAssessment

Lan X, Feng J, Liu Y, Li Y, et al. AutoQual: An LLM Agent for Automated DiscoveryofInterpretableFeaturesforReviewQualityAssessment. In:Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track; 2025. p. 1250-64

2025
[14]

Llm-fe: Automated feature engineering for tabular data with llms as evolutionary optimizers

Abhyankar N, Shojaee P, Reddy CK. Llm-fe: Automated feature engineering for tabular data with llms as evolutionary optimizers. 2025. arXiv preprint arXiv:2503.14434

Pith/arXiv arXiv 2025
[15]

Medagent-pro: Towards evidence-based multi-modal medical diagnosis via reasoning agentic workflow

Wang Z, Wu J, Cai L, Low CH, Yang X, Li Q, et al. Medagent-pro: Towards evidence-based multi-modal medical diagnosis via reasoning agentic workflow
[16]

arXiv preprint arXiv:2503.18968

arXiv
[17]

Medrax: Medical reasoning agent for chest x-ray

Fallahpour A, Ma J, Munim A, Lyu H, Wang B. Medrax: Medical reasoning agent for chest x-ray. 2025. arXiv preprint arXiv:2502.02673

arXiv 2025
[18]

Regularization and variable selection via the elastic net

Zou H, Hastie T. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society Series B: Statistical Methodology. 2005;67(2):301- 20

2005
[19]

A unified approach to interpreting model predictions

Lundberg SM, Lee SI. A unified approach to interpreting model predictions. Ad- vances in neural information processing systems. 2017;30:4765-74

2017
[20]

Deepseek-v3.2: Pushing the frontier of open large language models

Liu A, Mei A, Lin B, Xue B, Wang B, Xu B, et al. Deepseek-v3.2: Pushing the frontier of open large language models. 2025. arXiv preprint arXiv:2512.02556

Pith/arXiv arXiv 2025

[1] [1]

Assessing bias: the importance of considering confounding

Skelly AC, Dettori JR, Brodt ED. Assessing bias: the importance of considering confounding. Evidence-based spine-care journal. 2012;3(01):9-12

2012

[2] [2]

Fully automated, quality-controlled cardiac analysis from CMR: validation and large-scale application to characterize cardiac function

Ruijsink B, Puyol-Antón E, Oksuz I, Sinclair M, Bai W, Schnabel JA, et al. Fully automated, quality-controlled cardiac analysis from CMR: validation and large-scale application to characterize cardiac function. Cardiovascular Imaging. 2020;13(3):684-95

2020

[3] [3]

Explainable artificial intelligence in radiolog- ical cardiovascular imaging—A systematic review

Haupt M, Maurer MH, Thomas RP. Explainable artificial intelligence in radiolog- ical cardiovascular imaging—A systematic review. Diagnostics. 2025;15(11):1399

2025

[4] [4]

From compressed-sensing to artifi- cial intelligence-based cardiac MRI reconstruction

Bustin A, Fuin N, Botnar RM, Prieto C. From compressed-sensing to artifi- cial intelligence-based cardiac MRI reconstruction. Frontiers in cardiovascular medicine. 2020;7:17

2020

[5] [5]

Cardiac imaging in coronary artery disease: differing modalities

Schuijf JD, Shaw LJ, Wijns W, Lamb HJ, Poldermans D, de Roos A, et al. Cardiac imaging in coronary artery disease: differing modalities. Heart. 2005;91(8):1110-7

2005

[6] [6]

Artificial intel- ligence in cardiovascular medicine: clinical applications

Lüscher TF, Wenzl FA, D’Ascenzo F, Friedman PA, Antoniades C. Artificial intel- ligence in cardiovascular medicine: clinical applications. European heart journal. 2024;45(40):4291-304. 10 Z. Li et al

2024

[7] [7]

PheWAS: demonstrating the feasibility of a phenome-wide scan to discover gene–disease associations

Denny JC, Ritchie MD, Basford MA, Pulley JM, Bastarache L, Brown-Gentry K, et al. PheWAS: demonstrating the feasibility of a phenome-wide scan to discover gene–disease associations. Bioinformatics. 2010;26(9):1205-10

2010

[8] [8]

A population- based phenome-wide association study of cardiac and aortic structure and function

Bai W, Suzuki H, Huang J, Francis C, Wang S, Tarroni G, et al. A population- based phenome-wide association study of cardiac and aortic structure and function. Nature Medicine. 2020;26(10):1654-62

2020

[9] [9]

Prospective study design and data analysis in UK Biobank

Allen NE, Lacey B, Lawlor DA, Pell JP, Gallacher J, Smeeth L, et al. Prospective study design and data analysis in UK Biobank. Science translational medicine. 2024;16(729):eadf4428

2024

[10] [10]

A machine learning model for identifying patients at risk for wild-type transthyretin amyloid cardiomyopathy

Huda A, Castaño A, Niyogi A, Schumacher J, Stewart M, Bruno M, et al. A machine learning model for identifying patients at risk for wild-type transthyretin amyloid cardiomyopathy. Nature communications. 2021;12(1):2725

2021

[11] [11]

Confounding factors need to be accounted for in assessing bias by machine learning algorithms

Mukherjee P, Shen TC, Liu J, Mathai T, Shafaat O, Summers RM. Confounding factors need to be accounted for in assessing bias by machine learning algorithms. Nature Medicine. 2022;28(6):1159-60

2022

[12] [12]

Multi- agent reasoning for cardiovascular imaging phenotype analysis

Zhang W, Qiao M, Zang C, Niederer S, Matthews PM, Bai W, et al. Multi- agent reasoning for cardiovascular imaging phenotype analysis. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer; 2025. p. 429-39

2025

[13] [13]

AutoQual: An LLM Agent for Automated DiscoveryofInterpretableFeaturesforReviewQualityAssessment

Lan X, Feng J, Liu Y, Li Y, et al. AutoQual: An LLM Agent for Automated DiscoveryofInterpretableFeaturesforReviewQualityAssessment. In:Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track; 2025. p. 1250-64

2025

[14] [14]

Llm-fe: Automated feature engineering for tabular data with llms as evolutionary optimizers

Abhyankar N, Shojaee P, Reddy CK. Llm-fe: Automated feature engineering for tabular data with llms as evolutionary optimizers. 2025. arXiv preprint arXiv:2503.14434

Pith/arXiv arXiv 2025

[15] [15]

Medagent-pro: Towards evidence-based multi-modal medical diagnosis via reasoning agentic workflow

Wang Z, Wu J, Cai L, Low CH, Yang X, Li Q, et al. Medagent-pro: Towards evidence-based multi-modal medical diagnosis via reasoning agentic workflow

[16] [16]

arXiv preprint arXiv:2503.18968

arXiv

[17] [17]

Medrax: Medical reasoning agent for chest x-ray

Fallahpour A, Ma J, Munim A, Lyu H, Wang B. Medrax: Medical reasoning agent for chest x-ray. 2025. arXiv preprint arXiv:2502.02673

arXiv 2025

[18] [18]

Regularization and variable selection via the elastic net

Zou H, Hastie T. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society Series B: Statistical Methodology. 2005;67(2):301- 20

2005

[19] [19]

A unified approach to interpreting model predictions

Lundberg SM, Lee SI. A unified approach to interpreting model predictions. Ad- vances in neural information processing systems. 2017;30:4765-74

2017

[20] [20]

Deepseek-v3.2: Pushing the frontier of open large language models

Liu A, Mei A, Lin B, Xue B, Wang B, Xu B, et al. Deepseek-v3.2: Pushing the frontier of open large language models. 2025. arXiv preprint arXiv:2512.02556

Pith/arXiv arXiv 2025