EchoRisk: A Multicentre Echocardiography Dataset and Benchmark for Cardio-Oncology
Pith reviewed 2026-07-02 13:56 UTC · model grok-4.3
The pith
EchoRisk supplies the first multicentre echocardiography dataset with explicit cardiotoxicity labels for breast cancer patients.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
EchoRisk is the first curated, multicentre, longitudinal echocardiography dataset with explicit cardiotoxicity labels, drawn from the CARDIOCARE study and released to support three tasks. Video models achieve strong performance on ejection-fraction estimation and dysfunction classification, but early prediction from a single pre-therapy video remains a significant open problem.
What carries the argument
The EchoRisk dataset of 2159 videos from 422 patients across five sites, together with the three clinically defined tasks and their evaluation protocols.
If this is right
- Models can now be trained and compared on a standardised multicentre set for ejection-fraction estimation from echocardiography videos.
- Longitudinal sequences enable classification of left-ventricular dysfunction with the supplied baseline performance as reference.
- Early cardiotoxicity prediction from baseline scans alone is established as an unsolved problem requiring new methods.
- Public code and data release supports direct comparison of future task-specific architectures in cardio-oncology.
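As a rough illustration of how submissions on the first two tasks might be compared, the sketch below computes a mean absolute error for ejection-fraction estimates and a rank-based AUROC for dysfunction classification. The text here does not name the benchmark's primary or secondary metrics, so both choices are assumptions, and the function names are hypothetical.

```python
from typing import Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute gap between reference and predicted LVEF, in EF points.

    Assumed metric for the EF-estimation task; not confirmed by the source.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. Assumed metric for the classification task.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise AUROC is quadratic in the number of cases, which is acceptable for a cohort of a few hundred patients but would be replaced by a sort-based version at larger scale.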
Where Pith is reading between the lines
- Wider adoption of the benchmark could accelerate development of tools that flag cardiotoxicity risk before therapy begins, reducing unplanned treatment interruptions.
- Validation studies comparing the supplied labels against independent expert panels would be needed to confirm cross-site consistency.
- Adding data from non-European populations would test whether performance generalises beyond the current cohort.
Load-bearing premise
The cardiotoxicity labels assigned at the five sites are clinically accurate and applied consistently.
What would settle it
Independent clinical re-review of the labels showing substantial disagreement with the provided annotations, or a different baseline architecture achieving markedly higher accuracy on the early-prediction task.
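One way to quantify "substantial disagreement" in such a re-review is a chance-corrected agreement statistic. The sketch below computes Cohen's kappa between the released labels and a second set of labels. It is an illustration only: the source does not say which agreement statistic, if any, would be used, and the function name is hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Chance-corrected agreement between two label sets over the same cases."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(a)
    # Fraction of cases where both raters assign the same label.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Agreement expected if the raters labelled independently
    # with their own marginal label frequencies.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1.0 - expected)
```

A kappa near 1 would support the load-bearing premise, while a value near 0 would mean the two label sets agree no more often than chance.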
Original abstract
Therapy-induced cardiotoxicity is the leading non-oncological cause of treatment interruption in breast cancer patients, yet early, automated risk stratification from routine cardiac imaging remains an unsolved problem. We present EchoRisk, the first curated, multicentre, longitudinal echocardiography dataset with explicit cardiotoxicity labels, released as the primary technical reference for the EchoRisk-MICCAI 2026 challenge. The dataset comprises 422 patients enrolled in the EU-funded CARDIOCARE prospective study across five European sites, yielding 2,159 echocardiography videos across 1,123 clinical exams acquired at up to five longitudinal timepoints, alongside a dedicated cohort of 280 patients with baseline imaging for early cardiotoxicity prediction. Three clinically grounded tasks are defined: automated estimation of left ventricular ejection fraction from cine video (Task 1), classification of LV dysfunction from longitudinal imaging (Task 2), and early prediction of therapy-induced cardiotoxicity from pre-therapy baseline echocardiography alone (Task 3). For each task we specify the evaluation protocol, primary and secondary metrics, and ranking procedure. We establish baseline performance using an R(2+1)D video backbone with LSTM aggregation trained from Kinetics-400 pretrained weights, demonstrating strong discriminative performance for cardiac functional assessment and LV dysfunction classification, while early cardiotoxicity prediction from a single pre-therapy video remains a significant open problem for the community. The dataset, evaluation code, and baseline implementations are publicly available to serve as a benchmark for further collaboration, comparison, and the creation of task-specific architectures in cardio-oncology.
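Because the dataset holds several videos per patient across exams and timepoints, a model evaluated on it is easy to overestimate if videos from one patient land on both sides of a split. The sketch below shows a patient-level split under that concern. The split protocol actually used by the benchmark is not described in this text, so the function, its parameters, and the identifier format are all hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Video = Tuple[str, str]  # (patient_id, video_id); identifier format is assumed


def patient_level_split(
    videos: Iterable[Video], val_fraction: float = 0.2, seed: int = 0
) -> Dict[str, List[Video]]:
    """Assign whole patients to train or val so no patient appears in both."""
    by_patient: Dict[str, List[str]] = defaultdict(list)
    for patient_id, video_id in videos:
        by_patient[patient_id].append(video_id)

    # Sort first so the shuffle is reproducible regardless of input order.
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)
    n_val = max(1, round(len(patients) * val_fraction))
    val_patients = set(patients[:n_val])

    split: Dict[str, List[Video]] = {"train": [], "val": []}
    for pid in patients:
        part = "val" if pid in val_patients else "train"
        split[part].extend((pid, vid) for vid in by_patient[pid])
    return split
```

This version does not stratify by acquisition site or by outcome; a multicentre benchmark would plausibly want both, but that is beyond what the abstract specifies.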
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces EchoRisk, a multicentre longitudinal echocardiography dataset from the CARDIOCARE study (422 patients, 2159 videos from 1123 exams across five European sites, plus a 280-patient baseline cohort). It defines three tasks for the EchoRisk-MICCAI 2026 challenge—Task 1: automated LVEF estimation from cine video; Task 2: classification of LV dysfunction from longitudinal imaging; Task 3: early prediction of therapy-induced cardiotoxicity from pre-therapy baseline video alone—along with evaluation protocols, metrics, and ranking procedures. Baselines using an R(2+1)D video backbone with LSTM aggregation (Kinetics-400 pretrained) are reported, showing strong performance on functional assessment and dysfunction classification but highlighting early prediction as an open problem. The dataset, code, and baselines are released publicly.
Significance. If the cardiotoxicity labels prove clinically accurate and consistent, this would be the first publicly available multicentre echo dataset with explicit cardiotoxicity annotations for breast cancer patients, directly supporting development of automated early-risk tools in cardio-oncology where such data have been scarce. The release as a challenge benchmark with defined tasks, metrics, and reproducible baselines strengthens its utility for community progress on an unsolved clinical problem.
major comments (1)
- [Abstract] The central claim that EchoRisk supplies 'explicit cardiotoxicity labels' and 'clinically grounded' tasks is load-bearing for the benchmark's validity (especially Tasks 2 and 3), yet no definition of cardiotoxicity (e.g., specific LVEF drop threshold, timing relative to therapy, or composite clinical events), no adjudication process, no exclusion criteria, and no inter-site agreement statistics are supplied. Without these, reported baseline discriminative performance and the conclusion that early prediction remains open cannot be verified or reproduced.
minor comments (1)
- [Abstract] The abstract states patient numbers and video counts but does not specify the exact primary/secondary metrics or ranking procedure for each task; these should be enumerated explicitly even if detailed later.
Simulated Author's Rebuttal
We thank the referee for highlighting the need for explicit details on cardiotoxicity label definitions to support the benchmark's validity. We address the major comment below and will revise the manuscript accordingly.
Point-by-point responses
- Referee: [Abstract] The central claim that EchoRisk supplies 'explicit cardiotoxicity labels' and 'clinically grounded' tasks is load-bearing for the benchmark's validity (especially Tasks 2 and 3), yet no definition of cardiotoxicity (e.g., specific LVEF drop threshold, timing relative to therapy, or composite clinical events), no adjudication process, no exclusion criteria, and no inter-site agreement statistics are supplied. Without these, reported baseline discriminative performance and the conclusion that early prediction remains open cannot be verified or reproduced.
  Authors: We agree this information is essential for reproducibility and clinical grounding of Tasks 2 and 3. The labels derive from the CARDIOCARE study protocol, but the current manuscript does not detail the exact criteria (e.g., LVEF thresholds, timing, or composite events), adjudication, exclusions, or inter-rater/site agreement. In the revision we will add a dedicated subsection in Methods describing: the precise cardiotoxicity definition used (including any LVEF drop thresholds and timing relative to therapy), the adjudication process, exclusion criteria applied, and any available inter-site agreement statistics. If certain statistics were not collected in the original study we will explicitly note this as a limitation. These additions will allow readers to verify the baselines and the claim that early prediction (Task 3) remains challenging.
  Revision: yes
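To make concrete what a threshold-based definition of the kind the referee asks for could look like, the sketch below flags a patient when any follow-up LVEF has both dropped by a minimum number of percentage points from baseline and fallen below a floor. The default values are placeholders chosen for illustration; they are not the CARDIOCARE criteria, which this text does not state, and the function name is hypothetical.

```python
from typing import Sequence


def flag_lvef_decline(
    baseline_ef: float,
    followup_efs: Sequence[float],
    min_drop: float = 10.0,  # placeholder: required drop in EF percentage points
    ef_floor: float = 50.0,  # placeholder: follow-up EF must fall below this value
) -> bool:
    """Return True if any follow-up EF meets both the drop and the floor condition."""
    return any(
        (baseline_ef - ef) >= min_drop and ef < ef_floor
        for ef in followup_efs
    )
```

For example, a baseline of 60 with follow-ups of 58 and 49 is flagged under these placeholder values, whereas follow-ups of 55 and 52 are not. Timing relative to therapy and composite clinical events, both mentioned by the referee, are deliberately left out of this sketch.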
Circularity Check
No circularity: dataset release and benchmark with no derivations or self-referential claims
Full rationale
The paper is a data release and benchmark definition for the EchoRisk-MICCAI 2026 challenge. It describes the CARDIOCARE-derived dataset, defines three tasks (LVEF estimation, LV dysfunction classification, early cardiotoxicity prediction), specifies evaluation protocols, and reports baseline results from a standard R(2+1)D+LSTM model pretrained on Kinetics-400. No equations, fitted parameters presented as predictions, self-citations used for uniqueness theorems, ansatzes, or renamings of known results appear in the provided text. The central claims rest on external clinical study data and standard ML baselines rather than any internal derivation chain that reduces to its own inputs. This is the expected non-finding for a benchmark paper.