arxiv: 2602.08280 · v2 · submitted 2026-02-09 · 🧬 q-bio.GN

ClusterChirp: Scalable Interactive Exploration of Omics Data with Natural Language-Guided Analysis

Osho Rawal, Rex Lu, Edgar Gonzalez-Kozlova, Sacha Gnjatic, Zeynep H. Gümüş

Pith reviewed 2026-05-16 04:03 UTC · model grok-4.3

keywords ClusterChirp · omics data exploration · natural language interface · interactive visualization · hierarchical clustering · GPU acceleration · LLM integration · web-based platform

The pith

ClusterChirp combines GPU rendering, parallel clustering, and an LLM natural language interface to explore large omics data matrices in real time.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces ClusterChirp as a web platform built to handle omics data matrices that are too large for conventional visualization tools. It uses graphics processing units for rapid heatmap rendering and multiple CPU cores for hierarchical clustering, so that users can adjust clusters, sort by different metrics, and search features without first reducing the data size. A large language model interprets conversational natural-language commands to carry out these operations and record the steps as reusable workflows. The system also adds within-cluster correlation network views in two or three dimensions plus functional enrichment against biological knowledge bases. A sympathetic reader would care because modern omics technologies routinely produce matrices that force analysts to discard rows or columns, and the conversational interface aims to remove the need for command-line scripting.
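Functional enrichment of a cluster is commonly scored with an over-representation test. The sketch below shows one such statistic, the upper-tail hypergeometric probability, using only the Python standard library. It is an illustration under stated assumptions, not ClusterChirp's implementation: the function name, its parameters, and the counts in the usage note are hypothetical, and the page does not say which statistic the platform's enrichment backend applies.

```python
from math import comb


def enrichment_p_value(population: int, annotated: int, selected: int, overlap: int) -> float:
    """Upper-tail hypergeometric probability P(X >= overlap).

    population: number of features in the background (hypothetical example: all measured proteins)
    annotated:  background features that belong to the gene set
    selected:   features in the cluster being tested
    overlap:    cluster features that belong to the gene set
    """
    if not (0 <= annotated <= population and 0 <= selected <= population):
        raise ValueError("annotated and selected must lie between 0 and population")
    max_overlap = min(annotated, selected)
    total = comb(population, selected)
    # Sum the probability mass for every overlap at least as large as the observed one.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, selected - k)
        for k in range(max(overlap, 0), max_overlap + 1)
    )
    return tail / total
```

With made-up counts, `enrichment_p_value(10, 4, 3, 2)` asks how likely it is that at least 2 of 3 selected features fall in a set covering 4 of 10 background features; the value is 40/120, or one third. A real analysis would also correct for testing many gene sets.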

Core claim

ClusterChirp is presented as a web-based platform for real-time exploration of large-scale data matrices in omics research. It combines GPU-accelerated rendering using deck.gl with parallelized hierarchical clustering on multiple CPU cores to enable on-the-fly clustering, multi-metric sorting, feature search, and interactive controls. The platform uniquely incorporates a natural language interface powered by a Large Language Model for performing complex operations and building reproducible workflows through conversational commands. It further supports within-cluster correlation network analysis in 2D or 3D and integrates functional enrichment via biological knowledge bases. The tool is made freely available at clusterchirp.mssm.edu without login and is also distributed as a Dockerized application.

What carries the argument

ClusterChirp platform, which merges GPU-accelerated rendering with deck.gl, multi-threaded hierarchical clustering, and an LLM-driven natural language interface to support interactive omics analysis.
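To make the clustering component concrete, here is a minimal serial sketch of agglomerative hierarchical clustering with average linkage on the distance 1 − r, where r is the Pearson correlation between two rows. The linkage rule, the distance, and the function names are assumptions chosen for illustration; the platform is described as using parallelized, multi-threaded clustering, which this naive version does not attempt to reproduce.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant row: treated as uncorrelated (an assumption)
    return cov / (sx * sy)


def average_linkage(rows, n_clusters):
    """Naive agglomerative clustering on the distance 1 - Pearson r.

    Returns a list of clusters, each a sorted list of row indices.
    """
    dist = [[1.0 - pearson(a, b) for b in rows] for a in rows]
    clusters = [[i] for i in range(len(rows))]
    while len(clusters) > n_clusters:
        best = None
        # Find the pair of clusters with the smallest mean pairwise distance.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [(p, q) for p in clusters[i] for q in clusters[j]]
                d = sum(dist[p][q] for p, q in pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]
    return clusters
```

On a toy matrix with two rising rows and two falling rows, `average_linkage(rows, 2)` groups the rising rows together and the falling rows together. The repeated pairwise scan is far too slow for the matrix sizes the paper targets, which is the motivation for optimized and parallel implementations.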

If this is right

Full-size matrices can be clustered and visualized on the fly without down-sampling, preserving co-expression patterns that would otherwise be lost.
Conversational commands allow users to build and replay analysis sequences as reproducible workflows inside the same interface.
Within-cluster correlation networks in 2D or 3D become immediately accessible together with functional enrichment results.
A single web interface eliminates the need to switch between separate tools for visualization, clustering, and biological interpretation.
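The within-cluster correlation network mentioned in this list can be pictured as a thresholded correlation graph: features are nodes, and an edge joins two features whose profiles correlate strongly. The following sketch is an assumption about the general technique, not the platform's code; the 0.8 cutoff, the function names, and the feature labels in the usage note are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant profile: no edge (an assumption)
    return cov / sqrt(var_x * var_y)


def correlation_edges(profiles, threshold=0.8):
    """Return (feature_a, feature_b, r) for every pair with |r| >= threshold.

    profiles: dict mapping a feature name to its list of values across samples.
    """
    edges = []
    for a, b in combinations(sorted(profiles), 2):
        r = pearson(profiles[a], profiles[b])
        if abs(r) >= threshold:
            edges.append((a, b, round(r, 3)))
    return edges
```

For three made-up profiles `{"P1": [1, 2, 3, 4], "P2": [2, 4, 6, 8], "P3": [1, -1, 1, -1]}`, only the P1–P2 pair passes the cutoff. The resulting edge list is the kind of input a 2D or 3D graph layout would consume.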

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the natural language component works reliably, biologists without programming experience could perform advanced exploratory analyses directly.
The same combination of fast rendering and conversational control might apply to other high-dimensional scientific datasets beyond omics.
Automatically recorded conversational workflows could become a new standard for sharing and auditing data-analysis steps in collaborative projects.

Load-bearing premise

The large language model will consistently translate natural language commands into accurate and reproducible analysis steps without errors or the need for extensive user adjustments.
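One way to reduce dependence on this premise is to have the model emit a structured command that is validated against a fixed schema before anything runs, and to log only validated commands so that a session can be replayed. The sketch below illustrates that pattern. The operation names, argument names, and the `Workflow` class are all hypothetical; the page does not describe how ClusterChirp represents or validates commands.

```python
import json

# Hypothetical operation schema: operation name -> required argument names.
ALLOWED_OPERATIONS = {
    "cluster": {"axis", "metric"},
    "sort": {"axis", "by"},
    "filter": {"column", "value"},
}


def validate_command(command: dict) -> dict:
    """Reject anything that is not a known operation with exactly the expected arguments."""
    op = command.get("op")
    if op not in ALLOWED_OPERATIONS:
        raise ValueError(f"unknown operation: {op!r}")
    args = command.get("args", {})
    if set(args) != ALLOWED_OPERATIONS[op]:
        raise ValueError(f"{op} expects arguments {sorted(ALLOWED_OPERATIONS[op])}")
    return {"op": op, "args": dict(args)}


class Workflow:
    """Append-only log of validated commands that can be saved and replayed."""

    def __init__(self):
        self.steps = []

    def record(self, command: dict) -> None:
        self.steps.append(validate_command(command))

    def to_json(self) -> str:
        return json.dumps(self.steps, sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "Workflow":
        workflow = cls()
        for step in json.loads(text):
            workflow.record(step)  # re-validate on load
        return workflow
```

Because the saved workflow stores structured steps rather than the original sentences, replaying it does not require calling the language model again, which sidesteps variation between sessions or model versions.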

What would settle it

Run a controlled test in which users give the same biological request using varied natural-language phrasing and measure whether the resulting clusters, sorts, and enrichment results match the intended analysis without manual correction.
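The test described above can be sketched as a small harness that feeds several phrasings of one request to an interpreter and reports how many map to the intended structured command. Everything here is illustrative: `toy_interpreter` is a keyword-matching stand-in for the LLM call, and the command fields and example phrasings are hypothetical.

```python
import json


def canonical(command: dict) -> str:
    """Serialize a structured command so that equal commands compare equal."""
    return json.dumps(command, sort_keys=True)


def paraphrase_agreement(interpret, paraphrases, expected: dict) -> float:
    """Fraction of paraphrases that `interpret` maps to the expected command."""
    target = canonical(expected)
    hits = sum(1 for text in paraphrases if canonical(interpret(text)) == target)
    return hits / len(paraphrases)


def toy_interpreter(text: str) -> dict:
    """Keyword stand-in for the LLM call; a real test would query the model."""
    lowered = text.lower()
    if "cluster" in lowered and "pearson" in lowered:
        return {"op": "cluster", "axis": "rows", "metric": "pearson"}
    return {"op": "unknown"}
```

With the phrasings "cluster genes using Pearson correlation", "Please cluster the rows with Pearson", and "group similar genes by correlation", the toy interpreter matches the first two and misses the third, giving an agreement of two thirds. Swapping in the real model for `toy_interpreter` and a larger phrasing set would turn this into the controlled test proposed above.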

Figures

Figures reproduced from arXiv: 2602.08280 by Edgar Gonzalez-Kozlova, Osho Rawal, Rex Lu, Sacha Gnjatic, Zeynep H. Gümüş.

**Figure 2.** ClusterChirp web interface. (A) Homepage displaying an example dataset. The top navigation bar spans tabs for Home, Examples, FAQ (questions and tutorials), and Contact. The left control panel provides options for row and column ordering, search, opacity, value scaling, and filtering. Filters are populated…

**Figure 4.** Natural language-guided analysis of treatment response biomarkers in bladder cancer plasma proteomics. Data from the GU16-257 bladder cancer immunotherapy trial (42) comprising 77 plasma proteins measured across 196 samples at four treatment cycles. (A) Hierarchical clustering of the full dataset with cluster selection dialog (Cluster 2, 42 proteins). The command guide popup lists available natural languag…

Original abstract

High-dimensional omics datasets are routinely visualized as heatmaps, where color intensities reveal co-expression patterns and correlations. However, modern omics technologies increasingly generate matrices so large that existing visual exploration tools require down-sampling or filtering, causing loss of biologically important patterns. Additional barriers arise from tools that require command-line expertise, or fragmented workflows for downstream biological interpretation. We present ClusterChirp, a web-based platform for real-time exploration of large-scale data matrices. The platform combines GPU-accelerated rendering and parallelized hierarchical clustering using multiple CPU cores. Built on deck.gl and multi-threaded clustering algorithms, ClusterChirp supports on-the-fly clustering, multi-metric sorting, feature search and interactive visualization controls within a single interface. Uniquely, a natural language interface powered by a Large Language Model allows users to perform complex operations and build reproducible workflows through conversational commands. ClusterChirp further enables within-cluster correlation network analysis in 2D or 3D, and integrates functional enrichment through biological knowledge bases. Developed with iterative user feedback and adhering to FAIR4S principles, ClusterChirp enables users to extract insights from high-dimensional omics data with unprecedented ease and speed. It is freely available at clusterchirp.mssm.edu without login and is also distributed as a Dockerized application at ghcr.io/gumuslab/clusterchirp.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper presents ClusterChirp, a web-based platform for real-time exploration of large-scale omics data matrices. It combines GPU-accelerated rendering via deck.gl, parallelized hierarchical clustering on multiple CPU cores, on-the-fly clustering, multi-metric sorting, feature search, interactive visualization controls, within-cluster correlation network analysis in 2D or 3D, functional enrichment via biological knowledge bases, and a natural language interface powered by a Large Language Model that enables complex operations and reproducible workflows through conversational commands. The tool is freely available without login at clusterchirp.mssm.edu and distributed as a Docker image, developed with user feedback and adhering to FAIR4S principles.

Significance. If the performance and reliability claims hold, ClusterChirp would meaningfully advance interactive omics analysis by removing the need for down-sampling, command-line expertise, or fragmented workflows, allowing biologists to perform scalable visualization, clustering, network analysis, and enrichment in a single conversational interface. Explicit strengths include the open web deployment without login, Docker distribution for reproducibility, and integration of GPU rendering with LLM-guided operations. These features address real barriers in high-dimensional data exploration and could serve as a template for future tools.

major comments (1)

[Abstract and LLM interface description] The central claim that the LLM-powered interface enables users to 'perform complex operations and build reproducible workflows through conversational commands' with 'unprecedented ease and speed' is unsupported by evidence. No quantitative evaluation is supplied, such as success rates on benchmark query sets, error rates for domain-specific phrasing (gene-set references, metric names), failure-mode analysis, or reproducibility checks across sessions or model versions. This assumption is load-bearing for the primary novelty and requires empirical validation (e.g., automated test suites or user studies) to substantiate the performance assertions.

minor comments (2)

[Abstract] The abstract references 'FAIR4S principles' without definition or explanation of how they are implemented; adding a short clarification or citation would aid readers.
[Platform architecture description] Specific performance numbers (rendering latency for large matrices, clustering runtime scaling) are absent from the architectural description; including even preliminary benchmarks would strengthen the 'scalable' and 'real-time' claims without altering the tool-focused scope.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the positive assessment of ClusterChirp's overall design and for highlighting the need for stronger empirical support of the LLM interface claims. We address the single major comment below and will revise the manuscript accordingly.

Point-by-point responses

Referee: The central claim that the LLM-powered interface enables users to 'perform complex operations and build reproducible workflows through conversational commands' with 'unprecedented ease and speed' is unsupported by evidence. No quantitative evaluation is supplied, such as success rates on benchmark query sets, error rates for domain-specific phrasing (gene-set references, metric names), failure-mode analysis, or reproducibility checks across sessions or model versions. This assumption is load-bearing for the primary novelty and requires empirical validation (e.g., automated test suites or user studies) to substantiate the performance assertions.

Authors: We agree that the current manuscript lacks quantitative evaluation of the LLM interface. The claims rest on the system's architecture (prompt engineering for omics-specific operations, session persistence for reproducibility) and on iterative development feedback, but no benchmarked success rates or error analyses are reported. In the revised manuscript we will add a dedicated evaluation subsection that includes: (1) a curated set of 50 domain-specific queries with measured success rates and common failure modes (e.g., ambiguous gene-set references), (2) reproducibility checks across independent sessions using the same model version, and (3) a small user study (n=8 biologists) reporting task-completion time and error rates compared with a command-line baseline. We will also tone down the phrasing 'unprecedented ease and speed' to reflect the new empirical data.

revision: yes
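A success rate measured on a query set of about 50 items carries real sampling uncertainty, so reporting an interval alongside the rate would make the proposed evaluation easier to interpret. The sketch below computes a Wilson score interval and tallies failure modes. The function names and the record format are assumptions, and the records in the usage note are placeholders, not results from the paper or its planned revision.

```python
from collections import Counter
from math import sqrt


def wilson_interval(successes: int, trials: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return centre - half, centre + half


def summarize(results):
    """Summarize evaluation records.

    results: list of (succeeded, failure_mode) pairs, where failure_mode is
    a short label for failed queries and None for successful ones.
    """
    successes = sum(1 for ok, _ in results if ok)
    modes = Counter(mode for ok, mode in results if not ok)
    low, high = wilson_interval(successes, len(results))
    return {
        "rate": successes / len(results),
        "ci": (low, high),
        "failure_modes": dict(modes),
    }
```

Called on placeholder records such as `[(True, None)] * 3 + [(False, "ambiguous gene-set reference")]`, `summarize` returns a rate of 0.75 together with an interval and a count per failure label. The Wilson interval is chosen here because it behaves better than the normal approximation when the sample is small or the rate is near 0 or 1.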

Circularity Check

0 steps flagged

No circularity: tool-description paper with no derivations or fitted predictions

The manuscript describes the architecture and features of ClusterChirp (GPU rendering, hierarchical clustering, LLM interface) but contains no equations, parameter fits, predictions, or first-principles derivations. The central claims are implementation statements and qualitative assertions about usability; none reduce to the paper's own inputs by construction. Self-citations, if present, are not load-bearing for any quantitative result. This is a standard non-circular tool paper.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a software tool description with no mathematical model, fitted parameters, background axioms, or postulated entities.

pith-pipeline@v0.9.0 · 5568 in / 1047 out tokens · 109441 ms · 2026-05-16T04:03:57.001160+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

We present ClusterChirp, a web-based platform for real-time exploration of large-scale data matrices that combines GPU-accelerated rendering and parallelized hierarchical clustering with a natural language interface powered by a Large Language Model.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

52 extracted references · 52 canonical work pages · 1 internal anchor

[1] ClusterChirp: Scalable Interactive Exploration of Omics Data with Natural Language–Guided Analysis. Osho Rawal¹, Rex Lu¹, Edgar Gonzalez-Kozlova²,³,⁴, Sacha Gnjatic²,³, Zeynep H. Gümüş¹,³,⁴,*. ¹ Department of Genetics and Genomics, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA. ² Department of Immunology & Immunotherapy, Icahn School of Me...
[2] ClusterChirp enables natural language-assisted analysis of bladder cancer treatment response biomarkers. We next analyzed longitudinal plasma proteomics data from a bladder cancer immunotherapy trial (GU16-257; data kindly provided by the study investigators) (42). The dataset includes 77 proteins from the Olink Immuno-Oncology panel (after QC filtering),...
[3] Aebersold,R. and Mann,M. (2003) Mass spectrometry-based proteomics. Nature, 422, 198–207. https://doi.org/10.1038/nature01511
[4] Mahieu,N.G. and Patti,G.J. (2017) Systems-level annotation of a metabolomics data set reduces 25 000 features to fewer than 1000 unique metabolites. Anal. Chem., 89, 10397–10406. https://doi.org/10.1021/acs.analchem.7b02380
[5] Mohr,A.E., Ortega-Santos,C.P., Whisner,C.M., Klein-Seetharaman,J. and Jasbi,P. (2024) Navigating challenges and opportunities in multi-omics integration for personalized healthcare. Biomedicines, 12. https://doi.org/10.3390/biomedicines12071496
[7] Subramanian,I., Verma,S., Kumar,S., Jere,A. and Anamika,K. (2020) Multi-omics data integration, interpretation, and its application. Bioinform. Biol. Insights, 14, 1177932219899051. https://doi.org/10.1177/1177932219899051
[8] Hasin,Y., Seldin,M. and Lusis,A. (2017) Multi-omics approaches to disease. Genome Biol., 18. https://doi.org/10.1186/s13059-017-1215-1
[10] Eisen,M.B., Spellman,P.T., Brown,P.O. and Botstein,D. (1998) Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. U.S.A., 95, 14863–14868. https://doi.org/10.1073/pnas.95.25.14863
[11] Müllner,D. (2011) Modern hierarchical, agglomerative clustering algorithms. arXiv, arXiv:1109.2378
[12] Gould,J. (2016) Morpheus: versatile matrix visualization and analysis software. Broad Institute, Cambridge, MA, USA. https://software.broadinstitute.org/morpheus/
[13] Ryan,M.C., Stucky,M., Wakefield,C., Melott,J.M., Akbani,R., Weinstein,J.N. and Broom,B.M. (2019) Interactive clustered heat map builder: an easy web-based tool for creating sophisticated clustered heat maps. F1000Research, 8. https://doi.org/10.12688/f1000research.20590.1
[15] Babicki,S., Arndt,D., Marcu,A., Liang,Y., Grant,J.R., Maciejewski,A. and Wishart,D.S. (2016) Heatmapper: web-enabled heat mapping for all. Nucleic Acids Res., 44, W147–W153. https://doi.org/10.1093/nar/gkw419
[16] Metsalu,T. and Vilo,J. (2015) ClustVis: a web tool for visualizing clustering of multivariate data using principal component analysis and heatmap. Nucleic Acids Res., 43, W566–W570. https://doi.org/10.1093/nar/gkv468
[17] Ning,W., Wei,Y., Gao,L., Han,C., Gou,Y., Fu,S., Liu,D., Zhang,C., Huang,X., Wu,S., et al. (2022) HemI 2.0: an online service for heatmap illustration. Nucleic Acids Res., 50, W405–W411. https://doi.org/10.1093/nar/gkac480
[18] Fernandez,N.F., Gundersen,G.W., Rahman,A., Grimes,M.L., Rikova,K., Hornbeck,P. and Ma’ayan,A. (2017) Clustergrammer, a web-based heatmap visualization and analysis tool for high-dimensional biological data. Sci. Data, 4, 170151. https://doi.org/10.1038/sdata.2017.151
[19] Gu,Z., Eils,R. and Schlesner,M. (2016) Complex heatmaps reveal patterns and correlations in multidimensional genomic data. Bioinformatics, 32, 2847–2849. https://doi.org/10.1093/bioinformatics/btw313
[20] Kolde,R. (2010) pheatmap: Pretty heatmaps. R package version 1.0.12. https://doi.org/10.32614/CRAN.package.pheatmap
[21] Waskom,M.L. (2021) seaborn: statistical data visualization. J. Open Source Softw., 6. https://doi.org/10.21105/joss.03021
[23] Plotly Technologies Inc. (2015) Collaborative data science. Plotly Technologies Inc., Montréal, QC, Canada. https://plotly.com/
[24] Rawal,O., Turhan,B., Peradejordi,I.F., Chandrasekar,S., Kalayci,S., Gnjatic,S., Johnson,J., Bouhaddou,M. and Gümüş,Z.H. (2025) PhosNetVis: a web-based tool for fast kinase-substrate enrichment analysis and interactive 2D/3D network visualizations of phosphoproteomics data. Patterns, 6, 101148. https://doi.org/10.1016/j.patter.2024.101148
[25] Kalayci,S., Petralia,F., Wang,P. and Gümüş,Z.H. (2020) ProNetView-ccRCC: a web-based portal to interactively explore clear cell renal cell carcinoma proteogenomics networks. Proteomics, 20, e2000043. https://doi.org/10.1002/pmic.202000043
[26] Liluashvili,V., Kalayci,S., Fluder,E., Wilson,M., Gabow,A. and Gümüş,Z.H. (2017) iCAVE: an open source tool for visualizing biomolecular networks in 3D, stereoscopic 3D and immersive 3D. Gigascience, 6, 1–13. https://doi.org/10.1093/gigascience/gix054
[27] Wang,Q., Liu,X., Liang,M.Q., L’Yi,S. and Gehlenborg,N. (2023) Enabling multimodal user interactions for genomics visualization creation. In IEEE Visualization and Visual Analytics (VIS). IEEE, pp. 111–115. https://doi.org/10.1109/VIS54172.2023.00031
[28] Lange,D., Gao,S., Sui,P., Money,A., Misner,P., Zitnik,M. and Gehlenborg,N. (2023) YAC: bridging natural language and interactive visual exploration with generative AI for biomedical data discovery. [Preprint]
[29] Shen,L., Shen,E., Luo,Y., Yang,X., Hu,X., Zhang,X., Tai,Z. and Wang,J. (2023) Towards natural language interfaces for data visualization: a survey. IEEE Trans. Vis. Comput. Graph., 29, 3121–3144. https://doi.org/10.1109/TVCG.2022.3148007
[30] Dibia,V. (2023) LIDA: a tool for automatic generation of grammar-agnostic visualizations and infographics using large language models. In Proc. 61st Annu. Meet. Assoc. Comput. Linguist., pp. 113–126. https://doi.org/10.18653/v1/2023.acl-demo.11
[31] React (2024) A JavaScript library for building user interfaces. Meta Platforms, Inc. https://react.dev/
[32] Microsoft Corporation (2024) TypeScript: JavaScript with syntax for types. Microsoft Corporation. https://www.typescriptlang.org/
[33] Uber Technologies, Inc. (2024) deck.gl: WebGL-powered framework for visual exploratory data analysis. Uber Technologies, Inc. https://deck.gl/
[34] Lazar,C., Gatto,L., Ferro,M., Bruley,C. and Burger,T. (2016) Accounting for the multiple natures of missing values in label-free quantitative proteomics data sets to compare imputation strategies. J. Proteome Res., 15, 1116–1125. https://doi.org/10.1021/acs.jproteome.5b00981
[35] Bourgon,R., Gentleman,R. and Huber,W. (2010) Independent filtering increases detection power for high-throughput experiments. Proc. Natl. Acad. Sci. U.S.A., 107, 9546–9551. https://doi.org/10.1073/pnas.0914005107
[36] Mukaka,M.M. (2012) Statistics corner: a guide to appropriate use of correlation coefficient in medical research. Malawi Med. J., 24, 69–71
[37] Jacomy,M., Venturini,T., Heymann,S. and Bastian,M. (2014) ForceAtlas2, a continuous graph layout algorithm for handy network visualization designed for the Gephi software. PLoS One, 9, e98679. https://doi.org/10.1371/journal.pone.0098679
[38] Traag,V.A., Waltman,L. and van Eck,N.J. (2019) From Louvain to Leiden: guaranteeing well-connected communities. Sci. Rep., 9. https://doi.org/10.1038/s41598-019-41695-z
[40] Chen,E.Y., Tan,C.M., Kou,Y., Duan,Q., Wang,Z., Meirelles,G.V., Clark,N.R. and Ma’ayan,A. (2013) Enrichr: interactive and collaborative HTML5 gene list enrichment analysis tool. BMC Bioinformatics, 14. https://doi.org/10.1186/1471-2105-14-128
[42] Kuleshov,M.V., Jones,M.R., Rouillard,A.D., Fernandez,N.F., Duan,Q., Wang,Z., Koplev,S., Jenkins,S.L., Jagodnik,K.M., Lachmann,A., et al. (2016) Enrichr: a comprehensive gene set enrichment analysis web server 2016 update. Nucleic Acids Res., 44, W90–W97. https://doi.org/10.1093/nar/gkw377
[43] Xie,Z., Bailey,A., Kuleshov,M.V., Clarke,D.J.B., Evangelista,J.E., Jenkins,S.L., Lachmann,A., Wojciechowicz,M.L., Kropiwnicki,E., Jagodnik,K.M., et al. (2021) Gene set knowledge discovery with Enrichr. Curr. Protoc., 1, e90. https://doi.org/10.1002/cpz1.90
[44] Assarsson,E., Lundberg,M., Holmquist,G., Björkesten,J., Thorsen,S.B., Ekman,D., Eriksson,A., Rennel Dickens,E., Ohlsson,S., Edfeldt,G., et al. (2014) Homogenous 96-plex PEA immunoassay exhibiting high sensitivity, specificity, and excellent scalability. PLoS One, 9, e95192. https://doi.org/10.1371/journal.pone.0095192
[45] Kovatch,P., Gai,L., Cho,H.M., Fluder,E. and Jiang,D. (2020) Optimizing high-performance computing systems for biomedical workloads. IEEE Int. Symp. Parallel Distrib. Process. Workshops PhD Forum, 2020, 183–192. https://doi.org/10.1109/IPDPSW50202.2020.00040
[46] Troyanskaya,O., Cantor,M., Sherlock,G., Brown,P., Hastie,T., Tibshirani,R., Botstein,D. and Altman,R.B. (2001) Missing value estimation methods for DNA microarrays. Bioinformatics, 17, 520–525. https://doi.org/10.1093/bioinformatics/17.6.520
[47] van Buuren,S. and Groothuis-Oudshoorn,K. (2011) mice: multivariate imputation by chained equations in R. J. Stat. Softw., 45, 1–67. https://doi.org/10.18637/jss.v045.i03
[48] Stekhoven,D.J. and Bühlmann,P. (2012) MissForest—non-parametric missing value imputation for mixed-type data. Bioinformatics, 28, 112–118. https://doi.org/10.1093/bioinformatics/btr597
[49] Pedregosa,F., Varoquaux,G., Gramfort,A., Michel,V., Thirion,B., Grisel,O., Blondel,M., Prettenhofer,P., Weiss,R., Dubourg,V., et al. (2011) Scikit-learn: machine learning in Python. J. Mach. Learn. Res., 12, 2825–2830
[50] Galsky,M.D., Daneshmand,S., Izadmehr,S., Gonzalez-Kozlova,E., Chan,K.G., Lewis,S., El Achkar,B., Dorff,T.B., Cetnar,J.P., O’Neil,B., et al. (2023) Gemcitabine and cisplatin plus nivolumab as organ-sparing treatment for muscle-invasive bladder cancer: a phase 2 trial. Nat. Med., 29, 2825–2834. https://doi.org/10.1038/s41591-023-02568-1
[51] Buckup,M., Figueiredo,I., Ioannou,G., Ozbey,S., Cabal,R., Tabachnikova,A., Troncoso,L., Le Berichel,J., Zhao,Z., Ward,S.C., et al. (2025) Multiparametric cellular and spatial organization in cancer tissue lesions with a streamlined pipeline. Nat. Biomed. Eng. https://doi.org/10.1038/s41551-025-01475-9