CodeCytos: AI-assisted spatial molecular imaging analysis via code-augmented agent action space

Hung Q. Vo; Huy Q. Vo; Son T. Ly; Zhihao Wan; Anh-Vu Nguyen; Hong Zhao; Jianting Sheng; Stephen T. C. Wong; Hien V. Nguyen

arXiv: 2606.00472 · v1 · pith:MIJUY66T · submitted 2026-05-30 · 💻 cs.CV · cs.AI · cs.HC · cs.LG

Pith reviewed 2026-06-28 19:09 UTC · model grok-4.3

classification: cs.CV · cs.AI · cs.HC · cs.LG

keywords: spatial molecular imaging · code-augmented agents · custom feature exploration · tissue image analysis · AI-assisted analysis · biomarker discovery · LLM coding


The pith

CodeCytos lets AI agents write and run custom code to explore spatial features in molecular tissue images.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes CodeCytos as a framework where an AI agent uses code generation to interact directly with spatial molecular imaging data instead of relying on fixed software tools. This setup allows users to request custom spatial cellular analyses through simple questions without needing pre-built features or detailed instructions. Evaluations across four tissue datasets show the agent outperforms baselines, with further gains from adding a few random coding examples from outside the domain. If correct, this would let bioscientists automate tailored explorations that current tools cannot handle efficiently.

Core claim

CodeCytos is a coding-based reasoning agent framework that enables dynamic, programmable interaction with spatial molecular imaging data to improve automation and customization of cellular analysis. It supports exploration of custom spatial cellular features by letting the agent generate and execute code in response to minimal user prompts, and it outperforms baseline approaches on expert-curated datasets from frontal cortex, non-small-cell lung cancer, pancreas, and tonsil tissues when paired with LLM backbones that have strong coding capabilities.

What carries the argument

The code-augmented agent action space, which lets the agent generate executable code to query and compute custom spatial features on the imaging data rather than selecting from a fixed menu of operations.
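A minimal sketch of what a code action looks like on the execution side, assuming a plain Python `exec` with one namespace shared across turns (a real deployment would sandbox this). `CodeActionExecutor` and the `cell_types` variable are hypothetical names, not part of CodeCytos. The point it illustrates: state created by one action (here `n_t`) persists into the next, and errors come back as text the agent can react to.

```python
import contextlib
import io
import traceback


class CodeActionExecutor:
    """Hypothetical executor: runs agent-written Python and returns text observations.

    Variables defined in one action persist into the next, which is what lets a
    code-acting agent build an analysis step by step instead of picking from a menu.
    """

    def __init__(self, initial_namespace=None):
        # One shared namespace across turns (a simplification of a sandboxed kernel).
        self.namespace = dict(initial_namespace or {})

    def run(self, code: str) -> str:
        buffer = io.StringIO()
        try:
            with contextlib.redirect_stdout(buffer):
                exec(code, self.namespace)
        except Exception:
            # Errors are returned as observations so the agent can revise its code.
            return buffer.getvalue() + traceback.format_exc(limit=1)
        return buffer.getvalue()


# Usage: the "feature" is whatever the agent writes, not a pre-built function.
executor = CodeActionExecutor({"cell_types": ["T", "B", "T", "epithelial", "T"]})
print(executor.run("n_t = cell_types.count('T')\nprint(n_t)"))
print(executor.run("print(n_t / len(cell_types))"))
```

Returning the traceback as an observation, rather than raising, is what lets the agent debug its own code on the next turn.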

If this is right

Bioscientists can request custom spatial analyses using only simple natural-language questions without task-specific instructions.
Performance on custom feature tasks improves substantially when the agent receives a small number of domain-agnostic coding examples.
The same framework can be applied across different tissue types without retraining or expert-crafted demonstrations for each new study.
Custom biomarker exploration becomes more scalable because the agent adapts to new questions by writing fresh code rather than depending on pre-implemented functions.
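The few-shot mechanism can be pictured as plain prompt assembly. The sketch below assumes a small pool of coding-reasoning demonstrations that never mention cells or tissue, samples `k` of them with a seeded random generator, and prepends them to the user's minimal question. `Demo`, `DEMO_POOL`, `build_prompt`, and the `<execute>` tag format are illustrative assumptions, not the paper's prompt template.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Demo:
    """One hypothetical coding-reasoning demonstration."""
    question: str
    thought: str
    code: str
    answer: str


# Hypothetical out-of-domain pool: none of these mention cells or tissue.
DEMO_POOL = [
    Demo("What is the median of [3, 1, 2]?", "Sort and take the middle value.",
         "import statistics\nprint(statistics.median([3, 1, 2]))", "2"),
    Demo("How many vowels are in 'banana'?", "Count characters in 'aeiou'.",
         "print(sum(c in 'aeiou' for c in 'banana'))", "3"),
    Demo("What is 2**10?", "Evaluate the power directly.", "print(2**10)", "1024"),
    Demo("Reverse the list [1, 2, 3].", "Use slicing.", "print([1, 2, 3][::-1])", "[3, 2, 1]"),
]


def format_demo(d: Demo) -> str:
    return (f"Question: {d.question}\nThought: {d.thought}\n"
            f"<execute>\n{d.code}\n</execute>\nAnswer: {d.answer}\n")


def build_prompt(user_question: str, k: int, seed: int) -> str:
    """Randomly sample k out-of-domain demos and prepend them to the minimal question."""
    rng = random.Random(seed)  # seeded so a benchmark run is reproducible
    demos = rng.sample(DEMO_POOL, k)
    shots = "\n".join(format_demo(d) for d in demos)
    return f"{shots}\nQuestion: {user_question}\nThought:"


print(build_prompt("What is the mean nearest-neighbor distance from epithelial to stromal cells?", k=2, seed=0))
```

Because the demonstrations only teach the question-thought-code-answer format, the same pool can be reused for every tissue type without expert curation.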

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same code-generation approach could be tested on other imaging modalities such as multiplexed immunofluorescence or spatial transcriptomics data.
Integration with interactive environments might allow iterative refinement where a user corrects an initial code output and the agent continues from there.
If the generated code can be automatically verified against ground-truth statistics on public datasets, the framework could support higher-stakes biomarker pipelines.

Load-bearing premise

Large language models can reliably produce correct and useful analysis code for spatial cellular tasks from only minimal prompts and a few examples drawn from unrelated domains.

What would settle it

Run the agent on a held-out spatial analysis task with a minimal prompt, then compare the accuracy and correctness of its generated code output against expert-written reference code for the same task.
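One way to operationalize this check is to score each numeric answer against the expert reference within a tolerance. The helper names and the 1% relative tolerance below are hypothetical choices for illustration; the paper's own matching rule is not given here.

```python
import math


def matches_reference(predicted, reference, rel_tol=1e-2, abs_tol=1e-6):
    """Hypothetical scoring rule: does the agent's answer agree with the expert reference?

    The relative tolerance absorbs rounding in the reported answer while still
    catching wrong unit conversions or wrong cell subsets.
    """
    try:
        value = float(predicted)
    except (TypeError, ValueError):
        return False  # missing or non-numeric output counts as a failure
    return math.isclose(value, float(reference), rel_tol=rel_tol, abs_tol=abs_tol)


def accuracy(predictions, references, **tol):
    """Fraction of tasks whose answer matches the expert reference."""
    if len(predictions) != len(references) or not predictions:
        raise ValueError("need equal-length, non-empty lists")
    hits = sum(matches_reference(p, r, **tol) for p, r in zip(predictions, references))
    return hits / len(predictions)
```

For example, `accuracy([12.3, "n/a", 40.0], [12.31, 7.0, 20.0])` returns one third: the first answer is within 1%, the second is not numeric, and the third is off by a factor of two.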

Figures

Figures reproduced from arXiv: 2606.00472 by Anh-Vu Nguyen, Hien V. Nguyen, Hong Zhao, Hung Q. Vo, Huy Q. Vo, Jianting Sheng, Son T. Ly, Stephen T. C. Wong, Zhihao Wan.

**Figure 1.** CodeCytos Agent Usage Workflow and Benchmarking Performance. a, Proposed workflow for bioscientists using CodeCytos: Bioscientists first prepare tissue samples and upload the digitized data to the pipeline. Next, they select appropriate cell-segmentation and cell-classification tools, consulting AI scientists as needed. CodeCytos then configures and applies the chosen segmentation and classification method…

**Figure 2.** Architecture Diagram of Our Proposed CodeCytos Agent. a, CodeCytos agent diagram: given the spatial feature requested by bioscientists, the agent carries out multiple turns of thinking/reasoning, text/code generation, and environment observation. This follows the ReAct agent (ref. 13), which iteratively thinks, acts, and observes. Moreover, the CodeCytos action space is extended with the ability of writing…
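The think/act/observe cycle in the caption can be sketched as a short loop. Here `scripted_policy` is a hypothetical stand-in for the LLM backbone, and code actions run through a bare `exec` with a shared namespace; both are simplifying assumptions rather than the CodeCytos implementation.

```python
import contextlib
import io


def run_code(code, namespace):
    """Execute one code action and return its stdout (or the error) as the observation."""
    buf = io.StringIO()
    try:
        with contextlib.redirect_stdout(buf):
            exec(code, namespace)
    except Exception as exc:  # surface errors so the next thought can react to them
        return f"{type(exc).__name__}: {exc}"
    return buf.getvalue().strip()


def react_loop(policy, question, max_turns=5):
    """Think -> act -> observe until the policy emits a final answer.

    `policy` stands in for the LLM: given the transcript, it returns
    (thought, code, final_answer), where exactly one of code/final_answer is set.
    """
    namespace, transcript = {}, [("question", question)]
    for _ in range(max_turns):
        thought, code, final = policy(transcript)
        transcript.append(("thought", thought))
        if final is not None:
            return final, transcript
        transcript.append(("action", code))
        transcript.append(("observation", run_code(code, namespace)))
    return None, transcript  # turn budget exhausted without an answer


def scripted_policy(transcript):
    """Hypothetical stand-in for an LLM backbone, hard-coded for the demo."""
    observations = [text for kind, text in transcript if kind == "observation"]
    if not observations:
        return "Count the T cells first.", "labels = ['T', 'B', 'T']\nprint(labels.count('T'))", None
    return "The observation answers the question.", None, observations[-1]


answer, log = react_loop(scripted_policy, "How many T cells are there?")
print(answer)
```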

**Figure 3.** AUP@k by category on the Frontal_Cortex dataset. We compare two settings: (1) a tool-augmented LLM and (2) the CodeAct-based CodeCytos agent. The first three x-axis groups correspond to tool-augmented LLM variants: (1.a) without CoT, (1.b) zero-shot CoT, and (1.c) few-shot CoT. The last three x-axis groups correspond to CodeCytos variants: (2.a) CodeCytos, (2.b) CodeCytos with 1-shot demonstrations, and (2…

**Figure 4.** AUP@k by category on the NSCLC dataset. We compare two settings: (1) a tool-augmented LLM and (2) the CodeAct-based CodeCytos agent. The first three x-axis groups correspond to tool-augmented LLM variants: (1.a) without CoT, (1.b) zero-shot CoT, and (1.c) few-shot CoT. The last three x-axis groups correspond to CodeCytos variants: (2.a) CodeCytos, (2.b) CodeCytos with 1-shot demonstrations, and (2.c) CodeC…

**Figure 5.** AUP@k by category on the Pancreas dataset. We compare two settings: (1) a tool-augmented LLM and (2) the CodeAct-based CodeCytos agent. The first three x-axis groups correspond to tool-augmented LLM variants: (1.a) without CoT, (1.b) zero-shot CoT, and (1.c) few-shot CoT. The last three x-axis groups correspond to CodeCytos variants: (2.a) CodeCytos, (2.b) CodeCytos with 1-shot demonstrations, and (2.c) Co…

**Figure 6.** AUP@k by category on the Tonsil dataset. We compare two settings: (1) a tool-augmented LLM and (2) the CodeAct-based CodeCytos agent. The first three x-axis groups correspond to tool-augmented LLM variants: (1.a) without CoT, (1.b) zero-shot CoT, and (1.c) few-shot CoT. The last three x-axis groups correspond to CodeCytos variants: (2.a) CodeCytos, (2.b) CodeCytos with 1-shot demonstrations, and (2.c) Code…

**Figure 7.** Performance heatmaps across four datasets. Heatmaps compare multiple LLM backbones across feature categories and datasets, revealing that Nearest Neighbor Distances (NND) consistently achieve the highest performance and appear least challenging, with neighborhood-related features typically following. Models cluster into two groups (Kimi-Linear-Instruct vs. coding-optimized backbones), with Devstral-2-123B ach…

**Figure 8.** Pass@k curves for CodeCytos across four tissue-type datasets with different LLM backbones. The choice of LLM backbone leads to distinct pass@k trajectories across datasets. Performance generally begins to plateau around k ≈ 10; we therefore report representative values at k ∈ {1,5,20}. Overall, GLM-4.5-Air and Devstral-2-123B achieve the strongest results. While increasing the number of attempts raises pas…
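The caption does not say how pass@k is estimated. A common choice, introduced with the HumanEval code benchmark (Chen et al., 2021), is the unbiased estimator 1 - C(n-c, k) / C(n, k), computed from n sampled attempts of which c are correct; treating it as the paper's estimator is an assumption. Under this estimator, reporting k = 20 needs at least 20 attempts per question.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k).

    n: attempts sampled per task, c: attempts judged correct, k: budget being scored.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct attempt
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results, k):
    """Average over tasks; `results` is a list of (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

For instance, a question answered correctly in 5 of 20 attempts gets pass@1 = 1 - 15/20 = 0.25, while its pass@20 is 1.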

**Figure 9.** Effect of domain-agnostic few-shot coding–reasoning demonstrations on M3ToolEval. We compare CODEACT (zero-shot) with CODEACT-FS (ours; CODEACT augmented with domain-agnostic few-shot demonstrations) against JSON and Text action modes on the original M3ToolEval benchmark from CodeAct (ref. 8; 82 human-curated, multi-turn, multi-tool tasks). a, Per-model success rate (%, higher is better) and average number of dia…

**Figure 10.** Example CodeCytos agent run on an NSCLC tissue field of view (FOV) answering a bioscientist’s query: “What is the variance-to-mean ratio (VMR) of T-cell counts in 30 µm square bins?” The bioscientist provides exactly the question shown in the figure, with no additional task specifications or spatial-analysis instructions. Notably, in step 2 the agent recognizes that the physical scale is not provided and …
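The VMR question reduces to counting cells per square bin and dividing the variance of the counts by their mean. This sketch assumes centroids in pixels, a caller-supplied pixel size (the scale the agent notices is missing), bins that tile the whole FOV so empty bins count as zero, and population variance; each is a modeling choice, not necessarily the one CodeCytos made. `binned_vmr` is a hypothetical name.

```python
import numpy as np


def binned_vmr(xy_px, um_per_px, fov_um, bin_um=30.0):
    """Variance-to-mean ratio of cell counts in square bins tiling the FOV.

    xy_px: (N, 2) centroids in pixels for the cells of interest (e.g. T cells).
    um_per_px: physical pixel size; an explicit assumption the caller must supply.
    fov_um: (width, height) of the FOV in micrometres. Partial bins at the far
    edges are kept as full-size bins, one of several defensible choices.
    """
    xy_um = np.asarray(xy_px, dtype=float).reshape(-1, 2) * um_per_px
    nx = int(np.ceil(fov_um[0] / bin_um))
    ny = int(np.ceil(fov_um[1] / bin_um))
    counts, _, _ = np.histogram2d(
        xy_um[:, 0], xy_um[:, 1],
        bins=[nx, ny],
        range=[[0.0, nx * bin_um], [0.0, ny * bin_um]],
    )
    mean = counts.mean()
    if mean == 0:
        return float("nan")  # VMR is undefined when no cells fall in the FOV
    return float(counts.var() / mean)  # population variance (ddof=0)
```

A VMR near 1 is what a Poisson (random) pattern would give; values well above 1 indicate clustering into some bins, and values below 1 indicate a more even spread.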

**Figure 11.** Example CodeCytos agent run on an NSCLC tissue field of view (FOV) answering a bioscientist’s query: “What is the mean nearest-neighbor distance from epithelial cells to the closest stromal cell within the FOV?” The bioscientist provides exactly the question shown in the figure, with no additional task specifications or spatial-analysis instructions. CodeCytos leverages cdist, a highly efficient function …
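The nearest-neighbor question maps directly onto SciPy's `cdist`: compute all epithelial-to-stromal distances, take each row's minimum, and average. The sketch assumes coordinates are already in micrometres; `mean_nnd_between` is a hypothetical helper, not the code shown in the figure.

```python
import numpy as np
from scipy.spatial.distance import cdist


def mean_nnd_between(src_xy, dst_xy):
    """Mean distance from each source cell to its closest target cell.

    A full pairwise matrix is fine for FOV-sized sets; a KD-tree
    (scipy.spatial.cKDTree) scales better for whole slides.
    """
    src = np.asarray(src_xy, dtype=float)
    dst = np.asarray(dst_xy, dtype=float)
    if len(src) == 0 or len(dst) == 0:
        return float("nan")
    d = cdist(src, dst)  # shape (n_src, n_dst), Euclidean by default
    return float(d.min(axis=1).mean())
```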

**Figure 12.** Example CodeCytos agent run on a Frontal Cortex tissue field of view (FOV) answering a bioscientist’s query: “What is the average number of edges per node in the Delaunay triangulation among astrocytes?” The bioscientist provides exactly the question shown in the figure, with no additional task specifications or spatial-analysis instructions.
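A sketch of Delaunay-based graph features using `scipy.spatial.Delaunay`. It reads “average number of edges per node” as mean node degree (2E/N), which is an interpretation. For the astrocyte question, pass only astrocyte centroids; for the macrophage question in Figure 15, triangulate all centroids and pass a macrophage mask. The function names are hypothetical.

```python
import numpy as np
from scipy.spatial import Delaunay


def delaunay_edges(xy):
    """Unique undirected edges (i, j) with i < j of the Delaunay triangulation of xy."""
    tri = Delaunay(np.asarray(xy, dtype=float))
    edges = set()
    for a, b, c in tri.simplices:
        # Each triangle contributes three edges; the set removes shared ones.
        for u, v in ((a, b), (b, c), (a, c)):
            i, j = sorted((int(u), int(v)))
            edges.add((i, j))
    return edges


def mean_degree(xy, mask=None):
    """Mean node degree (2E/N when mask is None), optionally averaged over a subset of nodes."""
    n = len(xy)
    deg = np.zeros(n, dtype=int)
    for i, j in delaunay_edges(xy):
        deg[i] += 1
        deg[j] += 1
    sel = np.ones(n, dtype=bool) if mask is None else np.asarray(mask, dtype=bool)
    return float(deg[sel].mean())
```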

**Figure 13.** Example CodeCytos agent run on a Frontal Cortex tissue field of view (FOV) answering a bioscientist’s query: “In the minimum spanning tree (MST) over astrocytes only, what is the mean edge length?” The bioscientist provides exactly the question shown in the figure, with no additional task specifications or spatial-analysis instructions. CodeCytos leverages networkx to build a graph and compute the minimu…
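The caption says CodeCytos used networkx; the sketch below swaps in SciPy's `minimum_spanning_tree` on a dense distance matrix, which yields the same Euclidean MST edge lengths for a single FOV. It assumes distinct centroids, because zero entries in the matrix are treated as missing edges. `mst_mean_edge_length` is a hypothetical name.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform


def mst_mean_edge_length(xy):
    """Mean edge length of the Euclidean MST over the given centroids (e.g. astrocytes)."""
    pts = np.asarray(xy, dtype=float)
    if len(pts) < 2:
        return float("nan")  # no edges to average
    dist = squareform(pdist(pts))  # dense (N, N) distances; fine for one FOV
    mst = minimum_spanning_tree(dist)  # sparse; zero entries count as "no edge"
    return float(mst.data.mean())  # N - 1 edge weights for distinct points
```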

**Figure 14.** Example CodeCytos agent run on a Pancreas tissue field of view (FOV) answering a bioscientist’s query: “What is the median Hausdorff distance from epithelial polygons to the nearest PSC polygon (µm)?” The bioscientist provides exactly the question shown in the figure, with no additional task specifications or spatial-analysis instructions. CodeCytos interprets “PSC” as pancreatic stellate cells and obser…
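A sketch of the Hausdorff query using SciPy's `directed_hausdorff` on polygon vertex arrays, symmetrized by taking the larger of the two directed distances. Two assumptions are worth flagging: distances are computed between vertices only (densifying the boundaries first, or using Shapely's `hausdorff_distance` with `densify`, tightens this), and the “nearest” PSC polygon is taken to be the one with the smallest Hausdorff distance. Coordinates are assumed to be in µm; the helper names are hypothetical.

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two vertex arrays of shape (N, 2) and (M, 2)."""
    return max(directed_hausdorff(a, b)[0], directed_hausdorff(b, a)[0])


def median_hausdorff_to_nearest(src_polys, dst_polys):
    """Median over source polygons of the Hausdorff distance to the closest target polygon.

    'Nearest' is read as the target with the smallest Hausdorff distance; a
    centroid-based nearest neighbour is an equally plausible reading.
    """
    per_src = [
        min(hausdorff(np.asarray(p, float), np.asarray(q, float)) for q in dst_polys)
        for p in src_polys
    ]
    return float(np.median(per_src))
```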

**Figure 15.** Example CodeCytos agent run on a Pancreas tissue field of view (FOV) answering a bioscientist’s query: “What is the average degree of macrophage nodes in a Delaunay triangulation over all cell centroids?” The bioscientist provides exactly the question shown in the figure, with no additional task specifications or spatial-analysis instructions.

**Figure 16.** Example CodeCytos agent run on a Tonsil tissue field of view (FOV) answering a bioscientist’s query: “What fraction of epithelial cells are on the FOV boundary (i.e., their Voronoi cell intersects the boundary)?” The bioscientist provides exactly the question shown in the figure, with no additional task specifications or spatial-analysis instructions.
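One way to compute the boundary fraction, assuming `scipy.spatial.Voronoi` over all cell centroids and an axis-aligned rectangular FOV. A region is flagged if it is unbounded or has any vertex on or outside the FOV edge; since each Voronoi region is convex and contains its seed inside the FOV, that is equivalent to the region meeting the boundary. `boundary_fraction` is a hypothetical name.

```python
import numpy as np
from scipy.spatial import Voronoi


def boundary_fraction(xy, mask, fov):
    """Fraction of masked cells whose Voronoi cell touches or crosses the FOV boundary.

    xy: all cell centroids; mask: boolean selector (e.g. epithelial cells);
    fov: (xmin, ymin, xmax, ymax).
    """
    vor = Voronoi(np.asarray(xy, dtype=float))
    xmin, ymin, xmax, ymax = fov
    idx = np.flatnonzero(np.asarray(mask, dtype=bool))
    if idx.size == 0:
        return float("nan")
    on_boundary = 0
    for i in idx:
        region = vor.regions[vor.point_region[i]]
        if -1 in region or len(region) == 0:
            on_boundary += 1  # an unbounded region always reaches the boundary
            continue
        v = vor.vertices[region]
        outside = (v[:, 0] <= xmin) | (v[:, 0] >= xmax) | (v[:, 1] <= ymin) | (v[:, 1] >= ymax)
        on_boundary += bool(outside.any())
    return on_boundary / idx.size
```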

**Figure 17.** Example CodeCytos agent run on a Tonsil tissue field of view (FOV) answering a bioscientist’s query: “What is the Clark–Evans R for lymphocytes (observed nearest-neighbor distance vs. CSR expectation) within the FOV?” The bioscientist provides exactly the question shown in the figure, with no additional task specifications or spatial-analysis instructions. CodeCytos recognizes domain-specific terms such …
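The Clark–Evans statistic compares the observed mean nearest-neighbor distance with its expectation under complete spatial randomness (CSR), 1 / (2·sqrt(λ)) with intensity λ = n / A. The sketch below uses the FOV area as A and applies no edge correction; both are assumptions, and `clark_evans_r` is a hypothetical helper.

```python
import numpy as np


def clark_evans_r(xy, area):
    """Clark-Evans aggregation index R = observed mean NND / CSR expectation.

    R < 1 suggests clustering, R close to 1 randomness, R > 1 regularity.
    No edge correction is applied.
    """
    pts = np.asarray(xy, dtype=float)
    n = len(pts)
    if n < 2:
        return float("nan")
    diff = pts[:, None, :] - pts[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(d, np.inf)  # exclude each point's zero distance to itself
    observed = d.min(axis=1).mean()
    expected = 0.5 / np.sqrt(n / area)  # 1 / (2 * sqrt(lambda))
    return float(observed / expected)
```

On a regular square lattice with unit spacing and one point per unit area, the observed mean NND is 1 and the expectation is 0.5, so R = 2, the signature of a regular pattern.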

Original abstract

Conventional tissue image analysis software provides foundational capabilities for cellular analysis, including segmentation, basic morphological feature extraction, and spatial organization analysis. However, these tools often require manual intervention and are not well integrated with code-driven automation, limiting efficiency and scalability for complex spatial tissue studies. In addition, they offer limited flexibility for custom analyses, as they typically support only a fixed set of pre-implemented spatial cellular features. To address these limitations, we propose CodeCytos, a coding-based reasoning agent framework that enables dynamic, programmable interaction with spatial molecular imaging data to improve automation and customization. CodeCytos is designed to streamline the exploration of custom spatial cellular features and adapt to diverse research needs. We demonstrate its utility through case studies on four expert-curated datasets from distinct tissue types: frontal cortex, non-small-cell lung cancer, pancreas, and tonsil. We evaluate CodeCytos under a realistic minimal prompt setting, where bioscientists pose simple questions without task-specific instructions or contextual information about spatial cellular analysis, and benchmark multiple LLM backbones with strong coding capabilities. We further show that incorporating tailored, domain-agnostic few-shot in-context coding-reasoning examples (randomly sampled demonstrations outside the spatial analysis domain) can substantially improve performance without requiring costly, expert-crafted in-domain demonstrations. Overall, CodeCytos outperforms baseline approaches, highlighting the potential of code-action agents to assist with custom feature exploration in spatial molecular imaging and to accelerate biomarker discovery.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

CodeCytos shows a code-augmented LLM agent for flexible custom analysis in spatial molecular imaging, tested with minimal prompts and out-of-domain examples on four tissue datasets.


The main takeaway is that CodeCytos turns spatial molecular imaging analysis into an agent that generates and runs code for custom features, rather than sticking to fixed tool outputs. It evaluates this under minimal prompts where users just ask simple questions, and shows that adding a handful of domain-agnostic few-shot coding examples improves results without needing expert in-domain demos.

What stands out as new is the specific combination: a code-action space tailored to this domain, run on four expert-curated datasets from frontal cortex, non-small-cell lung cancer, pancreas, and tonsil, with the minimal-prompt regime and the out-of-domain few-shot trick. Existing tools are limited to pre-set features and manual steps, so the agent approach directly targets that rigidity and automation gap.

The paper handles the case-study framing reasonably, keeping claims tied to the datasets rather than broad assertions. The idea of using generic examples to bootstrap performance is practical and avoids expensive domain-specific data collection.

The soft spot is the evidence. The abstract states outperformance over baselines but supplies no metrics, no error bars, no baseline details, and no statistical tests. That leaves the reliability of the LLM-generated code an open question, which is exactly the load-bearing premise flagged above. Without those numbers it is hard to judge whether the gains are consistent or task-dependent.

This is for computational biologists and imaging researchers who already work with spatial data and want more programmable exploration. A reader focused on agent systems for scientific workflows could pick up the evaluation setup and the few-shot angle.

It deserves peer review. The technique is distinct enough in the subfield and the evaluation choices are thoughtful, so a referee can check the full methods, code, and any quantitative results that are missing from the abstract.

Referee Report

0 major / 3 minor

Summary. The paper introduces CodeCytos, a coding-based reasoning agent framework that augments LLMs with a code-generation action space to enable dynamic, programmable analysis of spatial molecular imaging data. It targets limitations in conventional tools by supporting custom spatial cellular feature exploration without fixed pre-implemented feature sets. The approach is evaluated via case studies on four expert-curated datasets (frontal cortex, non-small-cell lung cancer, pancreas, tonsil) under a minimal-prompt setting using domain-agnostic few-shot in-context examples, with the central claim that CodeCytos outperforms baseline approaches and can accelerate biomarker discovery.

Significance. If the reported outperformance is substantiated with quantitative results, the framework offers a practical route to greater flexibility and automation in spatial tissue analysis, reducing reliance on manual intervention and enabling researchers to define custom analyses via natural language prompts. The use of domain-agnostic few-shot examples without task-specific or in-domain expert demonstrations is a pragmatic strength that could lower barriers for bioscientists.

minor comments (3)

[Abstract] The claim that CodeCytos 'outperforms baseline approaches' is stated without naming the baselines, providing any performance metrics, or describing the evaluation protocol; the results section should explicitly define these to allow readers to assess the comparison.
[Abstract] The four datasets are described only as 'expert-curated' from distinct tissue types; including accession numbers, imaging modalities, or cell-type annotations would improve reproducibility and context for the case studies.
[Abstract] The phrase 'domain-agnostic few-shot in-context coding-reasoning examples (randomly sampled demonstrations outside the spatial analysis domain)' is introduced without an example or citation to the prompting strategy; a brief illustration or reference would clarify the method.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment of CodeCytos, the recognition of its practical strengths in using domain-agnostic few-shot examples, and the recommendation for minor revision. No specific major comments were raised in the report.

Circularity Check

0 steps flagged

No significant circularity


The paper is an empirical demonstration of an applied agent framework evaluated via case studies on four datasets, with performance reported against baselines under a minimal-prompt setting. No derivations, equations, fitted parameters, or self-citations appear in the provided text that reduce any claimed result to a quantity defined by the authors' own prior choices or inputs. The central claims rest on observed outperformance in the described experiments rather than any self-referential construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claim rests on the domain assumption that LLMs can act as reliable code-writing agents for spatial analysis tasks; the framework itself is the primary invented entity with no independent evidence supplied beyond the reported case studies.

axioms (1)

domain assumption: Large language models with strong coding capabilities can generate accurate code for custom spatial cellular feature extraction when given minimal prompts and out-of-domain few-shot examples.
This assumption underpins the agent's performance in the described minimal-prompt setting and the reported improvement from few-shot examples.

invented entities (1)

CodeCytos (no independent evidence)
purpose: A coding-based reasoning agent framework enabling dynamic, programmable interaction with spatial molecular imaging data.
Newly introduced system whose utility is demonstrated through case studies.

pith-pipeline@v0.9.1-grok · 5837 in / 1389 out tokens · 38076 ms · 2026-06-28T19:09:47.588083+00:00 · methodology


Reference graph

Works this paper leans on

34 extracted references · 12 canonical work pages · 8 internal anchors

[1] Cancer Research UK Oxford Centre. Tumour host-location (thl) lab: Halo. https://www.cancer.ox.ac.uk/support/THL/HALO (n.d.). Accessed 2026-01-25.
Visiopharm. Visiopharm. https://visiopharm.com/ (n.d.). Accessed 2026-01-25.

[2] Chiu, C.-L., Clack, N. et al. Napari: a python multi-dimensional image viewer platform for the research community. Microsc. Microanal. 28, 1576–1577 (2022).

[3] Schindelin, J., Rueden, C. T., Hiner, M. C. & Eliceiri, K. W. The imagej ecosystem: An open platform for biomedical image analysis. Mol. reproduction development 82, 518–529 (2015).
Palla, G. et al. Squidpy: a scalable framework for spatial omics analysis. Nat. methods 19, 171–178 (2022).

[4] Carpenter, A. E. et al. Cellprofiler: image analysis software for identifying and quantifying cell phenotypes. Genome biology 7, R100 (2006).

[5] Wang, X. et al. Executable code actions elicit better llm agents. In Forty-first International Conference on Machine Learning (2024).
Zhou, J. et al. An ai agent for fully automated multi-omic analyses. Adv. Sci. 11, 2407094 (2024).
Wang, H. et al. Spatialagent: An autonomous ai agent for spatial biology. bioRxiv 2025–04 (2025).

[6] Fallahpour, A., Ma, J., Munim, A., Lyu, H. & Wang, B. Medrax: Medical reasoning agent for chest x-ray. In Forty-second International Conference on Machine Learning.

[7] Lyu, X. et al. Wsi-agents: A collaborative multi-agent system for multi-modal whole slide image analysis. In 28th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2025), 680–690 (Springer, 2026).

[8] Yao, S. et al. React: Synergizing reasoning and acting in language models. In The eleventh international conference on learning representations (2022).

[9] Yang, Z. et al. Hotpotqa: A dataset for diverse, explainable multi-hop question answering. In Proceedings of the 2018 conference on empirical methods in natural language processing, 2369–2380 (2018).

[10] Thorne, J., Vlachos, A., Christodoulopoulos, C. & Mittal, A. Fever: a large-scale dataset for fact extraction and verification. arXiv preprint arXiv:1803.05355 (2018).

[11] Shridhar, M. et al. Alfworld: Aligning text and embodied environments for interactive learning. arXiv preprint arXiv:2010.03768 (2020).

[12] Liang, J. et al. Code as policies: Language model programs for embodied control. arXiv preprint arXiv:2209.07753 (2022).

[13] NVIDIA. Nemo gym: An open source library for scaling reinforcement learning environments for llm. https://github.com/NVIDIA-NeMo/Gym (2025). GitHub repository.
Cheng, M. et al. Agent-r1: Training powerful llm agents with end-to-end reinforcement learning (2025). 2511.14460.

[14] Luo, M. et al. Deepswe: Training a state-of-the-art coding agent from scratch by scaling rl. https://pretty-radio-b75.notion.site/DeepSWE-Training-a-Fully-Open-sourced-State-of-the-Art-Coding-Agent-by-Scaling-RL-22281902c1468193aabbe9a8c59bbe33 (2025). Notion Blog.

[15] Rafailov, R. et al. Direct preference optimization: Your language model is secretly a reward model. Adv. neural information processing systems 36, 53728–53741 (2023).

[16] Guo, D. et al. Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning. arXiv preprint arXiv:2501.12948 (2025).

[17] Akyürek, E., Schuurmans, D., Andreas, J., Ma, T. & Zhou, D. What learning algorithm is in-context learning? investigations with linear models. arXiv preprint arXiv:2211.15661 (2022).

[18] von Oswald, J. et al. Transformers learn in-context by gradient descent. In International Conference on Machine Learning, 35151–35174 (PMLR, 2023).

[19] Bertsch, A. et al. In-context learning with long-context models: An in-depth exploration. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 12119–12149 (2025).
Agarwal, R. et al. Many-shot in-context learning. Adv. Neural Inf. Pro...

[20] Copet, J. et al. Cwm: An open-weights llm for research on code generation with world models. arXiv preprint arXiv:2510.02387 (2025).

[21] Rastogi, A. et al. Devstral: Fine-tuning language models for coding agent applications. arXiv preprint arXiv:2509.25193 (2025).
Team, K. et al. Kimi linear: An expressive, efficient attention architecture. arXiv preprint arXiv:2510.26692 (2025).

[22] Zeng, A. et al. Glm-4.5: Agentic, reasoning, and coding (arc) foundation models. arXiv preprint arXiv:2508.06471 (2025).

[23] Wei, J. et al. Chain-of-thought prompting elicits reasoning in large language models. Adv. neural information processing systems 35, 24824–24837 (2022).

[24] Vaidya, A. J. et al. Nova: An agentic framework for automated histopathology analysis and discovery. arXiv preprint arXiv:2511.11324 (2025).

[25] Stringer, C., Wang, T., Michaelos, M. & Pachitariu, M. Cellpose: a generalist algorithm for cellular segmentation. Nat. methods 18, 100–106 (2021).
Pachitariu, M. & Stringer, C. Cellpose 2.0: how to train your own model. Nat. methods 19, 1634–1641 (2022).

[26] Pachitariu, M., Rariden, M. & Stringer, C. Cellpose-sam: superhuman generalization for cellular segmentation. bioRxiv 2025–04 (2025).

[27] Goldsborough, T. et al. Instanseg: an embedding-based instance segmentation algorithm optimized for accurate, efficient and portable cell segmentation. arXiv preprint arXiv:2408.15954 (2024).
Stevens, M. et al. Stardist image segmentation improves circulating tumor cell detection. Cancers 14, 2916 (2022).
Archit, A. et al. Segment anything for microscopy. Nat....

[28] Van Valen, D. A. et al. Deep learning automates the quantitative analysis of individual cells in live-cell imaging experiments. PLoS computational biology 12, e1005177 (2016).

[29] Ghahremani, P., Marino, J., Dodds, R. & Nadeem, S. Deepliif: An online platform for quantification of clinical pathology slides. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 21399–21405 (2022).

[30] Hörst, F. et al. Cellvit: Vision transformers for precise cell segmentation and classification. Med. Image Analysis 94, 103143 (2024).
Satija, R., Farrell, J. A., Gennert, D., Schier, A. F. & Regev, A. Spatial reconstruction of single-cell gene expression data. Nat. biotechnology 33, 495–502 (2015).

[31] Dries, R. et al. Giotto: a toolbox for integrative analysis and visualization of spatial expression data. Genome biology 22, 78 (2021).

[32] Biancalani, T. et al. Deep learning and alignment of spatially resolved single-cell transcriptomes with tangram. Nat. methods 18, 1352–1362 (2021).

[33] Kleshchevnikov, V. et al. Cell2location maps fine-grained cell types in spatial transcriptomics. Nat. biotechnology 40, 661–671 (2022).
Kwon, W. et al. Efficient memory management for large language model serving with pagedattention. In Proceedings of the ACM SIGOPS 29th Symposium on Operating Systems Principles (2023).

[34] Zheng, L. et al. Sglang: Efficient execution of structured language model programs. Adv. neural information processing systems 37, 62557–62583 (2024).

[1] [1]

Tumour host-location (thl) lab: Halo

Cancer Research UK Oxford Centre. Tumour host-location (thl) lab: Halo. https://www.cancer.ox.ac.uk/support/THL/ HALO (n.d.). Accessed 2026-01-25. 3.Visiopharm. Visiopharm. https://visiopharm.com/ (n.d.). Accessed 2026-01-25

2026

[2] [2]

Chiu, C.-L., Clack, N.et al.Napari: a python multi-dimensional image viewer platform for the research community. Microsc. Microanal.28, 1576–1577 (2022)

2022

[3] [3]

T., Hiner, M

Schindelin, J., Rueden, C. T., Hiner, M. C. & Eliceiri, K. W. The imagej ecosystem: An open platform for biomedical image analysis.Mol. reproduction development82, 518–529 (2015). 6.Palla, G.et al.Squidpy: a scalable framework for spatial omics analysis.Nat. methods19, 171–178 (2022)

2015

[4] [4]

E.et al.Cellprofiler: image analysis software for identifying and quantifying cell phenotypes.Genome biology7, R100 (2006)

Carpenter, A. E.et al.Cellprofiler: image analysis software for identifying and quantifying cell phenotypes.Genome biology7, R100 (2006)

2006

[5] [5]

InForty-first International Conference on Machine Learning(2024)

Wang, X.et al.Executable code actions elicit better llm agents. InForty-first International Conference on Machine Learning(2024). 9.Zhou, J.et al.An ai agent for fully automated multi-omic analyses.Adv. Sci.11, 2407094 (2024). 10.Wang, H.et al.Spatialagent: An autonomous ai agent for spatial biology.bioRxiv2025–04 (2025)

2024

[6] [6]

& W ANG, B

Fallahpour, A., Ma, J., Munim, A., Lyu, H. & W ANG, B. Medrax: Medical reasoning agent for chest x-ray. InForty-second International Conference on Machine Learning. 30/32

[7] [7]

In28th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2025), 680–690 (Springer, 2026)

Lyu, X.et al.Wsi-agents: A collaborative multi-agent system for multi-modal whole slide image analysis. In28th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2025), 680–690 (Springer, 2026)

2025

[8] [8]

InThe eleventh international conference on learning representations(2022)

Yao, S.et al.React: Synergizing reasoning and acting in language models. InThe eleventh international conference on learning representations(2022)

[9]

Yang, Z. et al. HotpotQA: A dataset for diverse, explainable multi-hop question answering. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, 2369–2380 (2018).

[10]

Thorne, J., Vlachos, A., Christodoulopoulos, C. & Mittal, A. FEVER: a large-scale dataset for fact extraction and verification. arXiv preprint arXiv:1803.05355 (2018).

[11]

Shridhar, M. et al. ALFWorld: Aligning text and embodied environments for interactive learning. arXiv preprint arXiv:2010.03768 (2020).

[12]

Liang, J. et al. Code as policies: Language model programs for embodied control. arXiv preprint arXiv:2209.07753 (2022).

[13]

NVIDIA. NeMo Gym: An open source library for scaling reinforcement learning environments for LLM. https://github.com/NVIDIA-NeMo/Gym (2025). GitHub repository.

Cheng, M. et al. Agent-R1: Training powerful LLM agents with end-to-end reinforcement learning (2025). 2511.14460.

[14]

Luo, M. et al. DeepSWE: Training a state-of-the-art coding agent from scratch by scaling RL. https://pretty-radio-b75.notion.site/DeepSWE-Training-a-Fully-Open-sourced-State-of-the-Art-Coding-Agent-by-Scaling-RL-22281902c1468193aabbe9a8c59bbe33 (2025). Notion Blog.

[15]

Rafailov, R. et al. Direct preference optimization: Your language model is secretly a reward model. Adv. Neural Information Processing Systems 36, 53728–53741 (2023).

[16]

Guo, D. et al. DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning. arXiv preprint arXiv:2501.12948 (2025).

[17]

Akyürek, E., Schuurmans, D., Andreas, J., Ma, T. & Zhou, D. What learning algorithm is in-context learning? Investigations with linear models. arXiv preprint arXiv:2211.15661 (2022).

[18]

von Oswald, J. et al. Transformers learn in-context by gradient descent. In International Conference on Machine Learning, 35151–35174 (PMLR, 2023).

[19]

Bertsch, A. et al. In-context learning with long-context models: An in-depth exploration. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 12119–12149 (2025).

Agarwal, R. et al. Many-shot in-context learning. Adv. Neural Inf. Pro...

[20]

Copet, J. et al. CWM: An open-weights LLM for research on code generation with world models. arXiv preprint arXiv:2510.02387 (2025).

[21]

Rastogi, A. et al. Devstral: Fine-tuning language models for coding agent applications. arXiv preprint arXiv:2509.25193 (2025).

Team, K. et al. Kimi Linear: An expressive, efficient attention architecture. arXiv preprint arXiv:2510.26692 (2025).

[22]

Zeng, A. et al. GLM-4.5: Agentic, reasoning, and coding (ARC) foundation models. arXiv preprint arXiv:2508.06471 (2025).

[23]

Wei, J. et al. Chain-of-thought prompting elicits reasoning in large language models. Adv. Neural Information Processing Systems 35, 24824–24837 (2022).

[24]

Vaidya, A. J. et al. Nova: An agentic framework for automated histopathology analysis and discovery. arXiv preprint arXiv:2511.11324 (2025).

[25]

Stringer, C., Wang, T., Michaelos, M. & Pachitariu, M. Cellpose: a generalist algorithm for cellular segmentation. Nat. Methods 18, 100–106 (2021).

Pachitariu, M. & Stringer, C. Cellpose 2.0: how to train your own model. Nat. Methods 19, 1634–1641 (2022).

[26]

Pachitariu, M., Rariden, M. & Stringer, C. Cellpose-SAM: superhuman generalization for cellular segmentation. bioRxiv 2025–04 (2025).

[27]

Goldsborough, T. et al. InstanSeg: an embedding-based instance segmentation algorithm optimized for accurate, efficient and portable cell segmentation. arXiv preprint arXiv:2408.15954 (2024).

Stevens, M. et al. StarDist image segmentation improves circulating tumor cell detection. Cancers 14, 2916 (2022).

Archit, A. et al. Segment anything for microscopy. Nat....

[28]

Van Valen, D. A. et al. Deep learning automates the quantitative analysis of individual cells in live-cell imaging experiments. PLoS Computational Biology 12, e1005177 (2016).

[29]

Ghahremani, P., Marino, J., Dodds, R. & Nadeem, S. DeepLIIF: An online platform for quantification of clinical pathology slides. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 21399–21405 (2022).

[30]

Hörst, F. et al. CellViT: Vision transformers for precise cell segmentation and classification. Med. Image Analysis 94, 103143 (2024).

Satija, R., Farrell, J. A., Gennert, D., Schier, A. F. & Regev, A. Spatial reconstruction of single-cell gene expression data. Nat. Biotechnology 33, 495–502 (2015).

[31]

Dries, R. et al. Giotto: a toolbox for integrative analysis and visualization of spatial expression data. Genome Biology 22, 78 (2021).

[32]

Biancalani, T. et al. Deep learning and alignment of spatially resolved single-cell transcriptomes with Tangram. Nat. Methods 18, 1352–1362 (2021).

[33]

Kleshchevnikov, V. et al. Cell2location maps fine-grained cell types in spatial transcriptomics. Nat. Biotechnology 40, 661–671 (2022).

Kwon, W. et al. Efficient memory management for large language model serving with PagedAttention. In Proceedings of the ACM SIGOPS 29th Symposium on Operating Systems Principles (2023).

[34]

Zheng, L. et al. SGLang: Efficient execution of structured language model programs. Adv. Neural Information Processing Systems 37, 62557–62583 (2024).
