pith. machine review for the scientific record.

arxiv: 2603.21743 · v3 · submitted 2026-03-23 · 💻 cs.LG · q-bio.QM

Recognition: unknown

CellFluxRL: Biologically-Constrained Virtual Cell Modeling via Reinforcement Learning

Dongxia Wu, Elaine Sui, Emily B. Fox, Emma Lundberg, Serena Yeung-Levy, Shiye Su, Yuhui Zhang

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 01:01 UTC · model grok-4.3

classification 💻 cs.LG q-bio.QM
keywords: virtual cells · reinforcement learning · generative models · biological constraints · cell imaging · drug discovery · machine learning for biology

The pith

Reinforcement learning post-training with biological rewards improves virtual cell generators to respect physical and biological rules.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that generative models for virtual cells often produce images that look realistic but violate basic biological and physical laws. To fix this, it applies reinforcement learning after the initial training of the CellFlux model, using seven reward functions that score outputs on biological function, structural validity, and morphological correctness. This results in CellFluxRL, which outperforms the base model across these measures and benefits from additional test-time scaling. If true, this means virtual cell simulations can move from pretty pictures to useful tools for testing drugs and understanding cellular behavior in a controlled digital environment.

Core claim

By post-training the state-of-the-art CellFlux model with reinforcement learning guided by seven biologically meaningful reward functions spanning biological function, structural validity, and morphological correctness, CellFluxRL achieves consistent improvements over CellFlux and gains further boosts from test-time scaling, shifting virtual cell modeling from visually realistic to biologically meaningful generations.

What carries the argument

Reinforcement learning optimization using evaluators as reward functions that enforce physical and biological constraints on generated cell images.
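In code, the evaluator-as-reward idea reduces to a weighted combination of scorers. The sketch below is illustrative only: the evaluator names, weights, and dict-based "image" are hypothetical stand-ins, not the paper's seven rewards, which operate on actual generated cell images.

```python
# Hypothetical sketch: evaluators as reward functions. The scorers and
# weights are illustrative stand-ins; real evaluators would work on
# segmentation masks and morphological profiles of generated images.

def combined_reward(sample, evaluators, weights):
    """Weighted sum of evaluator scores, each assumed to lie in [0, 1]."""
    return sum(w * evaluate(sample) for evaluate, w in zip(evaluators, weights))

# Toy structural-validity checks on a dict standing in for an image.
def nucleus_inside_cytoplasm(sample):
    return 1.0 if sample.get("nucleus_inside") else 0.0

def nucleus_roundness(sample):
    return sample.get("roundness", 0.0)

evaluators = [nucleus_inside_cytoplasm, nucleus_roundness]
weights = [0.6, 0.4]

print(combined_reward({"nucleus_inside": True, "roundness": 0.8},
                      evaluators, weights))  # ≈ 0.92
```

Any scorer bounded in [0, 1] can slot in; the weights set how much each constraint contributes to the RL objective.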

If this is right

  • Virtual cell models can now be optimized for real-world biological plausibility rather than just visual appeal.
  • Test-time scaling provides an additional way to enhance performance without further training.
  • The framework can be extended to other generative models in biology to add constraint enforcement.
  • Drug discovery pipelines gain more reliable in silico testing capabilities.
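The test-time scaling mentioned above is best-of-N selection: draw N rollouts, score each with the combined reward, keep the top scorer. A minimal sketch, with `generate` and `reward` as placeholder callables standing in for the model and the evaluator suite:

```python
# Best-of-N test-time scaling, sketched with placeholder callables.
# `generate` stands in for sampling the generative model; `reward` for
# the combined evaluator score used during RL post-training.
import random

def best_of_n(generate, reward, n, rng):
    """Return the highest-reward sample among n independent rollouts."""
    rollouts = [generate(rng) for _ in range(n)]
    return max(rollouts, key=reward)

# Toy demonstration with scalar "samples" and an identity reward.
rng = random.Random(0)
best = best_of_n(lambda r: r.random(), lambda x: x, n=8, rng=rng)
print(best)
```

No gradient steps are involved, which is why this stacks on top of (rather than replacing) the RL post-training.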

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the rewards accurately reflect biology, this method could be combined with other simulation techniques for hybrid models.
  • Similar reinforcement learning constraints might apply to generative models in other scientific domains like protein structures.
  • Future work could explore learning the rewards themselves from data instead of hand-designing them.

Load-bearing premise

The designed reward functions accurately represent true biological and physical constraints without introducing biases or loopholes that the model can exploit.

What would settle it

A direct comparison experiment testing whether cells generated by CellFluxRL match real cellular behaviors or responses in laboratory assays better than those generated by the original CellFlux model.

Figures

Figures reproduced from arXiv: 2603.21743 by Dongxia Wu, Elaine Sui, Emily B. Fox, Emma Lundberg, Serena Yeung-Levy, Shiye Su, Yuhui Zhang.

Figure 1
Figure 1: Failure of cell generation. Despite its success, we observe that these image-based virtual cell models can produce images that look realistic yet are biologically implausible. For instance, using the state-of-the-art image-based virtual cell model, CellFlux [47], we observe anomalies such as the cell nucleus being generated outside of the cytoplasm.
Figure 2
Figure 2: Motivation. Current generative models for simulating cellular perturbations can fail to produce physically plausible cell images. For example, nuclei may appear outside the cell membrane. We design a suite of biologically meaningful verifiers in three roles: (1) as evaluators to assess the biological correctness of generated images, (2) as reward signals to improve generation via reinforcement learning, and…
Figure 3
Figure 3: CellFluxRL algorithm. RL post-training seeks to increase the likelihood of high-reward samples and decrease the likelihood of low-reward samples. Therefore, the core training loop of CellFluxRL consists of interleaved phases of sampling and training. (a) Sampling: we generate multiple rollouts from a fixed control image and perturbation condition, scoring each with the reward models. (b) Training: because…
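The interleaved sampling/training loop that Figure 3 describes can be caricatured in a few lines. This is a hedged sketch with placeholder callables and a group-mean baseline; the paper's actual update is a DiffusionNFT-style step on a flow-matching model, which is not reproduced here.

```python
# Sketch of one sampling/training iteration (placeholder callables; the
# real update is a flow-matching RL step). Rollouts scoring above the
# group mean get a positive advantage (likelihood pushed up), those
# below get a negative one (likelihood pushed down).

def rl_post_training_step(sample_fn, reward_fn, update_fn, n_rollouts=4):
    rollouts = [sample_fn() for _ in range(n_rollouts)]  # (a) sampling
    rewards = [reward_fn(x) for x in rollouts]           #     scoring
    baseline = sum(rewards) / len(rewards)
    for x, r in zip(rollouts, rewards):                  # (b) training
        update_fn(x, r - baseline)                       #     signed advantage
    return rewards
```

With rewards 0, 1, 2, 3 the baseline is 1.5, so the two low-reward rollouts receive negative advantages and the two high-reward ones positive, matching the up/down-weighting described in the caption.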
Figure 4
Figure 4: The baselines generate images that fail to reflect the expected biological response to each perturbation.
Figure 4
Figure 4: Qualitative comparisons. CellFluxRL generates more biologically-grounded images, better capturing drug-induced morphological changes. In these examples, Etoposide-induced cell rounding, Demecolcine-driven microtubule destabilization, and AZ138-associated cell shrinkage are all more faithfully reproduced, and cell density more closely matches the ground truth for Cisplatin. Test-time scaling (+TTS) further…
Figure 5
Figure 5: Test-time scaling by best-of-N further improves generation quality. The sample achieving the highest overall (combined) reward is selected from N rollouts, and each individual reward is plotted. RL (orange) consistently exhibits better scaling than the base model (blue) across all rewards.
Figure 6
Figure 6: Sensitivity analysis on KL weight β. Each subplot shows reward sensitivity to β after RL post-training.
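The β in Figure 6 weights a KL penalty that keeps the post-trained model close to the base generator; the trade-off can be sketched as a regularized objective. All quantities below are toy scalars assumed for illustration; in the paper the KL term is between distributions over generated images.

```python
# Toy sketch of the KL-regularized objective behind the beta sweep:
# maximize reward while penalizing divergence from the base model.
# Scalar stand-ins only, not the paper's actual training objective.

def regularized_objective(reward, kl_to_base, beta):
    """Higher beta trades reward for staying near the base model."""
    return reward - beta * kl_to_base

# A high-reward but far-drifted policy vs. a modest, close one:
drifted = regularized_objective(reward=0.9, kl_to_base=2.0, beta=0.5)  # ≈ -0.1
close = regularized_objective(reward=0.6, kl_to_base=0.2, beta=0.5)    # ≈ 0.5
```

At β = 0 the objective is pure reward (maximal drift, maximal reward-hacking risk); large β pins the model to its prior, which is the sensitivity Figure 6 sweeps.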
Figure 7
Figure 7: MoA reward failure cases. The pretrained CellFlux baseline (left) generates images that do not match the expected morphological profile for the given perturbation, as measured by the MoA reward. CellFluxRL + TTS (right) corrects these failures by explicitly optimizing for MoA consistency during RL post-training. Ground-truth target images for the same perturbation conditions are shown in…
Figure 8
Figure 8: Roundness reward failure cases. The pretrained CellFlux baseline (left) produces nuclei with irregular, implausible shapes that deviate from the MoA-conditioned ground-truth distribution. CellFluxRL + TTS (right) generates nuclei with roundness statistics consistent with real cells under the same perturbation condition. Ground-truth target images for the same perturbation conditions are shown in…
read the original abstract

Building virtual cells with generative models to simulate cellular behavior in silico is emerging as a promising paradigm for accelerating drug discovery. However, prior image-based generative approaches can produce implausible cell images that violate basic physical and biological constraints. To address this, we propose to post-train virtual cell models with reinforcement learning (RL), leveraging biologically meaningful evaluators as reward functions. We design seven rewards spanning three categories (biological function, structural validity, and morphological correctness) and optimize the state-of-the-art CellFlux model to yield CellFluxRL. CellFluxRL consistently improves over CellFlux across all rewards, with further performance boosts from test-time scaling. Overall, our results present a virtual cell modeling framework that enforces physically-based constraints through RL, advancing beyond "visually realistic" generations towards "biologically meaningful" ones.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes CellFluxRL, a post-training reinforcement learning approach applied to the CellFlux generative model for virtual cells. It introduces seven reward functions spanning biological function, structural validity, and morphological correctness to enforce physical and biological constraints during optimization, reporting consistent improvements over the base CellFlux model across all rewards along with further gains from test-time scaling.

Significance. If the reward functions prove to be faithful, ungameable proxies for real cellular biology, the work could meaningfully advance virtual cell modeling for drug discovery by shifting from purely visual generation to constraint-enforcing simulation. The multi-category reward design and test-time scaling results are positive elements that could influence future RL-augmented generative pipelines in biology.

major comments (2)
  1. [Abstract] Abstract: The central claim that CellFluxRL produces 'biologically meaningful' outputs (rather than merely visually realistic ones) rests entirely on improvements measured against the same seven reward functions used as the RL training objective. Because gains on these exact signals are guaranteed by construction, the reported results supply no independent evidence that the optimized cells satisfy constraints outside the reward definitions.
  2. [Evaluation] Evaluation section: No held-out biological assays, expert biologist ratings, gene-expression correlations, or live-cell dynamics comparisons are described to test whether high reward scores correspond to genuine biophysical validity. Without such external validation, it remains unclear whether the RL stage reduces reward hacking or simply amplifies the base model's ability to exploit the provided evaluators.
minor comments (2)
  1. [Introduction] The abstract and introduction would benefit from a brief explicit statement of the base CellFlux architecture and training data to allow readers to assess how the RL stage interacts with the original generative prior.
  2. [Methods] Notation for the seven individual reward functions should be defined consistently (e.g., R_bio, R_struct, etc.) when first introduced so that later quantitative comparisons can be traced to specific components.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed review and constructive comments on our work. We provide point-by-point responses to the major comments below.

read point-by-point responses
  1. Referee: [Abstract] Abstract: The central claim that CellFluxRL produces 'biologically meaningful' outputs (rather than merely visually realistic ones) rests entirely on improvements measured against the same seven reward functions used as the RL training objective. Because gains on these exact signals are guaranteed by construction, the reported results supply no independent evidence that the optimized cells satisfy constraints outside the reward definitions.

    Authors: We agree that the primary quantitative improvements are reported on the reward functions used during RL training. However, these reward functions are not arbitrary; they are explicitly designed based on established biological principles, physical constraints, and morphological criteria drawn from the literature on cell biology. The fact that the base CellFlux model can be further optimized via RL to achieve higher scores on these independent evaluators demonstrates the framework's ability to enforce constraints beyond what the generative model alone achieves. We do not present this as direct experimental validation but as evidence that RL can bridge the gap from visual realism to constraint satisfaction. To address potential concerns about reward hacking, we note that the diverse set of seven rewards across categories makes simultaneous exploitation more challenging. revision: no

  2. Referee: [Evaluation] Evaluation section: No held-out biological assays, expert biologist ratings, gene-expression correlations, or live-cell dynamics comparisons are described to test whether high reward scores correspond to genuine biophysical validity. Without such external validation, it remains unclear whether the RL stage reduces reward hacking or simply amplifies the base model's ability to exploit the provided evaluators.

    Authors: We acknowledge the value of external validation such as biologist ratings or live-cell experiments. Our current study is focused on developing and evaluating the RL post-training methodology within a computational framework, using the reward functions as proxies for biological constraints. Implementing wet-lab validations or recruiting expert raters would require significant additional resources and collaborations beyond the scope of this manuscript. We believe the consistent gains across multiple reward categories and the test-time scaling results provide initial support for the approach. We can add a discussion section on the limitations and the need for future experimental validation. revision: partial

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper introduces seven reward functions as external, hand-designed biological evaluators and applies RL post-training to a base CellFlux model. The reported improvement on those rewards is a direct, expected outcome of successful optimization rather than an independent first-principles prediction or derived result. No equations, self-definitional loops, fitted parameters renamed as predictions, or load-bearing self-citations appear in the provided text. The assumption that the rewards faithfully capture biology is presented as a modeling choice, not a claim that reduces to its own inputs by construction. The derivation chain remains self-contained against the external reward benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based on the abstract alone, the approach relies on the pre-existing CellFlux model and standard RL optimization; no explicit free parameters, axioms, or invented entities are described.

pith-pipeline@v0.9.0 · 5455 in / 1050 out tokens · 38764 ms · 2026-05-15T01:01:16.403684+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

51 extracted references · 51 canonical work pages · 9 internal anchors

  1. [1]

    Modelling cellular perturbations with the sparse additive mechanism shift variational autoencoder

    Michael Bereket and Theofanis Karaletsos. Modelling cellular perturbations with the sparse additive mechanism shift variational autoencoder. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors, Advances in Neural Information Processing Systems, volume 36, pages 1–12, 2023. 3

  2. [2]

    Training Diffusion Models with Reinforcement Learning

    Kevin Black, Michael Janner, Yilun Du, Ilya Kostrikov, and Sergey Levine. Training diffusion models with reinforcement learning. arXiv preprint arXiv:2305.13301, 2023. 3

  3. [3]

    Phendiff: Revealing subtle phenotypes with diffusion models in real images

    Anis Bourou, Thomas Boyer, Marzieh Gheisari, Kévin Daupin, Véronique Dubreuil, Aurélie De Thonel, Valérie Mezger, and Auguste Genovesio. Phendiff: Revealing subtle phenotypes with diffusion models in real images. In MICCAI, 2024. 3, 7, 8

  4. [4]

    How to build the virtual cell with artificial intelligence: Priorities and opportunities. Cell, 2024

    Charlotte Bunne, Yusuf Roohani, Yanay Rosen, Ankit Gupta, Xikun Zhang, Marcel Roed, Theo Alexandrov, Mohammed AlQuraishi, Patricia Brennan, Daniel B Burkhardt, et al. How to build the virtual cell with artificial intelligence: Priorities and opportunities. Cell, 2024. 1

  5. [5]

    Phygdpo: Physics-aware groupwise direct preference optimization for physically consistent text-to-video generation, 2026

    Yuanhao Cai, Kunpeng Li, Menglin Jia, Jialiang Wang, Junzhe Sun, Feng Liang, Weifeng Chen, Felix Juefei-Xu, Chu Wang, Ali Thabet, Xiaoliang Dai, Xuan Ju, Alan Yuille, and Ji Hou. Phygdpo: Physics-aware groupwise direct preference optimization for physically consistent text-to-video generation, 2026. 3

  6. [6]

    High-content phenotypic profiling of drug response signatures across distinct cancer cells. Molecular Cancer Therapeutics, 2010

    Peter D Caie, Rebecca E Walls, Alexandra Ingleston-Orme, Sandeep Daya, Tom Houslay, Rob Eagle, Mark E Roberts, and Neil O Carragher. High-content phenotypic profiling of drug response signatures across distinct cancer cells. Molecular Cancer Therapeutics, 2010. 7, 15

  7. [7]

    Jump cell painting dataset: morphological impact of 136,000 chemical and genetic perturbations. BioRxiv, pages 2023–03, 2023

    Srinivas Niranj Chandrasekaran, Jeanelle Ackerman, Eric Alix, D Michael Ando, John Arevalo, Melissa Bennion, Nicolas Boisseau, Adriana Borowa, Justin D Boyd, Laurent Brino, et al. Jump cell painting dataset: morphological impact of 136,000 chemical and genetic perturbations. BioRxiv, pages 2023–03, 2023. 1

  8. [8]

    Three million images and morphological profiles of cells treated with matched chemical and genetic perturbations. Nature Methods, pages 1–8,

    Srinivas Niranj Chandrasekaran, Beth A Cimini, Amy Goodale, Lisa Miller, Maria Kost-Alimova, Nasim Jamali, John G Doench, Briana Fritchman, Adam Skepner, Michelle Melanson, et al. Three million images and morphological profiles of cells treated with matched chemical and genetic perturbations. Nature Methods, pages 1–8,

  9. [9]

    Transdreamer: Reinforcement learning with transformer world models

    Chang Chen, Jaesik Yoon, Yi-Fu Wu, and Sungjin Ahn. Transdreamer: Reinforcement learning with transformer world models. In Deep RL Workshop NeurIPS 2021, 2021. 3

  10. [10]

    Training Verifiers to Solve Math Word Problems

    Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, Jerry Tworek, Jacob Hilton, Reiichiro Nakano, et al. Training verifiers to solve math word problems. arXiv preprint arXiv:2110.14168, 2021. 7

  11. [11]

    Dpok: Reinforcement learning for fine-tuning text-to-image diffusion models. Advances in Neural Information Processing Systems, 36:79858–79885, 2023

    Ying Fan, Olivia Watkins, Yuqing Du, Hao Liu, Moonkyung Ryu, Craig Boutilier, Pieter Abbeel, Mohammad Ghavamzadeh, Kangwook Lee, and Kimin Lee. Dpok: Reinforcement learning for fine-tuning text-to-image diffusion models. Advances in Neural Information Processing Systems, 36:79858–79885, 2023. 3

  12. [12]

    Rxrx3: Phenomics map of biology. Biorxiv, pages 2023–02, 2023

    Marta M Fay, Oren Kraus, Mason Victors, Lakshmanan Arumugam, Kamal Vuggumudi, John Urbanik, Kyle Hansen, Safiye Celik, Nico Cernek, Ganesh Jagannathan, et al. Rxrx3: Phenomics map of biology. Biorxiv, pages 2023–02, 2023. 1

  13. [13]

    Denoising diffusion probabilistic models

    Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. In NeurIPS, 2020. 1

  14. [14]

    Diffusion-based generation, optimization, and planning in 3d scenes

    Siyuan Huang, Zan Wang, Puhao Li, Baoxiong Jia, Tengyu Liu, Yixin Zhu, Wei Liang, and Song-Chun Zhu. Diffusion-based generation, optimization, and planning in 3d scenes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 16750–16761, June 2023. 3

  15. [15]

    Lumic: Latent diffusion for multiplexed images of cells. bioRxiv, pages 2024–11, 2024

    Albert Z Hung, Charles J Zhang, Jonathan Z Sexton, Matthew James O’Meara, and Joshua D Welch. Lumic: Latent diffusion for multiplexed images of cells. bioRxiv, pages 2024–11, 2024. 3

  16. [16]

    Building the next generation of virtual cells to understand cellular biology. Biophysical Journal, 2023

    Graham T Johnson, Eran Agmon, Matthew Akamatsu, Emma Lundberg, Blair Lyons, Wei Ouyang, Omar A Quintero-Carmona, Megan Riel-Mehan, Susanne Rafelski, and Rick Horwitz. Building the next generation of virtual cells to understand cellular biology. Biophysical Journal, 2023. 1

  17. [17]

    How far is video generation from world model: A physical law perspective

    Bingyi Kang, Yang Yue, Rui Lu, Zhijie Lin, Yang Zhao, Kaixin Wang, Gao Huang, and Jiashi Feng. How far is video generation from world model: A physical law perspective. In International Conference on Machine Learning, pages 28991–29017. PMLR, 2025. 3

  18. [18]

    Revealing invisible cell phenotypes with conditional generative modeling. Nature Communications, 2023

    Alexis Lamiable, Tiphaine Champetier, Francesco Leonardi, Ethan Cohen, Peter Sommer, David Hardy, Nicolas Argy, Achille Massougbodji, Elaine Del Nery, Gilles Cottrell, et al. Revealing invisible cell phenotypes with conditional generative modeling. Nature Communications, 2023. 3

  19. [19]

    MixGRPO: Unlocking Flow-based GRPO Efficiency with Mixed ODE-SDE

    Junzhe Li, Yutao Cui, Tao Huang, Yinping Ma, Chun Fan, Miles Yang, and Zhao Zhong. Mixgrpo: Unlocking flow-based grpo efficiency with mixed ode-sde. arXiv preprint arXiv:2507.21802, 2025. 3

  20. [20]

    Let’s verify step by step

    Hunter Lightman, Vineet Kosaraju, Yuri Burda, Harrison Edwards, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever, and Karl Cobbe. Let’s verify step by step. In The Twelfth International Conference on Learning Representations, 2023. 7

  21. [21]

    Videodirectorgpt: Consistent multi-scene video generation via llm-guided planning

    Han Lin, Abhay Zala, Jaemin Cho, and Mohit Bansal. Videodirectorgpt: Consistent multi-scene video generation via llm-guided planning. In COLM, 2024. 3

  22. [22]

    Flow matching for generative modeling

    Yaron Lipman, Ricky TQ Chen, Heli Ben-Hamu, Maximilian Nickel, and Matthew Le. Flow matching for generative modeling. In The Eleventh International Conference on Learning Representations, 2023. 1

  23. [23]

    Flow Matching Guide and Code

    Yaron Lipman, Marton Havasi, Peter Holderrieth, Neta Shaul, Matt Le, Brian Karrer, Ricky TQ Chen, David Lopez-Paz, Heli Ben-Hamu, and Itai Gat. Flow matching guide and code. arXiv preprint arXiv:2412.06264,

  24. [24]

    Flow-GRPO: Training Flow Matching Models via Online RL

    Jie Liu, Gongye Liu, Jiajun Liang, Yangguang Li, Jiaheng Liu, Xintao Wang, Pengfei Wan, Di Zhang, and Wanli Ouyang. Flow-grpo: Training flow matching models via online rl. arXiv preprint arXiv:2505.05470, 2025. 2, 3

  25. [25]

    Improving Video Generation with Human Feedback

    Jie Liu, Gongye Liu, Jiajun Liang, Ziyang Yuan, Xiaokun Liu, Mingwu Zheng, Xiele Wu, Qiulin Wang, Wenyu Qin, Menghan Xia, et al. Improving video generation with human feedback. arXiv preprint arXiv:2501.13918,

  26. [26]

    Flowing from words to pixels: A framework for cross-modality evolution. arXiv preprint arXiv:2412.15213, 2024

    Qihao Liu, Xi Yin, Alan Yuille, Andrew Brown, and Mannat Singh. Flowing from words to pixels: A framework for cross-modality evolution. arXiv preprint arXiv:2412.15213, 2024. 1

  27. [27]

    Flow straight and fast: Learning to generate and transfer data with rectified flow

    Xingchao Liu, Chengyue Gong, and Qiang Liu. Flow straight and fast: Learning to generate and transfer data with rectified flow. In ICLR, 2023. 1

  28. [28]

    Annotated high-throughput microscopy image sets for validation. Nature Methods, 2012

    Vebjorn Ljosa, Katherine L Sokolnicki, and Anne E Carpenter. Annotated high-throughput microscopy image sets for validation. Nature Methods, 2012. 15

  29. [29]

    Inference-time scaling for diffusion models beyond scaling denoising steps

    Nanye Ma, Shangyuan Tong, Haolin Jia, Hexiang Hu, Yu-Chuan Su, Mingda Zhang, Xuan Yang, Yandong Li, Tommi Jaakkola, Xuhui Jia, et al. Inference-time scaling for diffusion models beyond scaling denoising steps. arXiv preprint arXiv:2501.09732, 2025. 2

  30. [30]

    Deep learning, reinforcement learning, and world models. Neural Networks, 152:267–275,

    Yutaka Matsuo, Yann LeCun, Maneesh Sahani, Doina Precup, David Silver, Masashi Sugiyama, Eiji Uchibe, and Jun Morimoto. Deep learning, reinforcement learning, and world models. Neural Networks, 152:267–275,

  31. [31]

    Towards world simulator: Crafting physical commonsense-based benchmark for video generation. arXiv preprint arXiv:2410.05363, 2024

    Fanqing Meng, Jiaqi Liao, Xinyu Tan, Wenqi Shao, Quanfeng Lu, Kaipeng Zhang, Yu Cheng, Dianqi Li, Yu Qiao, and Ping Luo. Towards world simulator: Crafting physical commonsense-based benchmark for video generation. arXiv preprint arXiv:2410.05363, 2024. 3

  32. [32]

    Cellpose 2.0: how to train your own model. Nature Methods, 19(12):1634–1641, 2022

    Marius Pachitariu and Carsen Stringer. Cellpose 2.0: how to train your own model. Nature Methods, 19(12):1634–1641, 2022. 5

  33. [33]

    Predicting cell morphological responses to perturbations using generative modeling. Nature Communications, 2025

    Alessandro Palma, Fabian J Theis, and Mohammad Lotfollahi. Predicting cell morphological responses to perturbations using generative modeling. Nature Communications, 2025. 3, 7, 8

  34. [34]

    Rdpo: Real data preference optimization for physics consistency video generation, 2025

    Wenxu Qian, Chaoyue Wang, Hou Peng, Zhiyu Tan, Hao Li, and Anxiang Zeng. Rdpo: Real data preference optimization for physics consistency video generation, 2025. 3

  35. [35]

    Dr.vae: improving drug response prediction via modeling of drug perturbation effects. Bioinformatics, 35(19):3743–3751, 03 2019

    Ladislav Rampášek, Daniel Hidru, Petr Smirnov, Benjamin Haibe-Kains, and Anna Goldenberg. Dr.vae: improving drug response prediction via modeling of drug perturbation effects. Bioinformatics, 35(19):3743–3751, 03 2019. 3

  36. [36]

    Use of virtual cell in studies of cellular dynamics. International Review of Cell and Molecular Biology, 283:1–56, 2010

    Boris M Slepchenko and Leslie M Loew. Use of virtual cell in studies of cellular dynamics. International Review of Cell and Molecular Biology, 283:1–56, 2010. 3

  37. [37]

    Quantitative cell biology with the virtual cell. Trends in Cell Biology, 2003

    Boris M Slepchenko, James C Schaff, Ian Macara, and Leslie M Loew. Quantitative cell biology with the virtual cell. Trends in Cell Biology, 2003. 1

  38. [38]

    Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters

    Charlie Snell, Jaehoon Lee, Kelvin Xu, and Aviral Kumar. Scaling llm test-time compute optimally can be more effective than scaling model parameters. arXiv preprint arXiv:2408.03314, 2024. 7

  39. [39]

    Deep unsupervised learning using nonequilibrium thermodynamics

    Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsupervised learning using nonequilibrium thermodynamics. In ICML, 2015. 1

  40. [40]

    Generative modeling by estimating gradients of the data distribution

    Yang Song and Stefano Ermon. Generative modeling by estimating gradients of the data distribution. In NeurIPS,

  41. [41]

    The virtual cell—a candidate co-ordinator for ‘middle-out’ modelling of biological systems. Briefings in Bioinformatics, 10(4):450–461, 03 2009

    Dawn C. Walker and Jennifer Southgate. The virtual cell—a candidate co-ordinator for ‘middle-out’ modelling of biological systems. Briefings in Bioinformatics, 10(4):450–461, 03 2009. 3

  42. [42]

    Daydreamer: World models for physical robot learning

    Philipp Wu, Alejandro Escontrela, Danijar Hafner, Pieter Abbeel, and Ken Goldberg. Daydreamer: World models for physical robot learning. In Karen Liu, Dana Kulic, and Jeff Ichnowski, editors, Proceedings of The 6th Conference on Robot Learning, volume 205 of Proceedings of Machine Learning Research, pages 2226–. PMLR, 14–18 Dec 2023. 3

  44. [44]

    Visionreward: Fine-grained multi-dimensional human preference learning for image and video generation, 2024

    Jiazheng Xu, Yu Huang, Jiale Cheng, Yuanming Yang, Jiajun Xu, Yuan Wang, Wenbo Duan, Shen Yang, Qunlin Jin, Shurun Li, Jiayan Teng, Zhuoyi Yang, Wendi Zheng, Xiao Liu, Ming Ding, Xiaohan Zhang, Xiaotao Gu, Shiyu Huang, Minlie Huang, Jie Tang, and Yuxiao Dong. Visionreward: Fine-grained multi-dimensional human preference learning for image and video generation, 2024.

  45. [45]

    DanceGRPO: Unleashing GRPO on Visual Generation

    Zeyue Xue, Jie Wu, Yu Gao, Fangyuan Kong, Lingting Zhu, Mengzhao Chen, Zhiheng Liu, Wei Liu, Qiushan Guo, Weilin Huang, et al. Dancegrpo: Unleashing grpo on visual generation. arXiv preprint arXiv:2505.07818,

  46. [46]

    Physcene: Physically interactable 3d scene synthesis for embodied ai

    Yandan Yang, Baoxiong Jia, Peiyuan Zhi, and Siyuan Huang. Physcene: Physically interactable 3d scene synthesis for embodied ai. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 16262–16272, June 2024. 3

  47. [47]

    Toward computational systems biology. Cell Biochemistry and Biophysics, 40(2):167–184,

    Lingchong You. Toward computational systems biology. Cell Biochemistry and Biophysics, 40(2):167–184,

  48. [48]

    Cellflux: Simulating cellular morphology changes via flow matching

    Yuhui Zhang, Yuchang Su, Chenyu Wang, Tianhong Li, Zoe Wefers, Jeffrey Nirschl, James Burgess, Daisy Ding, Alejandro Lozano, Emma Lundberg, et al. Cellflux: Simulating cellular morphology changes via flow matching. arXiv preprint arXiv:2502.09775, 2025. 1, 3, 4, 5, 7, 8, 15, 17

  49. [49]

    DiffusionNFT: Online Diffusion Reinforcement with Forward Process

    Kaiwen Zheng, Huayu Chen, Haotian Ye, Haoxiang Wang, Qinsheng Zhang, Kai Jiang, Hang Su, Stefano Ermon, Jun Zhu, and Ming-Yu Liu. Diffusionnft: Online diffusion reinforcement with forward process. arXiv preprint arXiv:2509.16117, 2025. 2, 3, 6, 7, 8, 14, 15

  50. [50]

    Compositional 3d-aware video generation with llm director

    Hanxin Zhu, Tianyu He, Anni Tang, Junliang Guo, Zhibo Chen, and Jiang Bian. Compositional 3d-aware video generation with llm director. In A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tomczak, and C. Zhang, editors, Advances in Neural Information Processing Systems, volume 37, pages 131618–131644,

  51. [51]

    Internal anchor (Appendix A, Algorithm of CellFluxRL): We present the full training procedure of CellFluxRL in Algorithm 1. The algorithm adapts DiffusionNFT [48] to the source-to-target flow matching setting and replaces the generic reward with our suite of biologically grounded reward functions…