Controllable Histopathology Image Synthesis with Training-free Structural Initialization and Textural Modulation
Pith reviewed 2026-07-01 06:42 UTC · model grok-4.3
The pith
A training-free plug-in lets a diffusion model pretrained only on unlabeled histopathology images generate images that follow given structural masks while keeping the style of a reference tissue image.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By replacing the starting Gaussian noise with a frequency-domain mixture that takes its phase from the structural mask and its amplitude from noise, and by then performing adaptive modulation of wavelet sub-bands at multiple decomposition levels throughout the denoising trajectory, a diffusion model pretrained solely on unlabeled histopathology images can be made to produce images whose spatial layout matches an arbitrary input mask while retaining the textural statistics of a chosen reference image.
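Written out, the two stages might look as follows. The notation is an editorial assumption, not the paper's own: $m$ is the structural mask, $\epsilon$ the Gaussian noise, $\mathcal{F}$ the 2-D discrete Fourier transform, $\angle$ the elementwise phase, $D_\ell^{b}$ the wavelet detail sub-band $b$ at level $\ell$ of the current clean-image estimate $\hat{x}_0$, $\mathcal{T}$ an unspecified operator that pulls a sub-band toward the matching sub-band of the reference image, and $w_\ell(t)$ a level- and step-dependent weight standing in for the paper's adaptive rule.

```latex
\begin{align*}
x_T &= \mathcal{F}^{-1}\!\left( \bigl|\mathcal{F}(\epsilon)\bigr| \odot \exp\!\bigl(i\,\angle\mathcal{F}(m)\bigr) \right),
  && \epsilon \sim \mathcal{N}(0, I), \\
\tilde{D}_\ell^{b} &= \bigl(1 - w_\ell(t)\bigr)\, D_\ell^{b}
  + w_\ell(t)\, \mathcal{T}\bigl(D_\ell^{b};\, D_\ell^{b,\mathrm{ref}}\bigr),
  && b \in \{\mathrm{LH}, \mathrm{HL}, \mathrm{HH}\},\ \ell = 1, \dots, L.
\end{align*}
```

In this reading the coarsest approximation band of $\hat{x}_0$ is left untouched, so the layout seeded by the mask phase is not overwritten; whether CHIS also modulates that band is not stated in this summary.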
What carries the argument
Structural initialization by phase-amplitude fusion in the frequency domain, together with adaptive multi-scale wavelet-band modulation during reverse diffusion.
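A minimal NumPy sketch of the structural-initialization step, assuming a single-channel mask already resized to the shape of the model's initial noise (whether CHIS operates in pixel space or a latent space is not stated in this summary). The function name `phase_amplitude_init` and the final re-standardization are assumptions. The sketch uses the real-input FFT (`np.fft.rfft2` / `np.fft.irfft2`) so the fused spectrum stays Hermitian and the result is real-valued.

```python
import numpy as np


def phase_amplitude_init(mask: np.ndarray, seed: int = 0) -> np.ndarray:
    """Hypothetical sketch of phase-amplitude fusion for the initial state x_T.

    The Fourier phase is taken from `mask` (carries the layout) and the
    Fourier amplitude from a Gaussian noise draw of the same shape.
    """
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(mask.shape)

    # Real-input 2-D FFTs over the last two (spatial) axes.
    mask_spec = np.fft.rfft2(mask)
    noise_spec = np.fft.rfft2(noise)

    # Amplitude from the noise, phase from the mask.
    fused_spec = np.abs(noise_spec) * np.exp(1j * np.angle(mask_spec))

    # Back to the spatial domain; `s` restores the original (H, W) size.
    x_T = np.fft.irfft2(fused_spec, s=mask.shape[-2:])

    # Re-standardize to zero mean and unit variance, as expected of N(0, I)
    # (an assumption about how the fused state is normalized).
    return (x_T - x_T.mean()) / (x_T.std() + 1e-8)


# Usage: a rectangular "gland" mask on a 64 x 64 grid.
mask = np.zeros((64, 64))
mask[20:44, 10:30] = 1.0
x_T = phase_amplitude_init(mask, seed=0)
print(x_T.shape)  # (64, 64)
```

By Parseval's theorem the fused field already carries roughly the energy of the original noise, so the standardization is only a small correction; it is there because the sampler expects its starting state to look like a draw from N(0, I).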
If this is right
- A diffusion model trained only on unlabeled histopathology images can be used directly for paired image-mask generation, without collecting new annotations.
- The same mask can be paired with different reference images to vary tissue appearance while keeping layout fixed.
- Downstream segmentation accuracy rises when models are trained on the synthetic pairs produced by this process.
- The approach requires no gradient updates or fine-tuning at inference time.
Where Pith is reading between the lines
- The same initialization and modulation steps could be tested on diffusion models for other imaging modalities, such as radiology or other forms of microscopy.
- If the phase-fusion step generalizes, the method might reduce the need for large annotated datasets in any domain where structural masks are cheaper to produce than pixel-level annotations of real images.
- A natural extension would be to replace the supplied mask with one extracted automatically from a different modality, to test cross-modal structural transfer.
Load-bearing premise
The assumption that phase information taken from the mask and inserted into the initial noise, plus later wavelet-level texture adjustments, will be enough to force structural fidelity without any training on paired examples.
What would settle it
Generate a set of images with CHIS using masks whose structures are known to be absent from the pretraining distribution. If independent segmentation networks trained on real data achieve no higher Dice scores against the conditioning masks on the CHIS outputs than on standard (unconditioned) diffusion samples scored against the same masks, the structural-control claim is falsified.
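The test above turns on Dice overlap between a segmentation of each generated image and the mask it was conditioned on. A minimal sketch of that scoring, assuming binary masks; `dice` and `structural_control_supported` are hypothetical helpers, and a real evaluation would add a paired significance test rather than compare raw means.

```python
import numpy as np


def dice(pred: np.ndarray, target: np.ndarray, eps: float = 1e-8) -> float:
    """Dice coefficient 2|P ∩ T| / (|P| + |T|) for binary masks."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return float((2.0 * intersection + eps) / (pred.sum() + target.sum() + eps))


def structural_control_supported(dice_chis, dice_baseline, margin: float = 0.0) -> bool:
    """True if CHIS outputs score higher Dice against their conditioning
    masks than unconditioned samples do (against the same masks), on
    average by more than `margin`."""
    return float(np.mean(dice_chis)) > float(np.mean(dice_baseline)) + margin


# Usage with toy 2 x 2 masks.
a = np.array([[1, 1], [0, 0]])
b = np.array([[1, 0], [0, 0]])
print(round(dice(a, b), 3))  # 0.667
```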
Original abstract
Deep learning has demonstrated remarkable success in high-throughput histopathology image analysis. However, the performance of learning-based models critically depends on the quality and size of annotations by expert pathologists, which is a resource-intensive and time-consuming process. To address the limitations of data scarcity and annotation burden, several methods have been proposed to synthesize paired histopathology data. Nevertheless, these frameworks typically still require annotation data, albeit in reduced quantities, to impose structural constraints during training. In this work, we present CHIS, a plug-in framework that guides the sampling trajectory of a pretrained diffusion model through two key stages: structural initialization at the start and textural modulation during generation. The initial noise state is refined by fusing the phase information from a prior mask with the amplitude of Gaussian noise in the frequency domain, yielding a structurally informed starting point. During the reverse diffusion process, we adaptively modulate both coarse-grained and fine-grained textures at different wavelet decomposition levels. This enables a diffusion model pretrained solely on unlabeled images to generate outputs that align with prior structural masks while preserving the reference tissue style. We conducted extensive experiments demonstrating the superiority of CHIS in generation fidelity and its substantial benefits for downstream segmentation tasks. Code is available at https://github.com/IBIL-Code/CHIS.
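The textural-modulation stage can be sketched with a multi-level Haar transform written directly in NumPy. Everything specific here is an assumption rather than the paper's recipe: the Haar wavelet, applying the update to a single-channel clean-image estimate `x0_hat` at each reverse step, fixed per-level weights in place of the paper's adaptive rule, and mean/standard-deviation matching of each detail sub-band to the reference image. The coarsest approximation band is kept so the mask-seeded layout is not overwritten. All function names are hypothetical.

```python
import numpy as np


def haar_dwt2(x):
    """One level of the orthonormal 2-D Haar transform (even H and W)."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    approx = (a + b + c + d) / 2
    details = ((a - b + c - d) / 2,  # differences between neighbouring columns
               (a + b - c - d) / 2,  # differences between neighbouring rows
               (a - b - c + d) / 2)  # diagonal differences
    return approx, details


def haar_idwt2(approx, details):
    """Exact inverse of haar_dwt2."""
    h, v, g = details
    x = np.empty((2 * approx.shape[0], 2 * approx.shape[1]))
    x[0::2, 0::2] = (approx + h + v + g) / 2
    x[0::2, 1::2] = (approx - h + v - g) / 2
    x[1::2, 0::2] = (approx + h - v - g) / 2
    x[1::2, 1::2] = (approx - h - v + g) / 2
    return x


def haar_wavedec2(x, levels):
    """Multi-level decomposition; details[0] is the finest level."""
    details = []
    for _ in range(levels):
        x, d = haar_dwt2(x)
        details.append(d)
    return x, details


def haar_waverec2(approx, details):
    """Rebuild the image from the coarsest band upward."""
    for d in reversed(details):
        approx = haar_idwt2(approx, d)
    return approx


def match_stats(band, ref_band, eps=1e-8):
    """Give `band` the mean and standard deviation of `ref_band`."""
    return (band - band.mean()) / (band.std() + eps) * ref_band.std() + ref_band.mean()


def modulate_textures(x0_hat, reference, weights):
    """Blend each detail sub-band of x0_hat toward the reference statistics.

    weights[l] in [0, 1] is the strength at level l (0 = finest); the
    coarsest approximation band of x0_hat is kept unchanged.
    """
    levels = len(weights)
    assert x0_hat.shape == reference.shape
    assert all(s % 2 ** levels == 0 for s in x0_hat.shape), "sides must be divisible by 2**levels"
    approx, det_x = haar_wavedec2(x0_hat, levels)
    _, det_r = haar_wavedec2(reference, levels)
    new_details = [
        tuple((1 - w) * bx + w * match_stats(bx, br) for bx, br in zip(dx, dr))
        for w, dx, dr in zip(weights, det_x, det_r)
    ]
    return haar_waverec2(approx, new_details)


# Usage with random stand-ins for the model's x0 estimate and a reference tile.
rng = np.random.default_rng(0)
x0_hat = rng.standard_normal((32, 32))
reference = 3.0 * rng.standard_normal((32, 32))
out = modulate_textures(x0_hat, reference, weights=[0.8, 0.4])
print(out.shape)  # (32, 32)
```

Giving fine and coarse levels separate weights is what lets fine-grained and coarse-grained texture be adjusted independently; an adaptive scheme would make those weights depend on the timestep or on the current sub-band statistics.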
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents CHIS, a training-free plug-in framework for controllable synthesis of histopathology images using a diffusion model pretrained on unlabeled data. The method involves structural initialization by fusing the phase spectrum from a structural mask with the amplitude spectrum of Gaussian noise in the frequency domain at the initial noise state, followed by adaptive textural modulation at different wavelet decomposition levels during the reverse diffusion process. This is claimed to produce images that align with the input structural masks while preserving reference tissue style, with demonstrated superiority in fidelity and benefits for downstream segmentation tasks.
Significance. If the claims hold, the approach would be significant as it enables controllable generation without requiring paired mask-image training data, addressing data scarcity and annotation burden in histopathology analysis. The training-free nature and use of pretrained models on unlabeled images could facilitate broader adoption in medical imaging.
major comments (2)
- [Abstract] Abstract: The abstract asserts that 'extensive experiments' demonstrate superiority in generation fidelity and substantial benefits for downstream segmentation tasks, yet supplies no quantitative metrics, ablation results, or experimental protocol. This absence is load-bearing because the central claim of effective structural control cannot be verified.
- [Method] Method description: No analytic bound or ablation is provided to show that phase-amplitude fusion at initialization plus wavelet-band modulation during reverse diffusion suffices to enforce mask alignment throughout the full denoising trajectory. Because the underlying model was trained only on unlabeled images, the open question is whether the injected structure survives without drift or whether the alignment is instead supplied by the reference style image.
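One way to probe the drift raised here, without an analytic bound, is to track how much of the mask's phase survives in each intermediate clean-image estimate. The score below is an editorial suggestion, not a metric from the paper: an amplitude-weighted mean of the cosine of the phase gap between an image and the mask, which is 1 when every weighted frequency keeps the mask's phase and near 0 on average for unrelated content. `phase_coherence` and `drift_curve` are hypothetical names, and the per-step grayscale estimates are assumed to come from the sampler.

```python
import numpy as np


def phase_coherence(x: np.ndarray, mask: np.ndarray) -> float:
    """Amplitude-weighted mean cosine of the phase gap between x and mask."""
    X = np.fft.rfft2(x)
    M = np.fft.rfft2(mask)
    weights = np.abs(M)
    weights[0, 0] = 0.0  # ignore the DC term (overall brightness)
    cos_gap = np.cos(np.angle(X) - np.angle(M))
    return float(np.sum(weights * cos_gap) / np.sum(weights))


def drift_curve(x0_estimates, mask):
    """Phase coherence of each per-step x0 estimate along the trajectory."""
    return [phase_coherence(x, mask) for x in x0_estimates]


# Usage: the mask and a rescaled copy score 1, its negative scores -1.
mask = np.zeros((64, 64))
mask[20:44, 10:30] = 1.0
print([round(c, 3) for c in drift_curve([mask, 2.0 * mask + 0.5, -mask], mask)])
# [1.0, 1.0, -1.0]
```

An inverted-contrast rendering of the same layout (for example, dark nuclei on a bright mask region) also scores near -1, so in practice one might track the absolute value or compare against unconditioned samples; a curve that stays high across steps would indicate that the injected phase persists.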
minor comments (2)
- [Abstract] The code repository link is provided, supporting reproducibility.
- Notation for frequency-domain operations and specific wavelet levels could be clarified for readers.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address the major comments point by point below, clarifying the manuscript content and proposing targeted revisions.
Point-by-point responses
- Referee: [Abstract] Abstract: The abstract asserts that 'extensive experiments' demonstrate superiority in generation fidelity and substantial benefits for downstream segmentation tasks, yet supplies no quantitative metrics, ablation results, or experimental protocol. This absence is load-bearing because the central claim of effective structural control cannot be verified.
Authors: We agree the abstract is too high-level. The full manuscript (Section 4) reports quantitative metrics including FID, LPIPS, and downstream Dice/IoU improvements, along with the evaluation protocol on public histopathology datasets. We will revise the abstract to include key numerical results and a concise protocol summary. Revision: yes.
- Referee: [Method] Method description: No analytic bound or ablation is provided to show that phase-amplitude fusion at initialization plus wavelet-band modulation during reverse diffusion suffices to enforce mask alignment throughout the full denoising trajectory. Because the underlying model was trained only on unlabeled images, the open question is whether the injected structure survives without drift or whether the alignment is instead supplied by the reference style image.
Authors: The paper supplies empirical evidence via alignment metrics, visual trajectory inspections, and ablations isolating initialization versus modulation, showing that structure persists independently of the style reference. No analytic bound is derived, as the work is empirical; we can add further trajectory ablations in revision to strengthen this point. Revision: partial.
Circularity Check
No circularity: method is a proposed heuristic without definitional reduction or self-citation chains
The paper describes a training-free plug-in technique (phase-amplitude fusion at initialization plus wavelet-band modulation) applied to an externally pretrained diffusion model. No equations, fitted parameters, or predictions are presented that reduce to the inputs by construction. No self-citations are invoked as load-bearing uniqueness theorems. The central claim is an empirical assertion about the sufficiency of the described operations, which remains independent of the target outputs and does not rename or smuggle prior results. This is the common case of a self-contained engineering proposal.