Stitching and dimensionality effects on large artificially generated volume datasets
Pith reviewed 2026-06-26 17:55 UTC · model grok-4.3
The pith
Stitching artifacts in cycleGAN-generated cryo-EM volumes evade FID detection yet reduce downstream mitochondria segmentation accuracy.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
When large cryo-EM volumes are assembled from cycleGAN patches, FID scores overlook subtle stitching artifacts that nevertheless lower accuracy on mitochondria segmentation. Artifact-free 3D stitching yields only marginal downstream gains over 2D, and those gains barely justify the extra computational cost. 2D models train more stably because they allow larger batch sizes, and orthogonal ensembling improves only low-quality outputs.
What carries the argument
Three stitching approaches combined with 2D versus 3D patch dimensionality inside cycleGAN models, evaluated by FID and by accuracy on a mitochondria segmentation downstream task.
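For orientation, FID is the Fréchet distance between Gaussians fitted to feature embeddings of real and generated images, so it summarizes global feature statistics rather than local border behavior. The following is a minimal sketch, assuming feature vectors have already been extracted by some network; the function name `frechet_distance` is hypothetical and this is not the evaluation code used in the paper.

```python
import numpy as np

def frechet_distance(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two (n, d) feature arrays."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Tr((cov_a cov_b)^(1/2)) equals the sum of square roots of the eigenvalues
    # of cov_a @ cov_b, which are real and non-negative for PSD covariances.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)
```

Identical feature sets give a distance near zero, and a pure mean shift contributes the squared length of that shift.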
If this is right
- Subtle stitching artifacts can degrade segmentation even when FID reports good perceptual quality.
- Artifact-free 3D stitching produces only marginal segmentation gains over 2D that may not offset higher computational cost.
- 2D models train more stably because they allow larger batch sizes.
- Ensembling predictions from three orthogonal directions improves low-quality stitched volumes but adds no value to already high-quality outputs.
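The orthogonal ensembling mentioned in the last point can be sketched as follows. This is a minimal illustration, assuming a 2D model is applied slice by slice along each of the three volume axes and the results are averaged; the averaging rule and the names `apply_slicewise` and `orthogonal_ensemble` are assumptions, since the text here does not specify how the predictions are combined.

```python
import numpy as np

def apply_slicewise(volume, fn_2d, axis):
    """Apply a 2D function to every slice taken perpendicular to `axis`."""
    moved = np.moveaxis(volume, axis, 0)
    out = np.stack([fn_2d(s) for s in moved], axis=0)
    return np.moveaxis(out, 0, axis)

def orthogonal_ensemble(volume, fn_2d):
    """Average slice-wise outputs computed along the three volume axes."""
    # Assumption: a plain mean; other aggregation rules are equally plausible.
    preds = [apply_slicewise(volume, fn_2d, axis) for axis in range(3)]
    return np.mean(preds, axis=0)
```

Here `fn_2d` stands in for any shape-preserving 2D model call, such as a generator forward pass wrapped to accept and return NumPy arrays.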
Where Pith is reading between the lines
- Developers of generative volume models may need task-specific metrics that directly measure border consistency rather than relying solely on FID.
- Practical pipelines could prioritize improved 2D stitching techniques over switching to 3D if the marginal accuracy gain remains small.
- The observed mismatch between perceptual and task metrics could appear in other biomedical generation settings such as denoising or super-resolution of volumes.
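One way to make the idea of a border-consistency metric concrete is to compare intensity jumps at patch seams with jumps elsewhere. The sketch below is hypothetical and not taken from the paper: it assumes non-overlapping patches of known size along one axis, and the name `seam_contrast` is illustrative only.

```python
import numpy as np

def seam_contrast(volume, patch_size, axis=0):
    """Ratio of mean absolute intensity jump at patch seams to that elsewhere."""
    jumps = np.abs(np.diff(volume.astype(float), axis=axis))
    # Mean jump per position along `axis`; index i is the step from i to i + 1.
    other_axes = tuple(a for a in range(volume.ndim) if a != axis)
    profile = jumps.mean(axis=other_axes)
    # Assumption: seams sit at multiples of `patch_size` (non-overlapping tiling).
    seam_idx = np.arange(patch_size - 1, profile.size, patch_size)
    is_seam = np.zeros(profile.size, dtype=bool)
    is_seam[seam_idx] = True
    return profile[is_seam].mean() / (profile[~is_seam].mean() + 1e-12)
```

A ratio near 1 suggests seams are no more abrupt than the rest of the volume, while values well above 1 point to visible borders. The volume must span more than one patch along `axis` for the ratio to be defined.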
Load-bearing premise
That the mitochondria segmentation task and the chosen cryo-EM datasets are representative of how stitching artifacts will behave in other generative models and other scientific volume tasks.
What would settle it
Running the same cycleGAN training, stitching variants, and FID-plus-segmentation evaluation on a different volume dataset or a different downstream task such as nuclei counting and checking whether FID still fails to predict the segmentation drop.
Original abstract
Generating large images via deep learning requires patching input data to accommodate hardware memory limitations, then assembling output patches, a process that can introduce stitching artifacts when neighboring patches do not align at borders. While these artifacts are known to affect segmentation tasks, their impact on generative models for style-transfer remains poorly understood. We investigated three stitching approaches and two patch dimensionalities (2D vs 3D) using cycleGAN models trained on cryo-electron microscopy datasets. We evaluated both perceptual quality and performance on downstream mitochondria segmentation. Our key findings reveal that: (1) FID scores fail to detect subtle stitching artifacts that significantly impact downstream segmentation performance, (2) 3D models with artifact-free stitching marginally outperform 2D models on downstream tasks, though the improvement barely justifies the computational cost, and (3) 2D models train more stably due to larger batch sizes. Additionally, we demonstrate that ensembling predictions from three orthogonal directions can improve low-quality volumes but provides no benefit for high-quality outputs. These results demonstrate that maximizing generative model performance on large scientific datasets requires careful consideration and mitigation of stitching artifacts, and that perceptual metrics alone are insufficient for evaluating domain adaptation quality in biomedical imaging.
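The patch-then-assemble process described in the abstract can be illustrated with overlapping patches whose outputs are blended by a tapered weight window. This is a sketch of one common mitigation, not necessarily one of the three approaches studied (they are not named here); the function name `stitch_with_blending`, the triangular window, and the default sizes are all assumptions, and the example is 2D for brevity.

```python
import numpy as np

def stitch_with_blending(image, fn, patch=64, overlap=16):
    """Run `fn` on overlapping 2D patches and blend outputs with a tapered window."""
    h, w = image.shape  # requires h >= patch, w >= patch, and overlap < patch
    step = patch - overlap
    # Triangular 1D weights, strictly positive so every pixel gets nonzero weight.
    ramp = np.minimum(np.arange(1, patch + 1), np.arange(patch, 0, -1)).astype(float)
    window = np.outer(ramp, ramp)
    out = np.zeros((h, w), dtype=float)
    weight = np.zeros((h, w), dtype=float)
    # Regular grid of patch origins, plus a final origin flush with the far edge.
    ys = sorted(set(list(range(0, h - patch, step)) + [h - patch]))
    xs = sorted(set(list(range(0, w - patch, step)) + [w - patch]))
    for y in ys:
        for x in xs:
            tile = fn(image[y:y + patch, x:x + patch])
            out[y:y + patch, x:x + patch] += tile * window
            weight[y:y + patch, x:x + patch] += window
    # Normalize by accumulated weight to obtain a weighted average per pixel.
    return out / weight
```

Down-weighting patch edges means that disagreements between neighboring patches are averaged smoothly across the overlap instead of appearing as a hard seam. A 3D version would follow the same pattern with a third loop and a 3D window.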
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper examines three stitching approaches and 2D vs. 3D patch dimensionalities when using cycleGAN for style transfer on cryo-EM volumes to generate large artificially stitched datasets. It reports that FID scores miss subtle border artifacts that degrade downstream mitochondria segmentation performance, that artifact-free 3D stitching yields only marginal gains over 2D at high computational cost, that 2D training is more stable due to batch size, and that ensembling orthogonal predictions improves only low-quality outputs.
Significance. If the empirical discrepancy between FID and segmentation holds, the work supplies concrete evidence that standard perceptual metrics are inadequate for evaluating generative models on scientific volumes and offers practical guidance on stitching and dimensionality trade-offs for large biomedical datasets.
major comments (2)
- [Abstract / Experiments] Abstract and experiments section: the central claim that 'perceptual metrics alone are insufficient' and that stitching artifacts 'significantly impact downstream segmentation' rests on a single cycleGAN + mitochondria segmentation task on cryo-EM data; no ablation on other generative architectures (diffusion, etc.) or downstream tasks is described, so the asserted generality to 'large scientific volume datasets' is not supported by the evidence presented.
- [Results] Results on 3D vs 2D: the statement that 3D 'marginally outperform[s] 2D ... though the improvement barely justifies the computational cost' requires explicit quantitative comparison (performance delta vs. training/inference FLOPs or wall-clock time); without those numbers the cost-benefit conclusion cannot be assessed.
minor comments (1)
- [Abstract] The abstract states 'we investigated three stitching approaches' but does not name them; a brief enumeration would improve clarity.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address the major comments point by point below.
- Referee: [Abstract / Experiments] Abstract and experiments section: the central claim that 'perceptual metrics alone are insufficient' and that stitching artifacts 'significantly impact downstream segmentation' rests on a single cycleGAN + mitochondria segmentation task on cryo-EM data; no ablation on other generative architectures (diffusion, etc.) or downstream tasks is described, so the asserted generality to 'large scientific volume datasets' is not supported by the evidence presented.
Authors: We agree that the experiments are limited to cycleGAN and a single downstream segmentation task on cryo-EM data. The study was designed to examine stitching effects in this representative setting for large biomedical volumes. To address the concern about overgeneralization, we will revise the abstract and discussion sections to clarify the scope of the claims and avoid implying broad applicability to all generative architectures or tasks. revision: yes
- Referee: [Results] Results on 3D vs 2D: the statement that 3D 'marginally outperform[s] 2D ... though the improvement barely justifies the computational cost' requires explicit quantitative comparison (performance delta vs. training/inference FLOPs or wall-clock time); without those numbers the cost-benefit conclusion cannot be assessed.
Authors: We agree that the cost-benefit statement requires supporting quantitative data. In the revised manuscript we will add explicit comparisons of segmentation performance deltas against training and inference costs measured in FLOPs and wall-clock time. revision: yes
Circularity Check
No circularity; purely empirical comparison
The paper conducts an experimental study training cycleGAN models on cryo-EM volumes, testing three stitching approaches and 2D vs 3D patch dimensionalities, then measuring FID scores and downstream mitochondria segmentation performance. No derivation chain, fitted parameters renamed as predictions, self-citation load-bearing premises, or ansatzes appear in the described work. All reported findings (FID failing to detect artifacts, marginal 3D gains, etc.) rest on direct empirical measurements rather than any reduction to prior inputs by construction. The study is therefore self-contained against external benchmarks.