pith. machine review for the scientific record.

arxiv: 2604.27889 · v1 · submitted 2026-04-30 · 💻 cs.CV

Recognition: unknown

Noise2Map: End-to-End Diffusion Model for Semantic Segmentation and Change Detection

Authors on Pith: no claims yet

Pith reviewed 2026-05-07 05:44 UTC · model grok-4.3

classification 💻 cs.CV
keywords diffusion models · semantic segmentation · change detection · remote sensing · satellite imagery · self-supervised learning · noise scheduling · end-to-end prediction

The pith

A diffusion model can directly predict semantic segmentation and change maps from satellite images by conditioning its denoising on task-specific noise schedules.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tries to establish that the denoising process from diffusion models can be repurposed for end-to-end discriminative learning in remote sensing, replacing the usual generative sampling with direct map prediction via task-specific noise schedules and timestep conditioning. This would matter because standard models for semantic segmentation and change detection often need heavy pretraining, struggle with temporal inconsistencies in satellite data, and lack interpretability, while this approach starts from self-supervised denoising pretraining and then fine-tunes for robustness on both tasks with one shared backbone. If the claim holds, remote sensing pipelines could gain a unified, faster framework that avoids the computational cost of traditional diffusion sampling at inference time.

Core claim

Noise2Map is a unified diffusion-based framework that repurposes the denoising process for fast, end-to-end discriminative learning by directly predicting semantic or change maps using task-specific noise schedules and timestep conditioning. The model is pretrained via self-supervised denoising and fine-tuned with supervision on a shared backbone that supports both semantic segmentation and change detection through separate schedulers. Unlike prior diffusion work limited to generation or feature extraction, this avoids costly iterative sampling. On the SpaceNet7, WHU, and xView2 datasets it achieves the top average rank among seven models for segmentation and the top rank for change detection.
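The abstract's ranking protocol (per-dataset ranks by average F1, IoU as tie-break, then averaged across datasets) can be sketched as follows. The model names and scores here are invented for illustration; the paper's exact aggregation may differ in detail.

```python
# Hypothetical sketch of the cross-dataset rank metric: rank models per
# dataset by (F1, IoU) descending, then average the ranks across datasets.
def cross_dataset_ranks(scores):
    """scores: {dataset: {model: (f1, iou)}} -> {model: mean rank}."""
    models = sorted(next(iter(scores.values())).keys())
    totals = {m: 0.0 for m in models}
    for per_model in scores.values():
        # Tuple comparison makes IoU the tie-break when F1 is equal.
        order = sorted(models, key=lambda m: per_model[m], reverse=True)
        for rank, m in enumerate(order, start=1):
            totals[m] += rank
    return {m: totals[m] / len(scores) for m in models}

# Invented (F1, IoU) pairs for two of the seven compared models.
scores = {
    "SpaceNet7": {"Noise2Map": (0.71, 0.58), "UNet": (0.68, 0.55)},
    "WHU":       {"Noise2Map": (0.90, 0.83), "UNet": (0.90, 0.81)},  # F1 tie -> IoU decides
    "xView2":    {"Noise2Map": (0.64, 0.49), "UNet": (0.66, 0.51)},
}
ranks = cross_dataset_ranks(scores)
```

Note that a model can rank first on average while losing on an individual dataset, which is why the referee's request for significance testing on the ranks matters.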

What carries the argument

The task-specific noise scheduler combined with timestep conditioning inside the diffusion denoising network, which converts the generative process into direct prediction of segmentation or change maps from noisy inputs.
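In standard DDPM notation the denoiser sees a noised input x_t = sqrt(ᾱ_t)·x₀ + sqrt(1−ᾱ_t)·ε together with the timestep t, and a task-specific schedule amounts to swapping the β sequence. A minimal numpy sketch of that forward-noising step; the two schedules and their β ranges are illustrative assumptions, not the paper's actual settings.

```python
import numpy as np

def alpha_bar(betas):
    """Cumulative signal-retention factor: a_bar_t = prod_{s<=t} (1 - beta_s)."""
    return np.cumprod(1.0 - betas)

T = 1000
schedules = {
    "segmentation":     np.linspace(1e-4, 0.02, T),  # hypothetical SS schedule
    "change_detection": np.linspace(1e-4, 0.01, T),  # hypothetical CD schedule
}

def noisy_input(x0, t, task, rng):
    """Sample q(x_t | x_0) under the schedule chosen for the task."""
    a_bar = alpha_bar(schedules[task])[t]
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * eps

rng = np.random.default_rng(0)
x0 = np.zeros((8, 8))          # stand-in for a map or image tile
xt = noisy_input(x0, t=500, task="segmentation", rng=rng)
```

The design point is that (x_t, t) pairs from different schedules expose the network to different signal-to-noise regimes, which is what the timestep conditioning lets a shared backbone disambiguate.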

If this is right

  • The shared backbone allows a single trained model to handle both semantic segmentation and change detection without separate architectures.
  • Self-supervised denoising pretraining followed by fine-tuning improves robustness on remote sensing data that often has limited labels.
  • Avoiding full diffusion sampling at inference reduces computation, making the approach suitable for processing large volumes of satellite imagery.
  • Ablation results indicate the method remains stable across different choices of noise schedulers and timestep controls during training.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same conditioning trick on noise schedules might transfer to other dense prediction tasks such as depth estimation or instance segmentation where generative priors could help.
  • In practice this could support faster disaster response mapping because the model produces outputs in one forward pass rather than iterative refinement.
  • Interpretability might arise from inspecting intermediate denoising steps to see how the model resolves ambiguous regions in satellite scenes.

Load-bearing premise

Repurposing the denoising process with task-specific noise schedules and timestep conditioning enables effective end-to-end discriminative learning for semantic segmentation and change detection without needing costly sampling procedures.

What would settle it

If the model is evaluated on the xView2 wildfire damage dataset and its average F1 score for change detection falls below the top baseline instead of ranking first, the performance claim would be falsified.
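The falsification check is mechanical once per-class confusion counts are available. A sketch with invented counts; macro-averaged F1 over the change/no-change classes is one plausible reading of "average F1", not necessarily the paper's.

```python
# Compute macro-average F1 for a hypothetical change-detection prediction and
# compare it against an invented best-baseline score.
def f1(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0

# Per-class (tp, fp, fn) counts — illustrative only.
counts = {"change": (80, 20, 30), "no_change": (850, 30, 20)}
avg_f1 = sum(f1(*c) for c in counts.values()) / len(counts)

best_baseline_f1 = 0.80          # hypothetical top-baseline score
claim_falsified = avg_f1 < best_baseline_f1
```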

Figures

Figures reproduced from arXiv: 2604.27889 by Ali Shibli, Andrea Nascetti, Yifang Ban.

Figure 1. Noise2Map overview. (1) Self-supervised pretraining: a denoising attention U-Net is trained on 10k unlabeled satellite …
Figure 2. Progressive noising-denoising of semantic segmentation …
Figure 3. Change detection via progressive noising–denoising …
Figure 4. Qualitative comparison of change detection and semantic segmentation across all evaluated models. Our model …
Figure 5. Progression of predicted masks with Noise2Map over …
Figure 6. Evolution of F1 score during the reverse diffusion process …
Original abstract

Semantic segmentation and change detection are two fundamental challenges in remote sensing, requiring models to capture either spatial semantics or temporal differences from satellite imagery. Existing deep learning models often struggle with temporal inconsistencies or in capturing fine-grained spatial structures, require extensive pretraining, and offer limited interpretability - especially in real-world remote sensing scenarios. Recent advances in diffusion models show that Gaussian noise can be systematically leveraged to learn expressive data representations through denoising. Motivated by this, we investigate whether the noise process in diffusion models can be effectively utilized for discriminative tasks. We propose Noise2Map, a unified diffusion-based framework that repurposes the denoising process for fast, end-to-end discriminative learning. Unlike prior work that uses diffusion only for generation or feature extraction, Noise2Map directly predicts semantic or change maps using task-specific noise schedules and timestep conditioning, avoiding the costly sampling procedures of traditional diffusion models. The model is pretrained via self-supervised denoising and fine-tuned with supervision, enabling both interpretability and robustness. Our architecture supports both tasks (SS and CD) through a shared backbone and task-specific noise schedulers. Extensive evaluations on the SpaceNet7, WHU, and xView2 buildings damaged by wildfires datasets demonstrate that Noise2Map ranks on average 1st among seven models on semantic segmentation and 1st on change detection by a cross-dataset rank metric (average F1 primary, IoU tie-break). Ablation studies highlight the robustness of our model against different training noise schedulers and timestep control in the diffusion process, as well as the ability of the model to perform multi-task learning.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript presents Noise2Map, a unified diffusion-based framework for semantic segmentation and change detection in remote sensing imagery. It repurposes the denoising process via task-specific noise schedules and timestep conditioning to enable direct, end-to-end prediction of semantic or change maps from a shared backbone, with self-supervised denoising pretraining followed by supervised fine-tuning. The model performs single-pass inference at a chosen timestep. Extensive evaluations on SpaceNet7, WHU, and xView2 datasets claim that Noise2Map achieves the highest average rank (by F1 primary, IoU tie-break) among seven models for both tasks, with ablations demonstrating robustness to scheduler variations.

Significance. If the central claims hold after addressing the isolation of components, the work could demonstrate a practical way to adapt diffusion models for efficient discriminative tasks in remote sensing, potentially improving robustness and interpretability while avoiding iterative sampling. The use of multiple public datasets, a cross-dataset ranking metric, and multi-task support via shared backbone are positive aspects. However, the significance is currently limited by uncertainty over whether the reported gains derive specifically from the diffusion repurposing rather than backbone capacity or pretraining.

major comments (2)
  1. [Ablation studies] Ablation studies section: The reported ablations examine robustness across different noise schedulers and timestep controls, yet omit a control experiment that trains the identical backbone architecture end-to-end on clean images using standard supervised fine-tuning without timestep conditioning or noise addition. This control is load-bearing for the headline ranking claim, as it is needed to determine whether the F1/IoU advantages on SpaceNet7, WHU, and xView2 arise from the task-specific noise schedules and timestep conditioning or from other factors such as model capacity and self-supervised pretraining.
  2. [Experimental results] Experimental results and comparisons: The cross-dataset rank metric establishing Noise2Map as 1st among seven models on both tasks does not include error bars, statistical significance testing, or explicit confirmation that the baseline models received equivalent pretraining and data augmentation. Without these, the ranking's robustness cannot be fully assessed, particularly given the claim of superior performance on wildfire-damaged building datasets.
minor comments (2)
  1. [Abstract and Method] The abstract states that the model 'avoids the costly sampling procedures' via single-pass inference, but the method section should more explicitly describe the chosen inference timestep and how it is selected during fine-tuning versus testing.
  2. [Results tables] Tables reporting F1 and IoU scores should include the exact names and references for the seven compared models, along with any notes on whether they were re-implemented or used off-the-shelf.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We are grateful for the referee's insightful comments, which have helped us identify areas to strengthen our manuscript. We provide point-by-point responses to the major comments and commit to revisions that address the concerns regarding ablation controls and experimental rigor.

Point-by-point responses
  1. Referee: Ablation studies section: The reported ablations examine robustness across different noise schedulers and timestep controls, yet omit a control experiment that trains the identical backbone architecture end-to-end on clean images using standard supervised fine-tuning without timestep conditioning or noise addition. This control is load-bearing for the headline ranking claim, as it is needed to determine whether the F1/IoU advantages on SpaceNet7, WHU, and xView2 arise from the task-specific noise schedules and timestep conditioning or from other factors such as model capacity and self-supervised pretraining.

    Authors: We thank the referee for highlighting this important control. Our ablations demonstrate the model's robustness to scheduler variations within the diffusion paradigm, supporting the utility of task-specific noise schedules. However, to more rigorously isolate the contribution of the diffusion-based components (noise addition and timestep conditioning) from the backbone capacity and pretraining strategy, we will include the suggested control experiment in the revised version. Specifically, we will train the identical backbone architecture end-to-end using standard supervised learning on clean images without noise or timestep conditioning, and compare its performance to Noise2Map on the same datasets. This addition will strengthen the evidence that the performance advantages derive from our proposed repurposing of the diffusion process. revision: yes

  2. Referee: Experimental results and comparisons: The cross-dataset rank metric establishing Noise2Map as 1st among seven models on both tasks does not include error bars, statistical significance testing, or explicit confirmation that the baseline models received equivalent pretraining and data augmentation. Without these, the ranking's robustness cannot be fully assessed, particularly given the claim of superior performance on wildfire-damaged building datasets.

    Authors: We agree that incorporating error bars, statistical significance testing, and explicit details on baseline training protocols would improve the assessment of our results. In the revised manuscript, we will report standard deviations from multiple training runs for the key metrics, include statistical tests (e.g., Wilcoxon signed-rank test) to evaluate the significance of differences in rankings, and provide a detailed table or section clarifying the pretraining and data augmentation strategies used for each of the seven baseline models. This will confirm that comparisons were fair and address concerns regarding the wildfire-damaged building datasets in xView2. revision: yes
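The Wilcoxon signed-rank test the authors propose reduces each paired model-vs-baseline comparison to ranked score differences. A from-scratch sketch with invented scores; it assumes no ties in the absolute differences and drops zeros, and a real analysis should use `scipy.stats.wilcoxon`, which also supplies the p-value.

```python
# Minimal W statistic for the Wilcoxon signed-rank test (illustrative only).
def wilcoxon_w(xs, ys):
    """W = min(W+, W-); assumes no ties in |differences|, drops zero diffs."""
    diffs = [x - y for x, y in zip(xs, ys) if x != y]
    w_pos = w_neg = 0.0
    # Rank differences by absolute magnitude, smallest first.
    for rank, (_, positive) in enumerate(sorted((abs(d), d > 0) for d in diffs), 1):
        if positive:
            w_pos += rank
        else:
            w_neg += rank
    return min(w_pos, w_neg)

# Invented per-dataset/per-run F1 scores for model vs. baseline.
model    = [0.71, 0.90, 0.64, 0.83, 0.79]
baseline = [0.68, 0.89, 0.66, 0.79, 0.74]
w = wilcoxon_w(model, baseline)
```

With only a handful of paired observations (e.g. three datasets), the test has very little power, which is a further reason to report per-run standard deviations as well.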

Circularity Check

0 steps flagged

No circularity in derivation or claims

Full rationale

The paper proposes Noise2Map as a repurposing of standard diffusion denoising (with task-specific schedules and timestep conditioning) into an end-to-end discriminative model for semantic segmentation and change detection. It describes a self-supervised pretraining stage followed by supervised fine-tuning on a shared backbone, then reports empirical rankings (average 1st on F1/IoU across SpaceNet7, WHU, xView2) from direct evaluation on public datasets. No equation, prediction, or central claim reduces by construction to a fitted parameter, self-definition, or self-citation chain; the architecture and results remain independent of the inputs they are evaluated against. This is a conventional empirical ML contribution whose validity rests on external benchmarks rather than tautological reduction.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The central claim rests on standard assumptions from diffusion models (Gaussian noise, denoising process) and the novel assumption that these can be directly adapted for map prediction in remote sensing without additional sampling steps.

free parameters (1)
  • task-specific noise schedule parameters
    Mentioned as part of the model, but the abstract does not specify their values or whether they are fitted or hand-chosen.
axioms (1)
  • domain assumption: the noise process in diffusion models can be leveraged for discriminative tasks through appropriate conditioning and schedules.
    This is the core motivation stated in the abstract.

pith-pipeline@v0.9.0 · 5590 in / 1424 out tokens · 42629 ms · 2026-05-07T05:44:47.904847+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

55 extracted references · 4 canonical work pages · 2 internal anchors
