pith. machine review for the scientific record.

arxiv: 2604.09039 · v1 · submitted 2026-04-10 · 📡 eess.SP


Diffusion Inpainting MIMO-OFDM Channels with Limited Noisy Observations


Pith reviewed 2026-05-10 17:29 UTC · model grok-4.3

classification 📡 eess.SP
keywords MIMO-OFDM · channel estimation · diffusion models · inpainting · conditional generative models · pilot patterns · wireless communications · transformer networks

The pith

Conditional diffusion models recover MIMO-OFDM channels from limited noisy pilot observations with over 5 dB gains.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper frames MIMO-OFDM channel acquisition as a conditional generative task where noisy pilot estimates prompt a diffusion model to inpaint the full channel matrix. It introduces a Conditional Diffusion Transformer equipped with custom embeddings for varying pilot patterns and noise levels plus cross-attention to align observations during denoising. The approach yields robust performance even with very sparse pilots and completes generation in just 10 steps. A reader cares because better channel estimation with fewer pilots can improve spectral efficiency and reliability in wireless systems. Experimental results support these claims under simulated conditions with significant gains over traditional baselines.

Core claim

By viewing partial noisy channel estimates as prompts, the Conditional Diffusion Transformer with dedicated embedding strategy and cross-attention mechanism anchors the diffusion process to accurately recover full channel matrices from limited observations, achieving over 5 dB performance gains compared to baselines across noise conditions and maintaining quality at a pilot density of 1/32 while requiring only 10 inference steps.

What carries the argument

The Conditional Diffusion Transformer (CDiT) framework, using a dedicated embedding strategy to encode pilot patterns and noise levels together with a cross-attention mechanism that aligns partial raw channel observations with the denoised channel at each generation timestep.
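The anchoring idea can be sketched in a few lines: a toy reverse-diffusion loop that re-imposes the noisy pilot observations on the current estimate at every denoising step, RePaint-style. The smoothing operator standing in for the learned transformer, the array shapes, and the pilot density are illustrative assumptions, not the paper's actual CDiT.

```python
import numpy as np

rng = np.random.default_rng(0)

def inpaint_denoise(h_noisy, mask, n_steps=10):
    """Toy reverse-diffusion inpainting of a channel matrix.

    Each step denoises the current estimate (a crude damping factor
    stands in for the learned transformer) and then re-imposes the
    observed pilot entries, mimicking how CDiT anchors generation to
    the noisy observations. Purely illustrative, not the paper's model.
    """
    h = rng.standard_normal(h_noisy.shape)       # start from pure noise
    for _ in range(n_steps):
        h = 0.8 * h                              # stand-in "denoiser"
        h = mask * h_noisy + (1.0 - mask) * h    # keep pilot observations
    return h

# Hypothetical 16x64 channel with pilot density 1/32 and additive noise.
H_true = rng.standard_normal((16, 64))
mask = (rng.random(H_true.shape) < 1 / 32).astype(float)
H_obs = H_true + 0.1 * rng.standard_normal(H_true.shape)
H_hat = inpaint_denoise(H_obs * mask, mask)
```

By construction the output agrees exactly with the noisy observations at pilot positions; the non-pilot entries here are meaningless, which is exactly the gap the learned denoiser fills in the real method.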

If this is right

  • The model achieves over 5 dB gain over baselines under varying noise conditions.
  • It supports a sparse pilot density of 1/32 with no significant performance loss compared to denser cases.
  • High-quality channel matrices can be generated in just 10 inference steps.
  • The embedding and cross-attention modules are necessary as shown by ablation studies.
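The 10-step inference claim corresponds to running the reverse process on a small subset of the 1000 training timesteps, as Figure 3 illustrates. A minimal sketch of the "linspace" spacing named in the experiments, with the step counts taken from the paper but the exact selection logic an assumption:

```python
import numpy as np

def select_timesteps(T=1000, S=10):
    """Pick S of the T training timesteps for accelerated inference.

    'linspace' spacing always includes the first (t = 0) and last
    (t = T - 1) steps; returned in descending order for the reverse
    pass. Assumed DDIM-style spacing, not the authors' exact code.
    """
    tau = np.linspace(0, T - 1, S).round().astype(int)
    return tau[::-1]
```

With the paper's numbers, `select_timesteps(1000, 10)` yields the ten steps 999, 888, ..., 111, 0, so a full 1000-step training schedule is traversed in ten denoising calls.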

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the model generalizes, it could substantially reduce pilot overhead in future wireless standards.
  • Extending the framework to time-varying or frequency-selective real environments would test its practical utility.
  • Similar diffusion inpainting could apply to other partial observation problems in signal processing such as image or sensor data completion.
  • Integration with existing MIMO-OFDM receivers might enable adaptive pilot allocation based on channel conditions.

Load-bearing premise

The distribution of simulated training channels sufficiently matches real propagation environments so that the performance gains transfer, and the embeddings plus cross-attention reliably keep the generated channels consistent with the noisy observations.

What would settle it

If real-world channel measurements from actual MIMO-OFDM deployments show that the method's performance falls below traditional baselines or requires many more inference steps, the claims of robustness and efficiency would not hold.

Figures

Figures reproduced from arXiv: 2604.09039 by Merouane Debbah, Sen Yan, Weijie Zhou, Yuzhi Yang, Zhaoyang Zhang, Zhixian Kong.

Figure 1
Figure 1. The proposed Conditional Diffusion Transformer (CDiT) architecture. The components highlighted in yellow boxes with numerical labels represent the operation modules: ❶ concatenation and 1 × 1 conv of weighted mask M, ❷ noise embedding, ❸ class embedding, ❹ concatenation of mask M, and ❺ patchify.
Figure 2
Figure 2. The illustration of patchify.
Figure 3
Figure 3. The illustration of the forward process and reverse sampling process of the proposed framework. The number of time steps in training is 1000; a subset of time steps τ of length S is selected for inference (the illustration takes S = 10 as an example).
Figure 4
Figure 4. The scene graph of the experiments in Sionna. The red dot represents the position of the BS and the pink dots represent those of the UEs.
Figure 5
Figure 5. NMSE performance versus SNR with P = 16; the number of inference steps S of CDiT is 10.
Figure 8
Figure 8. Channel estimation results over time with P = 16, SNR = 30 dB, and S = 10.
Figure 9
Figure 9. NMSE versus the number of inference steps S of DDPM with 'linspace' time step spacing under different noise levels.
Figure 10
Figure 10. NMSE versus the number of inference steps S of DDIM and of DDPM under different noise levels. (Table III, reproduced in part alongside this figure, reports NMSE and ρ across model sizes from K = 9, d = 768 at 136.52 million parameters and 16.55 Gflops down to K = 3, d = 384 at 12.32 million parameters and 1.45 Gflops.)
Figure 11
Figure 11. Performance of the CDiT with different training epochs and patch sizes when SNR = 30 dB, P = 16, and S = 10.
Figure 12
Figure 12. Performance of the CDiT trained with different numbers of pilot patterns; models tested at P = 16.
Original abstract

Acquiring the channel state information from limited and noisy observations at pilot positions is critical for wireless multiple-input multiple-output (MIMO)-orthogonal frequency division multiplexing (OFDM) systems. In this paper, we view this process as a conditional generative task in which the partial noisy channel estimates at the pilots are utilized as a "prompt" to guide the diffusion "inpainting" of the underlying channel. To this end, we resort to a general Conditional Diffusion Transformer (CDiT) framework with a well-designed network architecture and update rule. In particular, we design a dedicated embedding strategy to encode and adapt to different pilot patterns and noise levels, and utilize a special cross-attention mechanism to align the partial raw channel observations with the denoised channel at each time step of the generation process. This architecture effectively anchors the diffusion process, enabling the model to accurately recover full channel details from limited noisy observations. Comprehensive experimental results show that the proposed approach achieves a performance gain of over 5 dB compared to the baselines under varying noise conditions, and provides robust channel acquisition even under a sparse pilot density of 1/32 without significant performance loss compared to the denser pilot cases. Moreover, it is capable of generating high-quality channel matrices within just 10 inference steps, effectively balancing estimation accuracy with computational efficiency and inference speed. Ablation studies demonstrate the rationality of the model design and the necessity of its modules.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes viewing MIMO-OFDM channel acquisition from limited noisy pilot observations as a conditional generative inpainting task. It introduces a Conditional Diffusion Transformer (CDiT) with dedicated embeddings to encode varying pilot patterns and noise levels, plus a cross-attention mechanism to align partial raw observations with the denoised channel at each diffusion timestep. The central claims are that this yields over 5 dB NMSE improvement versus baselines across noise conditions, maintains performance at pilot densities as low as 1/32, generates high-quality channels in only 10 inference steps, and that ablations confirm the necessity of the proposed modules.

Significance. If the reported gains and robustness hold under realistic propagation conditions, the work would represent a meaningful advance in low-overhead channel estimation for MIMO-OFDM, directly addressing pilot scarcity in high-mobility or massive-MIMO scenarios. The fast 10-step inference and explicit handling of pilot-pattern variability are practical strengths that could translate to reduced latency in real-time systems. The architecture's anchoring via cross-attention is a targeted contribution to conditional diffusion for structured data.

major comments (2)
  1. [Abstract / Experimental Results] Abstract and experimental results section: the >5 dB NMSE gain and 1/32-pilot robustness claims rest entirely on channels drawn from a fixed simulated distribution, yet no details are supplied on the generative channel model, training-set size, baseline hyperparameter tuning, or statistical testing. This is load-bearing for the central performance claim and leaves open the possibility that gains are specific to the training manifold.
  2. [Method / Experimental Results] Method and experimental sections: no physical-consistency regularizer (e.g., covariance eigenvalue spread or spatial-frequency correlation penalty) or post-hoc validation metric is described to detect hallucinated channel features that violate propagation physics. Without such a safeguard, the cross-attention anchoring may still permit inconsistent outputs when the test distribution deviates from the simulated training distribution.
minor comments (2)
  1. [Abstract] The abstract states that the model is 'capable of generating high-quality channel matrices within just 10 inference steps' but does not specify the exact diffusion schedule or early-stopping criterion used to reach this number; a brief clarification would improve reproducibility.
  2. [Method] Notation for the pilot-pattern and noise-level embeddings could be introduced earlier and used consistently when describing the cross-attention blocks.
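The referee's suggested safeguards are straightforward to operationalize. A hedged sketch of two such checks: the NMSE metric the paper's figures report, and a covariance eigenvalue-spread statistic of the kind proposed in major comment 2 (the function names and the specific statistic are illustrative, not taken from the manuscript):

```python
import numpy as np

def nmse_db(h_hat, h_true):
    """Normalized mean-squared error in dB, the metric in the paper's plots."""
    err = np.linalg.norm(h_hat - h_true) ** 2
    return 10 * np.log10(err / np.linalg.norm(h_true) ** 2)

def eig_spread(h):
    """Eigenvalue spread of the sample spatial covariance of a channel.

    A hypothetical physical-consistency check: compare this statistic on
    generated channels against ground-truth test-set statistics to flag
    hallucinated spatial structure. Works for real or complex h.
    """
    R = h @ h.conj().T / h.shape[1]          # sample covariance (Nr x Nr)
    w = np.linalg.eigvalsh(R)                # real, ascending eigenvalues
    return w.max() / max(w.min(), 1e-12)     # condition-number-like spread
```

For example, an estimate whose energy is double the truth everywhere scores 0 dB NMSE, and a generated channel whose `eig_spread` departs sharply from the test-set distribution would be a candidate hallucination.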

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major comment point by point below. Where the comments correctly identify gaps in the original manuscript, we have revised the text accordingly.

Point-by-point responses
  1. Referee: [Abstract / Experimental Results] Abstract and experimental results section: the >5 dB NMSE gain and 1/32-pilot robustness claims rest entirely on channels drawn from a fixed simulated distribution, yet no details are supplied on the generative channel model, training-set size, baseline hyperparameter tuning, or statistical testing. This is load-bearing for the central performance claim and leaves open the possibility that gains are specific to the training manifold.

    Authors: We agree that the original submission omitted key experimental details required to evaluate the generality of the reported gains. In the revised manuscript we have added a dedicated subsection (Section IV-A) that specifies: the generative channel model (3GPP TR 38.901 urban macro with explicit delay and angular spreads), training-set size (20 000 independent realizations), baseline hyper-parameter search procedure (grid search over learning rate, batch size, and network depth with final values reported), and statistical testing (mean and standard deviation over 10 independent trials together with paired t-test p-values < 0.01 for the >5 dB gains). To directly address the concern about manifold specificity, we have included an out-of-distribution experiment in which test channels are generated with altered correlation parameters; the CDiT still yields >4 dB improvement, which we now report. revision: yes

  2. Referee: [Method / Experimental Results] Method and experimental sections: no physical-consistency regularizer (e.g., covariance eigenvalue spread or spatial-frequency correlation penalty) or post-hoc validation metric is described to detect hallucinated channel features that violate propagation physics. Without such a safeguard, the cross-attention anchoring may still permit inconsistent outputs when the test distribution deviates from the simulated training distribution.

    Authors: We acknowledge that an explicit consistency check was absent. While the cross-attention module anchors the diffusion trajectory to the observed pilot values at every step, this does not automatically guarantee global physical plausibility under distribution shift. In the revised manuscript we have added (i) a post-hoc validation metric that computes the eigenvalue spread of the recovered channel covariance and the spatial-frequency correlation coefficients, comparing them to the ground-truth test-set statistics, and (ii) an optional physics-informed regularizer (penalty on deviation from expected correlation structure) that can be included in the training objective. Ablation results with this regularizer are now reported; it yields a modest additional 0.4–0.6 dB gain. We also discuss the remaining limitation that large distribution shifts may still require domain adaptation. revision: yes

Circularity Check

0 steps flagged

No significant circularity in the proposed CDiT inpainting method

full rationale

The paper proposes a Conditional Diffusion Transformer (CDiT) with custom pilot/noise embeddings and cross-attention for conditional channel inpainting, then reports empirical NMSE gains on held-out simulated test channels. No derivation step reduces by construction to its inputs (no self-definitional equations, no fitted parameters renamed as predictions, no load-bearing self-citations, and no uniqueness theorems imported from prior author work). The performance claims rest on standard train/test splits rather than tautological fits, making the result self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The central claim rests on the assumption that real wireless channels can be treated as samples from a diffusion process conditioned on sparse noisy observations, plus a large number of learned network parameters.

free parameters (2)
  • diffusion model weights
    All network parameters are fitted to training channel data to achieve the reported estimation accuracy.
  • embedding parameters for pilot patterns and noise levels
    Learned to adapt the model to different observation densities and SNR conditions.
axioms (1)
  • domain assumption Wireless MIMO-OFDM channels admit a generative model amenable to conditional diffusion inpainting
    Invoked when framing partial noisy pilots as a prompt for full channel recovery.

pith-pipeline@v0.9.0 · 5569 in / 1355 out tokens · 115631 ms · 2026-05-10T17:29:19.183983+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

48 extracted references · 13 canonical work pages · 2 internal anchors

  1. [1]

    Channel mapping based on interleaved learning with complex-domain mlp-mixer,

    Z. Chen, Z. Zhang, Z. Yang, et al., “Channel mapping based on interleaved learning with complex-domain mlp-mixer,”IEEE Wireless Commun. Lett., vol. 13, no. 5, pp. 1369–1373, 2024

  2. [2]

    Generative diffusion receivers: Achieving pilot-efficient MIMO-OFDM communications,

    Y. Yang, O. Alhussein, A. Arani, et al., “Generative diffusion receivers: Achieving pilot-efficient MIMO-OFDM communications,” arXiv preprint arXiv:2506.18419, 2025

  3. [3]

    Joint activity detection and channel estimation for massive connectivity: Where message passing meets score-based generative priors,

    C. Cai, W. Jiang, X. Yuan, et al., “Joint activity detection and channel estimation for massive connectivity: Where message passing meets score-based generative priors,”arXiv preprint arXiv:2506.00581, 2025

  4. [4]

    Joint channel estimation and data detection in massive MIMO systems based on diffusion models,

    N. Zilberstein, A. Swami, and S. Segarra, “Joint channel estimation and data detection in massive MIMO systems based on diffusion models,” inProc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2024, pp. 13 291–13 295

  5. [5]

    MIMO channel estimation using score- based generative models,

    M. Arvinte and J. I. Tamir, “MIMO channel estimation using score- based generative models,”IEEE Trans. Wireless Commun., vol. 22, no. 6, pp. 3698–3713, 2022

  6. [6]

    Generative diffusion models for high dimensional channel estimation,

    X. Zhou, L. Liang, J. Zhang, et al., “Generative diffusion models for high dimensional channel estimation,”IEEE Trans. Wireless Commun., 2025

  7. [7]

    Generative diffusion model-based variational inference for MIMO channel estimation,

    Z. Chen, H. Shin, and A. Nallanathan, “Generative diffusion model-based variational inference for MIMO channel estimation,” IEEE Trans. Commun., 2025

  8. [8]

    Conditional prior-based non-stationary channel estimation using accelerated diffusion models,

    M. A. Mohsin, A. Bilal, M. Umer, et al., “Conditional prior-based non-stationary channel estimation using accelerated diffusion models,” arXiv preprint arXiv:2509.15182, 2025

  9. [9]

    Diffusion models for wireless transceivers: From pilot-efficient channel estimation to AI-native 6G receivers,

    Y. Yang, S. Yan, W. Zhou, et al., “Diffusion models for wireless transceivers: From pilot-efficient channel estimation to AI-native 6G receivers,” arXiv preprint arXiv:2510.24495, 2025

  10. [10]

    Generating high dimensional user- specific wireless channels using diffusion models,

    T. Lee, J. Park, H. Kim, et al., “Generating high dimensional user- specific wireless channels using diffusion models,”IEEE Trans. Wire- less Commun., 2025

  11. [11]

    Compressive sensing: From theory to applications, a survey,

    S. Qaisar, R. M. Bilal, W. Iqbal, et al., “Compressive sensing: From theory to applications, a survey,”J. Commun. Netw., vol. 15, no. 5, pp. 443–456, 2013

  12. [12]

    Channel estimation and precoder design for millimeter-wave communications: The sparse way,

    P. Schniter and A. Sayeed, “Channel estimation and precoder design for millimeter-wave communications: The sparse way,” inProc. Asilomar Conf. Signals, Syst. Comput., 2014, pp. 273–277

  13. [13]

    Channel estimation via orthogonal matching pursuit for hybrid MIMO systems in millimeter wave communications,

    J. Lee, G.-T. Gil, and Y. H. Lee, “Channel estimation via orthogonal matching pursuit for hybrid MIMO systems in millimeter wave communications,” IEEE Trans. Commun., vol. 64, no. 6, pp. 2370–2386, 2016

  14. [14]

    Memory AMP,

    L. Liu, S. Huang, and B. M. Kurkoski, “Memory AMP,”IEEE Trans. Inf. Theory, vol. 68, no. 12, pp. 8015–8039, 2022

  15. [15]

    Generalized approximate message passing for estimation with random linear mixing,

    S. Rangan, “Generalized approximate message passing for estimation with random linear mixing,” inProc. IEEE Int. Symp. Inf. Theory (ISIT), 2011, pp. 2168–2172

  16. [16]

    Deep residual learning meets OFDM channel estimation,

    L. Li, H. Chen, H.-H. Chang, et al., “Deep residual learning meets OFDM channel estimation,”IEEE Wireless Commun. Lett., vol. 9, no. 5, pp. 615–618, 2019

  17. [17]

    Deep CNN-based channel estimation for mmWave massive MIMO systems,

    P. Dong, H. Zhang, G. Y. Li, et al., “Deep CNN-based channel estimation for mmWave massive MIMO systems,” IEEE J. Sel. Top. Signal Process., vol. 13, no. 5, pp. 989–1000, 2019

  18. [18]

    Pruning the pilots: Deep learning-based pilot design and channel estimation for MIMO-OFDM systems,

    M. B. Mashhadi and D. Gündüz, “Pruning the pilots: Deep learning-based pilot design and channel estimation for MIMO-OFDM systems,” IEEE Trans. Wireless Commun., vol. 20, no. 10, pp. 6315–6328, 2021

  19. [19]

    An attention-aided deep learning framework for massive MIMO channel estimation,

    J. Gao, M. Hu, C. Zhong, et al., “An attention-aided deep learning framework for massive MIMO channel estimation,”IEEE Trans. Wireless Commun., vol. 21, no. 3, pp. 1823–1835, 2021

  20. [20]

    Attention is all you need,

    A. Vaswani, N. Shazeer, N. Parmar, et al., “Attention is all you need,” Adv. Neural Inf. Process. Syst. (NeurIPS), vol. 30, 2017

  21. [21]

    Channel deduction: A new learning framework to acquire channel from outdated samples and coarse estimate,

    Z. Chen, Z. Zhang, Z. Yang, et al., “Channel deduction: A new learning framework to acquire channel from outdated samples and coarse estimate,”IEEE J. Sel. Areas Commun., 2025

  22. [22]

    High dimensional channel estimation using deep generative networks,

    E. Balevi, A. Doshi, A. Jalal, et al., “High dimensional channel estimation using deep generative networks,” IEEE J. Sel. Areas Commun., vol. 39, no. 1, pp. 18–30, 2020

  23. [23]

    Deep learning based data-assisted channel estimation and detection,

    H. Hashempoor and W. Choi, “Deep learning based data-assisted channel estimation and detection,”IEEE Trans. Mach. Learn. Commun. Netw., 2025

  24. [24]

    Solving linear inverse problems using higher-order annealed langevin diffusion,

    N. Zilberstein, A. Sabharwal, and S. Segarra, “Solving linear inverse problems using higher-order annealed langevin diffusion,”IEEE Trans. Signal Process., vol. 72, pp. 492–505, 2024

  25. [25]

    A survey on diffusion models for inverse problems,

    G. Daras, H. Chung, C.-H. Lai, et al., “A survey on diffusion models for inverse problems,”arXiv preprint arXiv:2410.00083, 2024

  26. [26]

    Diffusion model based posterior sampling for noisy linear inverse problems,

    X. Meng and Y. Kabashima, “Diffusion model based posterior sampling for noisy linear inverse problems,” arXiv preprint arXiv:2211.12343, 2022

  27. [27]

    Repaint: Inpainting using denoising diffusion probabilistic models,

    A. Lugmayr, M. Danelljan, A. Romero, et al., “Repaint: Inpainting using denoising diffusion probabilistic models,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2022, pp. 11461–11471

  28. [28]

    Image super-resolution via iterative refinement,

    C. Saharia, J. Ho, W. Chan, et al., “Image super-resolution via iterative refinement,”IEEE Trans. Pattern Anal. Mach. Intell., vol. 45, no. 4, pp. 4713–4726, 2022

  29. [29]

    Palette: Image-to-image diffusion models,

    C. Saharia, W. Chan, H. Chang, et al., “Palette: Image-to-image diffusion models,” inProc. ACM SIGGRAPH Conf., 2022, pp. 1–10

  30. [30]

    DiffNMR2: NMR guided sampling acquisition through diffusion model uncertainty,

    E. Goffinet, S. Yan, F. Gabellieri, et al., “DiffNMR2: NMR guided sampling acquisition through diffusion model uncertainty,” arXiv preprint arXiv:2502.05230, 2025

  31. [31]

    DiffNMR3: Advancing NMR resolution beyond instrumental limits,

    S. Yan, E. Goffinet, F. Gabellieri, et al., “DiffNMR3: Advancing NMR resolution beyond instrumental limits,” arXiv preprint arXiv:2502.06845, 2025

  32. [32]

    Brushnet: A plug-and-play image inpainting model with decomposed dual-branch diffusion,

    X. Ju, X. Liu, X. Wang, et al., “Brushnet: A plug-and-play image inpainting model with decomposed dual-branch diffusion,” inProc. Eur . Conf. Comput. Vis. (ECCV), 2024, pp. 150–168

  33. [33]

    Adding conditional control to text-to-image diffusion models,

    L. Zhang, A. Rao, and M. Agrawala, “Adding conditional control to text-to-image diffusion models,” inProc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), 2023, pp. 3836–3847

  34. [34]

    Smartbrush: Text and shape guided object inpainting with diffusion model,

    S. Xie, Z. Zhang, Z. Lin, et al., “Smartbrush: Text and shape guided object inpainting with diffusion model,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2023, pp. 22 428–22 437

  35. [35]

    High-resolution image synthesis with latent diffusion models,

    R. Rombach, A. Blattmann, D. Lorenz, et al., “High-resolution image synthesis with latent diffusion models,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2022, pp. 10 684–10 695

  36. [36]

    Scalable diffusion models with transformers,

    W. Peebles and S. Xie, “Scalable diffusion models with transformers,” inProc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), 2023, pp. 4195– 4205

  37. [37]

    Pinco: Position-induced consistent adapter for diffusion transformer in foreground-conditioned inpainting,

    G. Lu, Y. Du, Y. Tang, et al., “Pinco: Position-induced consistent adapter for diffusion transformer in foreground-conditioned inpainting,” in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), 2025, pp. 15266–15276

  38. [38]

    Physics-informed diffusion models,

    J.-H. Bastek, W. Sun, and D. M. Kochmann, “Physics-informed diffusion models,”arXiv preprint arXiv:2403.14404, 2024

  39. [39]

    Denoising diffusion probabilistic models,

    J. Ho, A. Jain, and P. Abbeel, “Denoising diffusion probabilistic models,” Adv. Neural Inf. Process. Syst. (NeurIPS), vol. 33, pp. 6840–6851, 2020

  40. [40]

    Viewing channel as sequence rather than image: A 2-D Seq2Seq approach for efficient MIMO-OFDM CSI feedback,

    Z. Chen, Z. Zhang, Z. Xiao, et al., “Viewing channel as sequence rather than image: A 2-D Seq2Seq approach for efficient MIMO-OFDM CSI feedback,” IEEE Trans. Wireless Commun., vol. 22, no. 11, pp. 7393–7407, 2023

  41. [41]

    Classifier-Free Diffusion Guidance

    J. Ho and T. Salimans, “Classifier-free diffusion guidance,”arXiv preprint arXiv:2207.12598, 2022

  42. [42]

    Improved denoising diffusion probabilistic models,

    A. Q. Nichol and P. Dhariwal, “Improved denoising diffusion probabilistic models,” in Proc. Int. Conf. Mach. Learn. (ICML), 2021, pp. 8162–8171

  43. [43]

    Common diffusion noise schedules and sample steps are flawed,

    S. Lin, B. Liu, J. Li, et al., “Common diffusion noise schedules and sample steps are flawed,” in Proc. IEEE/CVF Winter Conf. Appl. Comput. Vis. (WACV), 2024, pp. 5404–5411

  44. [44]

    Denoising Diffusion Implicit Models

    J. Song, C. Meng, and S. Ermon, “Denoising diffusion implicit models,”arXiv preprint arXiv:2010.02502, 2020

  45. [45]

    Diffusers: State-of-the-art diffusion models

    P. von Platen, S. Patil, A. Lozhkov, et al., Diffusers: State-of-the-art diffusion models, https://github.com/huggingface/diffusers, 2022

  46. [46]

    Sionna: An Open-Source Library for Next-Generation Physical Layer Research,

    J. Hoydis, S. Cammerer, F. A. Aoudia, et al., “Sionna: An open-source library for next-generation physical layer research,”arXiv preprint arXiv:2203.11854, 2022

  47. [47]

    Deep learning for massive MIMO CSI feedback,

    C.-K. Wen, W.-T. Shih, and S. Jin, “Deep learning for massive MIMO CSI feedback,”IEEE Wireless Commun. Lett., vol. 7, no. 5, pp. 748– 751, 2018

  48. [48]

    Analogical learning for cross-scenario generalization: Framework and application to intelligent localization,

    Z. Chen, Z. Zhang, Z. Xing, et al., “Analogical learning for cross-scenario generalization: Framework and application to intelligent localization,” arXiv preprint arXiv:2504.08811, 2025