arxiv: 2306.04321 · v2 · pith:NUPZGYGInew · submitted 2023-06-07 · 💻 cs.AI · cs.MM

Generative Semantic Communication: Diffusion Models Beyond Bit Recovery

Pith reviewed 2026-05-24 08:41 UTC · model grok-4.3

keywords semantic communication · diffusion models · image generation · semantic preservation · bandwidth reduction · noisy channels · generative models · AI communications

The pith

A diffusion model can synthesize high-quality images that preserve semantic meaning from only highly compressed and noisy semantic data sent over a channel.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes transmitting only highly compressed semantic information instead of full bit sequences, then denoising the received signal and using a diffusion model to regenerate a semantically equivalent scene at the receiver. The model is conditioned on the denoised semantics through spatially-adaptive normalizations so that the output remains consistent with what was transmitted. The setup is tested across multiple noisy-channel scenarios and is reported to keep objects, locations, and depths recognizable even when the received content is significantly degraded. A reader would care because the approach directly addresses bandwidth limits in visual communication while still delivering usable meaning rather than exact pixels.

Core claim

By transmitting highly compressed semantic information only and guiding a diffusion model with spatially-adaptive normalizations from the denoised received signal, complex scenes can be synthesized that preserve key semantic features without recovering the original bits or requiring extra data or post-processing; the method outperforms prior solutions by maintaining recognizable objects, locations, and depths under extreme channel noise.

What carries the argument

Diffusion-guided synthesis conditioned on denoised semantic information via spatially-adaptive normalizations, which steers generation to maintain semantic consistency while allowing bandwidth reduction.
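As a rough illustration of the conditioning idea above, the following Python sketch applies a spatially-adaptive normalization to a single feature channel: the activations are normalized, then each pixel is scaled and shifted by parameters that depend on the semantic class at that location. This is a simplified sketch, not the paper's implementation. The names `spade_modulate`, `gamma_table`, and `beta_table` are hypothetical, the per-class lookup tables stand in for the learned convolutions over the semantic map, and the statistics are taken over one channel only.

```python
import math


def spade_modulate(features, seg_map, gamma_table, beta_table, eps=1e-5):
    """Normalize one feature channel, then modulate it per pixel by semantic class.

    features:    H x W nested list of floats (one channel of activations).
    seg_map:     H x W nested list of ints (class label per pixel).
    gamma_table: dict class -> scale  (hypothetical stand-in for a learned conv).
    beta_table:  dict class -> shift  (hypothetical stand-in for a learned conv).
    """
    flat = [v for row in features for v in row]
    mean = sum(flat) / len(flat)
    var = sum((v - mean) ** 2 for v in flat) / len(flat)
    std = math.sqrt(var + eps)

    out = []
    for f_row, s_row in zip(features, seg_map):
        out.append([
            # Parameter-free normalization followed by a class-dependent
            # scale and shift at this spatial location.
            gamma_table[c] * ((v - mean) / std) + beta_table[c]
            for v, c in zip(f_row, s_row)
        ])
    return out


if __name__ == "__main__":
    feats = [[1.0, 2.0], [3.0, 4.0]]
    seg = [[0, 0], [1, 1]]  # top row is class 0, bottom row is class 1
    print(spade_modulate(feats, seg, {0: 1.0, 1: 2.0}, {0: 0.0, 1: 0.5}))
```

Because the scale and shift follow the semantic map pixel by pixel, the layout information survives normalization instead of being washed out, which is the property the pith relies on.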

If this is right

  • Bandwidth is reduced because only compressed semantic descriptors need to be sent rather than full bit streams.
  • Image quality and semantic fidelity remain high even when the received signal is severely degraded by noise.
  • Objects, spatial layout, and depth remain recognizable without any bit-level recovery or additional side information.
  • The same framework applies to multiple communication scenarios without task-specific retraining beyond the diffusion conditioning step.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach could be extended to video by conditioning successive frames on a shared semantic stream to enforce temporal consistency.
  • Goal-oriented communication becomes feasible if the diffusion guidance is further shaped by a downstream task loss instead of pure reconstruction.
  • Hybrid systems might combine this generative recovery with occasional high-fidelity patches when semantic uncertainty exceeds a threshold.

Load-bearing premise

The diffusion model can reliably produce semantically faithful complex scenes from nothing more than the received denoised semantic information and the normalizations derived from it.

What would settle it

Quantitative evaluation on a held-out test set in which channel noise is increased until the generated images lose measurable semantic fidelity, for example when object recognition accuracy falls below, or depth estimation error rises above, a chosen threshold relative to the source.
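The shape of such a test can be sketched in a few lines of Python. The sketch below only covers the channel side: it sends one-hot class labels through additive Gaussian noise at a given PSNR, recovers labels by argmax, and reports whether label agreement stays above a threshold. Several things here are assumptions for illustration: the function names, the argmax recovery (the paper's denoiser is not described on this page), and the use of label agreement as a stand-in for a downstream metric computed on generated images.

```python
import random


def transmit_one_hot(labels, num_classes, psnr_db, rng):
    """Add Gaussian noise to one-hot vectors and recover labels by argmax."""
    # For a peak value of 1, PSNR = 10*log10(1 / sigma^2).
    sigma = 10 ** (-psnr_db / 20)
    received = []
    for c in labels:
        noisy = [(1.0 if k == c else 0.0) + rng.gauss(0.0, sigma)
                 for k in range(num_classes)]
        received.append(max(range(num_classes), key=noisy.__getitem__))
    return received


def agreement(reference, recovered):
    """Fraction of positions where the recovered label matches the reference."""
    return sum(r == p for r, p in zip(reference, recovered)) / len(reference)


def sweep(labels, num_classes, psnr_values, threshold, seed=0):
    """Return (psnr, agreement, passed) for each channel PSNR value."""
    rng = random.Random(seed)
    results = []
    for psnr in psnr_values:
        acc = agreement(labels, transmit_one_hot(labels, num_classes, psnr, rng))
        results.append((psnr, acc, acc >= threshold))
    return results


if __name__ == "__main__":
    labels = [i % 5 for i in range(2000)]  # hypothetical 5-class label stream
    for psnr, acc, passed in sweep(labels, 5, [30, 10, 0, -10], threshold=0.9):
        print(f"PSNR {psnr:>4} dB  agreement {acc:.3f}  passed {passed}")
```

A full version of the experiment would replace `agreement` with detection, segmentation, or depth metrics computed on the images the diffusion model generates from the recovered semantics.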

Figures

Figures reproduced from arXiv: 2306.04321 by Danilo Comminiello, Eleonora Grassucci, Sergio Barbarossa.

Figure 1: Synthesized images from the transmitted semantics with ...
Figure 2: Proposed generative semantic communication framework. The sender transmits one-hot, ...
Figure 3: Our method results for different PSNR values of the communication channel. The ...
Figure 4: Comparisons among most performing models (CC-FPSE [...
Figure 5: Generated samples from ablation studies with ...
Figure 6: Encoder and decoder blocks of our U-Net-based semantic diffusion model.
Figure 7: Example of how the JPEG compression affects the original image and a sample of the ...
Figure 8: Other comparisons for transmitted semantics and a fixed PSNR value of ...
Figure 9: Generated samples of our method from the transmitted semantics under different PSNR ...
Original abstract

Semantic communication is expected to be one of the cores of next-generation AI-based communications. One of the possibilities offered by semantic communication is the capability to regenerate, at the destination side, images or videos semantically equivalent to the transmitted ones, without necessarily recovering the transmitted sequence of bits. The current solutions still lack the ability to build complex scenes from the received partial information. Clearly, there is an unmet need to balance the effectiveness of generation methods and the complexity of the transmitted information, possibly taking into account the goal of communication. In this paper, we aim to bridge this gap by proposing a novel generative diffusion-guided framework for semantic communication that leverages the strong abilities of diffusion models in synthesizing multimedia content while preserving semantic features. We reduce bandwidth usage by sending highly-compressed semantic information only. Then, the diffusion model learns to synthesize semantic-consistent scenes through spatially-adaptive normalizations from such denoised semantic information. We prove, through an in-depth assessment of multiple scenarios, that our method outperforms existing solutions in generating high-quality images with preserved semantic information even in cases where the received content is significantly degraded. More specifically, our results show that objects, locations, and depths are still recognizable even in the presence of extremely noisy conditions of the communication channel. The code is available at https://github.com/ispamm/GESCO.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes a generative semantic communication system that transmits only highly compressed semantic features, denoises them at the receiver, and conditions a diffusion model via spatially-adaptive normalizations to synthesize images that preserve semantic content (objects, locations, depths) even under severe channel noise. It claims to outperform prior semantic-communication baselines across multiple scenarios while reducing bandwidth, with publicly released code.

Significance. If the central empirical claims hold under rigorous semantic-fidelity testing, the framework would demonstrate a practical route to goal-oriented image regeneration from minimal transmitted data, directly addressing bandwidth constraints in next-generation semantic communications. The open-source implementation is a clear strength that supports reproducibility and extension.

major comments (2)
  1. [Abstract and Evaluation section] The claim that 'objects, locations, and depths are still recognizable' under extreme noise is load-bearing for the paper's contribution, yet the reported assessment relies on standard generative metrics or qualitative figures without downstream semantic-task metrics (e.g., object-detection mAP or segmentation IoU) computed on the synthesized outputs across SNR regimes. This leaves the semantic-preservation assertion unverified.
  2. [Method section, diffusion conditioning] The description of how spatially-adaptive normalizations applied to the denoised semantic features alone enable reliable scene synthesis does not include an ablation or quantitative test isolating the contribution of the conditioning mechanism versus any implicit priors, which is required to substantiate the 'no additional transmitted data' bandwidth claim.
minor comments (1)
  1. [Abstract] The abstract states 'in-depth assessment of multiple scenarios' but does not list the baselines, datasets, or exact quantitative metrics; adding a concise table or paragraph with these details would improve clarity without altering the technical contribution.
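For readers unfamiliar with the segmentation IoU requested in major comment 1, here is a minimal Python sketch of mean intersection-over-union between a reference label map and a predicted one, both flattened to lists. The function name `mean_iou` and the choice to skip classes absent from both maps are assumptions of this sketch. In the setting the referee describes, `predicted` would come from a pre-trained segmentation model run on the synthesized image.

```python
def mean_iou(reference, predicted, num_classes):
    """Mean IoU over classes that occur in the reference or the prediction."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for r, p in zip(reference, predicted) if r == c and p == c)
        union = sum(1 for r, p in zip(reference, predicted) if r == c or p == c)
        if union > 0:  # skip classes absent from both maps
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else float("nan")


if __name__ == "__main__":
    reference = [0, 0, 1, 1]
    predicted = [0, 1, 1, 1]
    # Class 0: 1/2, class 1: 2/3, so the mean is 7/12.
    print(mean_iou(reference, predicted, num_classes=2))
```

Reporting this value at each channel noise level, next to the generative metrics, would give the semantic-preservation claim a direct quantitative footing.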

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments on our manuscript. We address each major comment below, indicating the revisions we will make to strengthen the paper.

  1. Referee: [Abstract and Evaluation section] The claim that 'objects, locations, and depths are still recognizable' under extreme noise is load-bearing for the paper's contribution, yet the reported assessment relies on standard generative metrics or qualitative figures without downstream semantic-task metrics (e.g., object-detection mAP or segmentation IoU) computed on the synthesized outputs across SNR regimes. This leaves the semantic-preservation assertion unverified.

    Authors: We agree that downstream task-specific metrics would provide stronger, more direct evidence for semantic preservation under noise. In the revised manuscript we will add object-detection mAP and segmentation IoU results computed on the synthesized images across the reported SNR regimes, using standard pre-trained models. These metrics will be presented alongside the existing generative metrics to verify the claim. revision: yes

  2. Referee: [Method section, diffusion conditioning] The description of how spatially-adaptive normalizations applied to the denoised semantic features alone enable reliable scene synthesis does not include an ablation or quantitative test isolating the contribution of the conditioning mechanism versus any implicit priors, which is required to substantiate the 'no additional transmitted data' bandwidth claim.

    Authors: We acknowledge the value of an explicit ablation to isolate the contribution of the spatially-adaptive normalization conditioning. The revised manuscript will include a quantitative ablation study comparing the full model against a variant without the conditioning (or with a simpler conditioning scheme), reporting the same generative metrics while keeping the transmitted semantic features identical. This will directly support the bandwidth claim. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

The paper presents an empirical framework for generative semantic communication using diffusion models conditioned on compressed, denoised semantic features via spatially-adaptive normalizations. Performance claims rest on experimental evaluation across noisy channel scenarios rather than any mathematical derivation chain. No equations, fitted parameters renamed as predictions, self-definitional constructs, or load-bearing self-citations appear in the provided text that would reduce the central results to inputs by construction. The assessment is described as in-depth but external to any internal definitional loop, making the work self-contained against the listed circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based solely on abstract; no explicit free parameters, axioms, or invented entities are identifiable. The approach implicitly relies on standard assumptions of diffusion model training and semantic feature extraction but provides no details.

pith-pipeline@v0.9.0 · 5763 in / 1168 out tokens · 21990 ms · 2026-05-24T08:41:49.964486+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?

  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 7 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. A Causal Diffusion Model for Video Reconstruction from Ultra-Low-Bitrate Representations

    cs.CV 2026-02 unverdicted novelty 7.0

    A causal diffusion model reconstructs videos from ultra-low-bitrate semantics and compressed frames using temporal distillation from a bidirectional teacher, outperforming prior baselines.

  2. Intention-Aware Semantic Agent Communications for AI Glasses

    eess.SP 2026-04 unverdicted novelty 5.0

    An intention-aware semantic agent system for AI glasses reduces bandwidth by over 50% in simulations while preserving task performance through adaptive preprocessing guided by inferred user intentions.

  3. Anchor-Aided Multi-User Semantic Communication with Adaptive Decoders

    cs.IT 2026-04 unverdicted novelty 5.0

    A multi-user semantic communication framework uses an anchor decoder symmetric to the encoder to overcome catastrophic forgetting, enabling one frozen encoder to support adaptive decoders for users with varying comput...

  4. Anchor-Aided Multi-User Semantic Communication with Adaptive Decoders

    cs.IT 2026-04 unverdicted novelty 5.0

    A multi-user semantic communication framework employs an anchor decoder symmetric to the encoder to mitigate catastrophic forgetting, enabling sequential training and frozen-encoder adaptation for users with distinct ...

  5. Lightweight Diffusion Models for Resource-Constrained Semantic Communication

    eess.SP 2024-10 unverdicted novelty 5.0

    Q-GESCO uses quantized diffusion models to regenerate images from semantic maps in noisy channels, matching full-precision performance with up to 75% memory and 79% FLOP reductions.

  6. Training-Free Multi-User Generative Semantic Communications via Null-Space Diffusion Sampling

    eess.SP 2024-05 unverdicted novelty 5.0

    Introduces a null-space diffusion sampling method for training-free multi-user generative semantic communications in OFDMA systems.

  7. Generative AI Meets 6G and Beyond: Diffusion Models for Semantic Communications

    eess.SP 2025-11 unverdicted novelty 3.0

    The tutorial synthesizes diffusion model techniques for generative semantic communications to achieve high compression while preserving meaning in wireless transmission.

Reference graph

Works this paper leans on

53 extracted references · 53 canonical work pages · cited by 6 Pith papers · 3 internal anchors

  1. [1]

    Semantic communications: Overview, open issues, and future research directions,

    X. Luo, H.-H. Chen, and Q. Guo, “Semantic communications: Overview, open issues, and future research directions,” IEEE Wireless Comm., vol. 29, no. 1, pp. 210–219, 2022

  2. [2]

    Communication beyond transmitting bits: Semantics-guided source and channel coding,

    J. Dai, P. Zhang, K. Niu, S. Wang, Z. Si, and X. Qin, “Communication beyond transmitting bits: Semantics-guided source and channel coding,” ArXiv preprint: ArXiv:2208.02481, 2021

  3. [3]

    Denoising diffusion probabilistic models,

    J. Ho, A. Jain, and P. Abbeel, “Denoising diffusion probabilistic models,” in Advances in Neural Information Processing Systems (NeurIPS), vol. 33, pp. 6840–6851, 2020

  4. [4]

    Photorealistic text-to-image diffusion models with deep language understanding,

    C. Saharia, W. Chan, S. Saxena, L. Li, J. Whang, E. Denton, S. K. S. Ghasemipour, R. Gontijo-Lopes, B. K. Ayan, T. Salimans, J. Ho, D. J. Fleet, and M. Norouzi, “Photorealistic text-to-image diffusion models with deep language understanding,” in Advances in Neural Information Processing Systems (NeurIPS), 2022

  5. [5]

    High-resolution image synthesis with latent diffusion models,

    R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, “High-resolution image synthesis with latent diffusion models,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10674–10685, 2021

  6. [6]

    Text-to-audio generation using instruction-tuned LLM and latent diffusion model,

    D. Ghosal, N. Majumder, A. Mehrish, and S. Poria, “Text-to-audio generation using instruction-tuned LLM and latent diffusion model,” ArXiv preprint: ArXiv:2304.13731, 2023

  7. [7]

    CogVideo: Large-scale pretraining for text-to-video generation via transformers,

    W. Hong, M. Ding, W. Zheng, X. Liu, and J. Tang, “CogVideo: Large-scale pretraining for text-to-video generation via transformers,” in International Conference on Learning Representations (ICLR), 2023

  8. [8]

    Semantic image synthesis via diffusion models,

    W. Wang, J. Bao, W.-G. Zhou, D. Chen, D. Chen, L. Yuan, and H. Li, “Semantic image synthesis via diffusion models,” ArXiv preprint: ArXiv:2207.00050, 2022

  9. [9]

    Freestyle layout-to-image synthesis,

    H. Xue, Z. H. Feng, Q. Sun, L. Song, and W. Zhang, “Freestyle layout-to-image synthesis,” ArXiv preprint ArXiv:2303.14412, 2023

  10. [10]

    6G networks: Beyond Shannon towards semantic and goal-oriented communications,

    E. Calvanese Strinati and S. Barbarossa, “6G networks: Beyond Shannon towards semantic and goal-oriented communications,” Computer Networks, vol. 190, p. 107930, 2020

  11. [11]

    Joint task and data oriented semantic communications: A deep separate source-channel coding scheme,

    J. Huang, D. Li, C. H. Xiu, X. Qin, and W. Zhang, “Joint task and data oriented semantic communications: A deep separate source-channel coding scheme,” ArXiv preprint: ArXiv:2302.13580, 2023

  12. [12]

    Semantic communications: Principles and challenges,

    Z. Qin, X. Tao, J. Lu, and G. Y. Li, “Semantic communications: Principles and challenges,” ArXiv preprint: ArXiv:2201.01389, 2021

  13. [13]

    Semantic-preserving image compression,

    N. Patwa, N. A. Ahuja, S. Somayazulu, O. Tickoo, S. Varadarajan, and S. G. Koolagudi, “Semantic-preserving image compression,” IEEE International Conference on Image Processing (ICIP), pp. 1281–1285, 2020

  14. [14]

    An end-to-end deep learning image compression framework based on semantic analysis,

    C. Wang, Y. Han, and W. Wang, “An end-to-end deep learning image compression framework based on semantic analysis,” Applied Sciences, 2019

  15. [15]

    Wireless semantic communications for video conferencing,

    P. Jiang, C.-K. Wen, S. Jin, and G. Y. Li, “Wireless semantic communications for video conferencing,” IEEE Journal on Selected Areas in Communications, vol. 41, pp. 230–244, 2022

  16. [16]

    Performance evaluation of semantic video compression using multi-cue object detection,

    N. M. AL-Shakarji, F. Bunyak, H. Aliakbarpour, G. Seetharaman, and K. Palaniappan, “Performance evaluation of semantic video compression using multi-cue object detection,” in IEEE Applied Imagery Pattern Recognition Workshop (AIPR), pp. 1–8, 2019

  17. [17]

    GLIDE: Towards photorealistic image generation and editing with text-guided diffusion models,

    A. Nichol, P. Dhariwal, A. Ramesh, P. Shyam, P. Mishkin, B. McGrew, I. Sutskever, and M. Chen, “GLIDE: Towards photorealistic image generation and editing with text-guided diffusion models,” in International Conference on Machine Learning (ICML), 2021

  18. [18]

    CoBIT: A contrastive bi-directional image-text generation model,

    H. You, M. Guo, Z. Wang, K.-W. Chang, J. Baldridge, and J. Yu, “CoBIT: A contrastive bi-directional image-text generation model,” ArXiv preprint: ArXiv:2303.13455, 2023

  19. [19]

    Optimal transport in diffusion modeling for conversion tasks in audio domain,

    V. Popov, A. Amatov, M. Kudinov, V. Gogoryan, T. Sadekova, and I. Vovk, “Optimal transport in diffusion modeling for conversion tasks in audio domain,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1–5, 2023

  20. [20]

    Make-An-Audio: Text-to-audio generation with prompt-enhanced diffusion models,

    R. Huang, J.-B. Huang, D. Yang, Y. Ren, L. Liu, M. Li, Z. Ye, J. Liu, X. Yin, and Z. Zhao, “Make-An-Audio: Text-to-audio generation with prompt-enhanced diffusion models,” ArXiv preprint: ArXiv:2301.12661, 2023

  21. [21]

    Deep Audio Waveform Prior,

    A. Turetzky, T. Michelson, Y. Adi, and S. Peleg, “Deep Audio Waveform Prior,” in Interspeech, pp. 2938–2942, 2022

  22. [22]

    Make-A-Video: Text-to-Video Generation without Text-Video Data

    U. Singer, A. Polyak, T. Hayes, X. Yin, J. An, S. Zhang, Q. Hu, H. Yang, O. Ashual, O. Gafni, D. Parikh, S. Gupta, and Y. Taigman, “Make-A-Video: Text-to-video generation without text-video data,” ArXiv preprint: ArXiv:2209.14792, 2022

  23. [23]

    Seer: Language Instructed Video Prediction with Latent Diffusion Models

    X. Gu, C. Wen, J. Song, and Y. Gao, “Seer: Language instructed video prediction with latent diffusion models,” ArXiv preprint: ArXiv:2303.14897, 2023

  24. [24]

    Text2Performer: Text-driven human video generation,

    Y. Jiang, S. Yang, T. K. Liang, W. Wu, C. L. Change, and Z. Liu, “Text2Performer: Text-driven human video generation,” ArXiv preprint: ArXiv:2304.08483, 2023

  25. [25]

    Diffusion models in vision: A survey,

    F.-A. Croitoru, V. Hondru, R. T. Ionescu, and M. Shah, “Diffusion models in vision: A survey,” IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1–20, 2023

  26. [26]

    Diverse semantic image synthesis via probability distribution modeling,

    Z. Tan, M. Chai, D. Chen, J. Liao, Q. Chu, B. Liu, G. Hua, and N. Yu, “Diverse semantic image synthesis via probability distribution modeling,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7962–7971, 2021

  27. [27]

    Semantic image synthesis with spatially-adaptive normalization,

    T. Park, M.-Y. Liu, T.-C. Wang, and J.-Y. Zhu, “Semantic image synthesis with spatially-adaptive normalization,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019

  28. [28]

    You only need adversarial supervision for semantic image synthesis,

    E. Schönfeld, V. Sushko, D. Zhang, J. Gall, B. Schiele, and A. Khoreva, “You only need adversarial supervision for semantic image synthesis,” in International Conference on Learning Representations (ICLR), 2021

  29. [29]

    Semantically multi-modal image synthesis,

    Z. Zhu, Z.-L. Xu, A. You, and X. Bai, “Semantically multi-modal image synthesis,” IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5466–5475, 2020

  30. [30]

    Learning to predict layout-to-image conditional convolutions for semantic image synthesis,

    X. Liu, G. Yin, J. Shao, X. Wang, and H. Li, “Learning to predict layout-to-image conditional convolutions for semantic image synthesis,” in Advances in Neural Information Processing Systems (NeurIPS), 2019

  31. [31]

    Generative model based highly efficient semantic communication approach for image transmission,

    T. Han, J. Tang, Q. Yang, Y. Duan, Z. Zhang, and Z. Shi, “Generative model based highly efficient semantic communication approach for image transmission,” arXiv preprint: arXiv:2211.10287, 2022

  32. [32]

    Generative joint source-channel coding for semantic image transmission,

    E. Erdemir, T.-Y. Tung, P. L. Dragotti, and D. Gunduz, “Generative joint source-channel coding for semantic image transmission,” arXiv preprint: arXiv:2211.13772, 2022

  33. [33]

    VAE for joint source-channel coding of distributed gaussian sources over AWGN MAC,

    Y. Malur Saidutta, A. Abdi, and F. Fekri, “VAE for joint source-channel coding of distributed gaussian sources over AWGN MAC,” in IEEE Int. Workshop on Signal Processing Advances in Wireless Comm. (SPAWC), pp. 1–5, 2020

  34. [34]

    A variational auto-encoder approach for image transmission in wireless channel,

    A. H. Estiri, M. R. Sabramooz, A. Banaei, A. H. Dehghan, B. Jamialahmadi, and M. J. Siavoshani, “A variational auto-encoder approach for image transmission in wireless channel,” arXiv preprint: arXiv:2010.03967, 2020

  35. [35]

    Generative model based highly efficient semantic communication approach for image transmission,

    T. Han, J. Tang, Q. Yang, Y. Duan, Z. Zhang, and Z. Shi, “Generative model based highly efficient semantic communication approach for image transmission,” IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022

  36. [36]

    Versatile diffusion: Text, images and variations all in one diffusion model,

    X. Xu, Z. Wang, E. Zhang, K. Wang, and H. Shi, “Versatile diffusion: Text, images and variations all in one diffusion model,” ArXiv preprint: ArXiv:2211.08332, 2022

  37. [37]

    Learning task-oriented communication for edge inference: An information bottleneck approach,

    J. Shao, Y. Mao, and J. Zhang, “Learning task-oriented communication for edge inference: An information bottleneck approach,” IEEE Journal on Selected Areas in Communications, vol. 40, pp. 197–211, 2021

  38. [38]

    U-Net: Convolutional networks for biomedical image segmentation,

    O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” in Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2015

  39. [39]

    Searching for Activation Functions

    P. Ramachandran, B. Zoph, and Q. V. Le, “Swish: a self-gated activation function,” ArXiv preprint: ArXiv:1710.05941, 2017

  40. [40]

    Group normalization,

    Y. Wu and K. He, “Group normalization,” International Journal of Computer Vision, vol. 128, pp. 742–755, 2018

  41. [41]

    Improved denoising diffusion probabilistic models,

    A. Q. Nichol and P. Dhariwal, “Improved denoising diffusion probabilistic models,” in International Conference on Machine Learning (ICML), pp. 8162–8171, 2021

  42. [42]

    Diffusion models beat gans on image synthesis,

    P. Dhariwal and A. Q. Nichol, “Diffusion models beat gans on image synthesis,” in Advances in Neural Information Processing Systems (NeurIPS), vol. 34, 2021

  43. [43]

    Classifier-free diffusion guidance,

    J. Ho and T. Salimans, “Classifier-free diffusion guidance,” in Advances in Neural Information Processing Systems Workshops (NeurIPSW), 2021

  44. [44]

    Dilated residual networks,

    F. Yu, V. Koltun, and T. A. Funkhouser, “Dilated residual networks,” IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 636–644, 2017

  45. [45]

    Per-pixel classification is not all you need for semantic segmentation,

    B. Cheng, A. G. Schwing, and A. Kirillov, “Per-pixel classification is not all you need for semantic segmentation,” in Neural Information Processing Systems, 2021

  46. [46]

    End-to-end object detection with transformers,

    N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, and S. Zagoruyko, “End-to-end object detection with transformers,” in European Conference on Computer Vision (ECCV), pp. 213–229, 2020

  47. [47]

    M4Depth: Monocular depth estimation for autonomous vehicles in unseen environments,

    M. Fonder, D. Ernst, and M. Van Droogenbroeck, “M4Depth: Monocular depth estimation for autonomous vehicles in unseen environments,” ArXiv preprint: ArXiv:2105.09847, 2021

  48. [48]

    Vision transformers for dense prediction,

    R. Ranftl, A. Bochkovskiy, and V. Koltun, “Vision transformers for dense prediction,” IEEE/CVF International Conference on Computer Vision (ICCV), pp. 12159–12168, 2021

  49. [49]

    A mathematical theory of communication,

    C. E. Shannon, “A mathematical theory of communication,” The Bell system technical journal, vol. 27, no. 3, pp. 379–423, 1948

  50. [50]

    Recent contributions to the mathematical theory of communication,

    W. Weaver, “Recent contributions to the mathematical theory of communication,” ETC: A Review of General Semantics, pp. 261–281, 1953

  51. [51]

    It deals with the classical Shannon’s communication theory and focuses on the proper way of transmitting bits from a sender to a receiver

    The technical challenge. It deals with the classical Shannon’s communication theory and focuses on the proper way of transmitting bits from a sender to a receiver

  52. [52]

    Rather than just transmitting bits, this level should account for properly transmitting the meaning of the messages the sender wants to communicate to the receiver

    The semantic challenge. Rather than just transmitting bits, this level should account for properly transmitting the meaning of the messages the sender wants to communicate to the receiver

  53. [53]

    This level deals with the efficiency of the transmission of previous levels

    The effectiveness challenge. This level deals with the efficiency of the transmission of previous levels. With the upcoming advent of the sixth generation (6G), a radical rethinking of communication framework design has started, sliding from the first to the second level of Weaver’s theory [10, 1]. In this switch, generative learning methods are making th...