arxiv: 2605.01798 · v1 · submitted 2026-05-03 · 💻 cs.MM

Contextual Wireless Video Semantic Communication in MIMO-OFDM Systems

Bingyan Xie, Biqian Feng, Cong Zhou, Wenjun Zhang, Yongpeng Wu, Yuxuan Shi

Pith reviewed 2026-05-09 15:53 UTC · model grok-4.3

classification 💻 cs.MM

keywords semantic communication · MIMO-OFDM · video transmission · context-subcarrier correlation · recursive sampling · multi-path channels · entropy coding · channel state information

The pith

M-CVST aligns video feature context to MIMO subcarriers and uses recursive sampling of past channel data to improve semantic video transmission over multi-path channels.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a framework for sending video semantically over MIMO-OFDM wireless links that matches semantic features from the video to specific groups of subcarriers. It exploits the fact that multi-path channels stay correlated over time by sampling subcarriers recursively and feeding prior channel information into the entropy coder. Simulations show this yields lower distortion than both conventional separate coding and other semantic methods under the same channel conditions. A reader would care because the approach suggests a practical way to make high-quality video delivery more reliable in real wireless environments where full channel knowledge is hard to maintain.

Core claim

M-CVST builds a context-subcarrier correlation map that aligns video feature context with groups of MIMO subcarriers, and combines it with a recursive subcarrier sampling method that embeds time-correlated reference information. Together these improve channel state awareness inside the entropy coding model, which the paper credits for better reconstruction quality over multi-path MIMO channels than other semantic schemes and traditional separated transmission schemes.

What carries the argument

The context-subcarrier correlation map that aligns video feature context with groups of MIMO subcarriers, together with recursive subcarrier sampling that re-uses time-correlated reference embeddings from prior samples.

Load-bearing premise

That the context-subcarrier correlation map and recursive sampling method can be realized with modest overhead, and that the simulation results will translate to performance gains in actual time-varying multi-path MIMO channels.

What would settle it

Measurements in a live multi-path MIMO testbed where M-CVST shows no reduction in video distortion relative to a well-tuned separated source-channel scheme at the same rate and SNR would falsify the claimed superiority.

Figures

Figures reproduced from arXiv: 2605.01798 by Bingyan Xie, Biqian Feng, Cong Zhou, Wenjun Zhang, Yongpeng Wu, Yuxuan Shi.

**Figure 1.** (a) The proposed M-CVST framework for uplink wireles…

**Figure 2.** (a) The recursive subcarrier sampling for time-corr…

**Figure 3.** (a)-(c) Quality of the reconstructed images versus t…

**Figure 4.** Quality of the reconstructed images versus the CBRs u…

Original abstract

This paper proposes a MIMO-OFDM-based context video semantic transmission framework, namely M-CVST, for robust video communication over multi-path multiple-input multiple-output (MIMO) channels. It introduces a context-subcarrier correlation map that aligns video feature context with groups of MIMO subcarriers. To leverage the time-correlated nature of multi-path channels, a recursive subcarrier sampling method paired with time-correlated reference embedding is designed, enabling the use of previously sampled MIMO subcarrier CSI to enhance channel state awareness in the entropy coding model. Numerical results verify the superiority of proposed M-CVST over MIMO multi-path channels compared to other semantic schemes and traditional separated schemes.
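
The abstract's idea of reusing previously sampled subcarrier CSI can be illustrated with a toy simulation. The sketch below is not the paper's algorithm: it assumes one complex gain per subcarrier group, a first-order autoregressive (AR(1)) channel with per-frame correlation `rho`, and a round-robin refresh schedule, none of which come from the source. The function name `simulate_stale_csi` and all of its parameters are hypothetical. It measures how far a buffer of partly stale CSI drifts from the true channel.

```python
import math
import random


def simulate_stale_csi(num_groups=8, refresh_per_frame=2, rho=0.95,
                       num_frames=2000, seed=0):
    """Mean squared error between true and buffered per-group CSI.

    Toy model (all assumptions, not taken from the paper):
    - one unit-variance complex gain per subcarrier group,
    - AR(1) evolution with correlation `rho` between frames,
    - each frame, `refresh_per_frame` groups are re-sampled in
      round-robin order; the other groups keep their old sample.
    """
    rng = random.Random(seed)

    def cn():
        # Unit-variance circularly symmetric complex Gaussian sample.
        s = math.sqrt(0.5)
        return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))

    true_h = [cn() for _ in range(num_groups)]
    buffered = list(true_h)  # reference CSI available to the encoder
    innovation = math.sqrt(1.0 - rho * rho)
    next_group = 0
    total_err = 0.0

    for _frame in range(num_frames):
        # The true channel evolves for every group.
        true_h = [rho * h + innovation * cn() for h in true_h]
        # Only a rotating subset is refreshed; the rest stays stale.
        for _ in range(refresh_per_frame):
            buffered[next_group] = true_h[next_group]
            next_group = (next_group + 1) % num_groups
        frame_err = sum(abs(t - b) ** 2 for t, b in zip(true_h, buffered))
        total_err += frame_err / num_groups

    return total_err / num_frames


if __name__ == "__main__":
    for rho in (0.99, 0.95, 0.8, 0.5):
        print(f"rho={rho:.2f}  mean CSI error={simulate_stale_csi(rho=rho):.4f}")
```

Under these assumptions, refreshing every group in every frame gives zero error, and the error grows as `rho` falls, which is why the benefit of reusing past CSI depends on how strongly the channel is correlated across frames.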

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

M-CVST pairs a context-subcarrier correlation map with recursive CSI reuse for semantic video in MIMO-OFDM, but the reported gains rest on thin simulation details that may not survive higher mobility.

The core of M-CVST is a context-subcarrier correlation map that lines up video feature context with groups of MIMO subcarriers, plus a recursive subcarrier sampling step that feeds time-correlated prior CSI into the entropy coder. That specific combination is new even if the broader semantic-communication and MIMO-OFDM pieces are not. The paper does a reasonable job of making the semantic encoder channel-aware without demanding fresh full CSI on every frame, which is a practical direction for video over multi-path links. The recursive embedding idea is a clean way to exploit the natural time correlation in MIMO channels and cut overhead. The motivation is clear, and the mechanisms follow logically from the problem of aligning semantic content with the physical subcarrier structure.

The soft spot is the evaluation. The abstract claims numerical superiority over other semantic schemes and traditional separated transmission, yet gives no simulation parameters, mobility speeds, channel models, baselines, or error bars. The recursive sampling only helps if channel realizations stay correlated across the sampling window; if the tests used low-Doppler traces, the advantage could shrink or vanish once coherence time drops. That concern from the stress-test note still looks live until the full methods are shown.

This paper is aimed at people working on semantic communications for wireless multimedia. A reader already following MIMO-OFDM semantic work would get some concrete mechanisms to consider or extend. It shows honest engagement with the literature and the equations are not circular, so it qualifies as serious thinking. I would send it to peer review rather than desk-reject it, with the main request being expanded experiments across mobility regimes and clearer reporting of the simulation setup.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes M-CVST, a MIMO-OFDM-based context video semantic transmission framework. It introduces a context-subcarrier correlation map to align video feature context with groups of MIMO subcarriers and a recursive subcarrier sampling method paired with time-correlated reference embedding to reuse prior MIMO CSI in the entropy coder, exploiting time-correlated multi-path channels. Numerical results are presented to claim superiority over other semantic schemes and traditional separated schemes.

Significance. If the numerical gains prove robust, the framework could contribute to semantic communications by combining context alignment with channel-state reuse in MIMO-OFDM, potentially improving rate-distortion performance for video under bandwidth-limited wireless conditions. The recursive sampling idea directly addresses time-varying channels, which is a relevant direction, though its value hinges on validation beyond idealized correlation assumptions.

major comments (2)

[Abstract] The superiority claim rests entirely on numerical results, yet no simulation parameters, mobility models (Doppler spread or coherence time), baselines, error bars, or statistical significance tests are described. Without these, it is impossible to determine whether reported gains over semantic and separated schemes are reproducible or regime-specific.
[Recursive subcarrier sampling method, as described in the abstract] The approach assumes multi-path channel realizations remain sufficiently correlated across the recursive window so that stale CSI improves entropy coding. In MIMO-OFDM, coherence time is governed by Doppler spread; no analysis or results are provided for high-mobility regimes where coherence time falls below the sampling interval, raising the risk that the claimed advantage is an artifact of low-mobility traces.
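
The coherence-time concern in the second comment can be made concrete with standard textbook relations. The sketch below computes the maximum Doppler shift f_D = v·f_c/c, the common rule-of-thumb coherence time T_c ≈ 0.423/f_D, and the Jakes (Clarke) temporal correlation J_0(2π·f_D·τ) for an assumed CSI sampling interval τ. The carrier frequency, speeds, and 1 ms interval are illustrative assumptions and do not come from the paper; the function names are hypothetical.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def doppler_hz(speed_mps, carrier_hz):
    """Maximum Doppler shift f_D = v * f_c / c."""
    return speed_mps * carrier_hz / SPEED_OF_LIGHT


def coherence_time_s(f_d_hz):
    """Rule-of-thumb coherence time T_c ~ 0.423 / f_D."""
    return 0.423 / f_d_hz


def bessel_j0(x, n=400):
    """J0(x) from (1/pi) * integral_0^pi cos(x sin t) dt (midpoint rule)."""
    h = math.pi / n
    return sum(math.cos(x * math.sin((k + 0.5) * h)) for k in range(n)) / n


def jakes_correlation(f_d_hz, lag_s):
    """Temporal channel correlation under the Jakes model."""
    return bessel_j0(2.0 * math.pi * f_d_hz * lag_s)


if __name__ == "__main__":
    carrier_hz = 3.5e9  # assumed carrier frequency
    lag_s = 1e-3        # assumed interval between CSI samples
    for speed_mps in (1.0, 10.0, 30.0, 100.0):
        f_d = doppler_hz(speed_mps, carrier_hz)
        print(f"v={speed_mps:6.1f} m/s  f_D={f_d:8.1f} Hz  "
              f"T_c={coherence_time_s(f_d) * 1e3:8.3f} ms  "
              f"corr={jakes_correlation(f_d, lag_s):+.3f}")
```

When the printed correlation is close to 1, a previous CSI sample is still informative; as it falls toward zero, previously sampled CSI says little about the current channel under this model.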

minor comments (1)

[Abstract] The abstract would be clearer if it explicitly named the performance metrics (e.g., PSNR, semantic similarity, throughput) used to demonstrate superiority.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major comment point by point below and indicate the revisions we will make.

Referee: [Abstract] The superiority claim rests entirely on numerical results, yet no simulation parameters, mobility models (Doppler spread or coherence time), baselines, error bars, or statistical significance tests are described. Without these, it is impossible to determine whether reported gains over semantic and separated schemes are reproducible or regime-specific.

Authors: We agree that the abstract, due to its brevity, omits these details. We will revise the abstract to include key simulation parameters such as the MIMO-OFDM setup, mobility model with Doppler spread values, the baselines considered, and a note that results are averaged over multiple runs with error bars shown in the figures. We will also add statistical significance tests to the results section in the revised manuscript to strengthen the claims.

revision: yes

Referee: [Recursive subcarrier sampling method, as described in the abstract] The approach assumes multi-path channel realizations remain sufficiently correlated across the recursive window so that stale CSI improves entropy coding. In MIMO-OFDM, coherence time is governed by Doppler spread; no analysis or results are provided for high-mobility regimes where coherence time falls below the sampling interval, raising the risk that the claimed advantage is an artifact of low-mobility traces.

Authors: We agree that the recursive sampling method relies on sufficient channel correlation over the window and that the manuscript does not analyze high-mobility cases where coherence time is short. We will add a discussion of the coherence time assumption in the revised manuscript along with new numerical results for higher Doppler spread values to show the performance boundary and when the advantage of time-correlated reference embedding holds.

revision: yes

Circularity Check

0 steps flagged

No significant circularity; claims rest on numerical verification of proposed framework

The paper introduces M-CVST with a context-subcarrier correlation map and recursive subcarrier sampling to leverage channel time-correlation, but the abstract and available text contain no equations, derivations, or self-referential definitions that reduce any result to its inputs by construction. The superiority claim is explicitly tied to numerical results comparing against baselines, which constitutes independent empirical verification rather than a tautological fit or self-citation chain. No load-bearing steps match the enumerated circularity patterns, and the derivation chain is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

Abstract-only review prevents exhaustive identification; the framework rests on domain assumptions about channel correlation and the utility of semantic features.

axioms (1)

domain assumption: Multi-path MIMO-OFDM channels exhibit sufficient time correlation to make past CSI useful for current entropy coding.
Invoked to justify the recursive sampling method.
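
One way to quantify this assumption, offered as an illustrative model rather than anything stated in the source, is a first-order Gauss-Markov (AR(1)) channel with a unit-variance gain $h_t$ and a real correlation coefficient $\rho$ between consecutive CSI samples. The symbols $h_t$, $w_t$, $\rho$, $f_D$ and $T_s$ are assumptions introduced here, and pairing the AR(1) model with the Jakes correlation is a common approximation, not the paper's derivation.

```latex
\begin{align}
h_t &= \rho\, h_{t-1} + \sqrt{1-\rho^2}\; w_t,
  \qquad w_t \sim \mathcal{CN}(0,1), \quad \mathbb{E}|h_t|^2 = 1, \\
\mathbb{E}\,\bigl|h_t - h_{t-1}\bigr|^2 &= 2\,(1-\rho)
  && \text{(stale CSI reused as is)}, \\
\mathbb{E}\,\bigl|h_t - \rho\, h_{t-1}\bigr|^2 &= 1-\rho^2
  && \text{(linear MMSE prediction from } h_{t-1}\text{)}, \\
\rho &\approx J_0\!\left(2\pi f_D T_s\right)
  && \text{(Jakes model, Doppler } f_D \text{, sampling interval } T_s\text{)}.
\end{align}
```

Both errors vanish as $\rho \to 1$. As $\rho \to 0$ the prediction error rises to the channel variance (1) and naive reuse to twice that, so under this model the value of past CSI is governed by the product $f_D T_s$.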

invented entities (1)

context-subcarrier correlation map · no independent evidence
purpose: Align video feature context with groups of MIMO subcarriers
New construct introduced to enable the framework
