pith. machine review for the scientific record.

arxiv: 2605.08039 · v1 · submitted 2026-05-08 · 📡 eess.SP

Recognition: 2 theorem links

· Lean Theorem

Joint Beamforming and Antenna Placement Optimization in Pinching Antenna Systems with User Mobility: A Deep Reinforcement Learning Approach

Authors on Pith: no claims yet

Pith reviewed 2026-05-11 02:03 UTC · model grok-4.3

classification 📡 eess.SP
keywords pinching antenna systems · deep reinforcement learning · DDPG · joint beamforming optimization · user mobility · wireless communications · QoS constraints

The pith

Deep reinforcement learning jointly tunes beamforming and pinching locations to maximize average sum rate for mobile users in pinching antenna systems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper addresses the challenge of user mobility in pinching antenna systems (PASS), where the optimal pinching points along the waveguides must be adjusted continuously as the user moves. It formulates a non-convex optimization problem that maximizes the user's average sum rate over a time horizon, subject to quality-of-service constraints, by jointly choosing the base station beamforming vector and the pinching locations. Conventional optimization methods fail because of the coupling among these variables and the randomness introduced by user mobility and probabilistic blockage. The authors therefore apply a deep deterministic policy gradient agent, a reinforcement learning method suited to continuous state and action spaces, to learn policies that adapt in real time.

Core claim

The central claim is that a deep deterministic policy gradient algorithm within a reinforcement learning framework solves the joint beamforming and pinching location optimization in a PASS serving a mobile user, delivering higher average sum rates under QoS constraints despite mobility-induced randomness and blockage.

What carries the argument

A deep deterministic policy gradient (DDPG) reinforcement learning agent that treats the joint beamforming vector and pinching locations as continuous actions and learns policies to maximize long-term sum rate reward.
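Two DDPG mechanics the review alludes to, the slow-tracking target networks and exploration noise added to the deterministic policy output, can be sketched in a few lines. This is a minimal illustration of the standard algorithm, not code from the paper; the flattened weight lists and the [0, 1] action range are assumptions made for the sketch.

```python
import random

def soft_update(target, source, tau=0.005):
    """Polyak-average source network weights into the target network,
    the slow-tracking update DDPG applies to both the target actor
    and the target critic after each gradient step."""
    return [tau * s + (1.0 - tau) * t for t, s in zip(target, source)]

def exploratory_action(policy_action, noise_scale=0.1, low=0.0, high=1.0):
    """Add Gaussian exploration noise to the deterministic policy output
    and clip into the valid continuous range (e.g. a normalized pinching
    coordinate along the waveguide)."""
    return [min(high, max(low, a + random.gauss(0.0, noise_scale)))
            for a in policy_action]
```

With a small `tau`, the targets drift slowly toward the online networks, which is what stabilizes the critic's bootstrapped targets in continuous-action training.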

If this is right

  • The learned policies enable continuous reconfiguration of pinching points as the user moves, directly increasing average sum rate compared to static configurations.
  • The approach handles the non-convex coupling between beamforming and pinching variables that defeats traditional convex optimization.
  • Real-time configurability becomes essential for realizing PASS performance gains under mobility and blockage.
  • The method extends to time-horizon objectives rather than single-slot optimization.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the RL policies generalize across different mobility patterns, PASS could support reliable coverage in dynamic settings such as vehicular or pedestrian scenarios without manual tuning.
  • Extending the single-user model to multiple mobile users would require checking whether the same DDPG framework scales or needs multi-agent modifications.
  • The emphasis on trajectory tracking suggests that pairing PASS with accurate mobility prediction models could further reduce the learning burden on the agent.

Load-bearing premise

That the reinforcement learning agent can discover effective policies for real-time adjustment despite user mobility and probabilistic blockage, even though the paper does not specify the exact state representation or reward design that would guarantee such learning.

What would settle it

A controlled simulation in which the learned DDPG policy fails to satisfy the QoS constraints, or falls below conventional fixed-pinching baselines, once user speed or blockage probability moves beyond the training distribution would disprove the effectiveness claim.
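The falsification test described above could be operationalized as a QoS-violation sweep over out-of-distribution conditions. Everything here is hypothetical: `policy`, `env_step`, and the threshold `r_th` stand in for components the paper does not specify.

```python
def qos_violation_rate(policy, env_step, horizon, r_th):
    """Fraction of time slots in which the achieved rate falls below the
    QoS threshold r_th. Sweeping user speed or blockage probability past
    the training range and watching this rate climb would be the kind of
    evidence that undercuts the effectiveness claim."""
    violations = 0
    state = None
    for _ in range(horizon):
        action = policy(state)
        state, rate = env_step(state, action)
        if rate < r_th:
            violations += 1
    return violations / horizon
```

Comparing this metric for the learned policy against a fixed-pinching baseline under the same stressed conditions would make the "where it could break" question quantitative.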

Figures

Figures reproduced from arXiv: 2605.08039 by Ali Amhaz, Chadi Assi, Mohamed Elhattab, Sanaa Sharafeddine.

Figure 1
Figure 1. System Model. view at source ↗
Figure 2
Figure 2. Convergence of the solution approach. view at source ↗
Figure 3
Figure 3. BS power vs average rate. view at source ↗
Figure 4
Figure 4. User movement and pinching locations map. view at source ↗
read the original abstract

Recently, the pinching antenna systems (PASS) have attracted significant attention due to their ability to exploit dynamically reconfigurable pinching points along waveguides for flexible signal transmission. However, existing work largely overlooks user mobility although the optimal pinching configuration is highly dependent on the user's location and must be continuously adjusted. In this work, we investigate a PASS-enabled system model in which a base station (BS) serves a mobile user. We formulate an optimization problem that aims to maximize the user's average sum rate over a predefined time horizon while satisfying quality-of-service (QoS) constraint. This objective is achieved by jointly optimizing the beamforming vector at the BS and the pinching locations along the waveguides. Nevertheless, the resulting problem is highly non-convex and challenging to solve using conventional optimization techniques due to the intricate coupling among variables. The difficulty is further exacerbated by environmental randomness arising from user mobility and a probabilistic blockage model. This reveals a key engineering challenge: the performance gains of PASS critically rely on the ability to track or predict user trajectories in real time. To address these challenges, we adopt a deep deterministic policy gradient (DDPG) approach within a reinforcement learning framework, which is well-suited for continuous state and action spaces. Finally, extensive simulations are conducted to validate the proposed approach and demonstrate the importance of real-time configurability.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper presents a PASS-enabled system with a BS serving a mobile user. It formulates a non-convex optimization problem to maximize the user's average sum rate over a time horizon subject to QoS constraints by jointly optimizing the BS beamforming vector and pinching locations along the waveguides. The problem is exacerbated by randomness from user mobility and a probabilistic blockage model. To solve it, the paper adopts a DDPG reinforcement learning approach suited to continuous state and action spaces, and conducts extensive simulations to validate the method and demonstrate the importance of real-time configurability.

Significance. If the DDPG approach is properly validated, the work would be significant for showing how off-policy actor-critic methods can address joint continuous optimization in dynamic reconfigurable-antenna systems where mobility makes static configurations suboptimal. It correctly identifies real-time trajectory tracking as a key engineering challenge for PASS gains and applies a standard formulation for continuous-action MDPs. Credit is due for framing the problem as an MDP that incorporates environmental randomness rather than assuming perfect knowledge.

major comments (2)
  1. [Abstract] Abstract: the claim that 'extensive simulations are conducted to validate the proposed approach' is unsupported by any quantitative results, error bars, baseline comparisons, or description of how the non-convex coupling and probabilistic blockage are modeled inside the RL environment; this gap is load-bearing for the validation of the central claim.
  2. [Proposed Approach] The formulation assumes the RL agent can learn effective policies despite randomness from mobility and probabilistic blockage, yet no details are given on state representation, action-space handling, or reward shaping that would make this feasible (see weakest assumption in the provided analysis).
minor comments (1)
  1. [Abstract] The abstract would be strengthened by including at least one key performance metric or baseline comparison to support the validation statement.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and have revised the manuscript to improve clarity and completeness where appropriate.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the claim that 'extensive simulations are conducted to validate the proposed approach' is unsupported by any quantitative results, error bars, baseline comparisons, or description of how the non-convex coupling and probabilistic blockage are modeled inside the RL environment; this gap is load-bearing for the validation of the central claim.

    Authors: We acknowledge that the abstract is concise and does not itself present quantitative details or explicit environment modeling. The manuscript contains a dedicated simulation section with results, baseline comparisons, and descriptions of how mobility and probabilistic blockage are incorporated. To address the concern directly, we will revise the abstract to include a brief summary of the validation approach and key simulation outcomes, along with a short statement on how the RL environment models the non-convex coupling and randomness. This change will make the central claim better supported within the abstract itself. revision: yes

  2. Referee: [Proposed Approach] The formulation assumes the RL agent can learn effective policies despite randomness from mobility and probabilistic blockage, yet no details are given on state representation, action-space handling, or reward shaping that would make this feasible (see weakest assumption in the provided analysis).

    Authors: We agree that additional implementation details are needed to demonstrate feasibility. In the revised manuscript we will expand the Proposed Approach section with: (i) state representation that includes user position, velocity estimates, and instantaneous channel gains; (ii) continuous action space consisting of normalized beamforming vectors and pinching locations with explicit bounds; and (iii) reward shaping that combines instantaneous sum rate with a QoS-violation penalty, where environmental randomness is handled through stochastic transitions sampled from the mobility and blockage models. These additions will clarify how the DDPG agent can learn robust policies under the stated uncertainties. revision: yes
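The design sketched in items (ii) and (iii) of this simulated rebuttal can be made concrete. The penalty weight and the projection details below are illustrative assumptions, not values from the paper.

```python
import math

def shaped_reward(rate, r_th, penalty=5.0):
    """Instantaneous sum rate minus a fixed penalty whenever the QoS
    threshold r_th is violated, as item (iii) describes; the penalty
    weight 5.0 is an assumed value for illustration."""
    return rate - (penalty if rate < r_th else 0.0)

def project_action(raw_beam, raw_pinch, waveguide_len):
    """Map unconstrained actor outputs to feasible actions: a unit-norm
    beamforming vector and pinching coordinates clipped to [0, L] along
    the waveguide, matching item (ii)'s explicit bounds."""
    norm = math.sqrt(sum(b * b for b in raw_beam)) or 1.0
    beam = [b / norm for b in raw_beam]
    pinch = [min(max(p, 0.0), waveguide_len) for p in raw_pinch]
    return beam, pinch
```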

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper formulates a standard non-convex optimization problem maximizing time-horizon average sum rate subject to QoS constraints by jointly choosing beamforming vectors and pinching locations, then applies the established DDPG algorithm (an off-policy actor-critic method for continuous MDPs) to solve the resulting MDP. This is a direct, non-reductive application of a known RL technique to the stated engineering problem; the objective, state/action spaces, and reward are defined from the system model without reducing the claimed gains to a fitted parameter or self-citation chain. No self-definitional steps, imported uniqueness theorems, or ansatz smuggling appear in the abstract or formulation. The approach is grounded in external RL benchmarks and simulation validation rather than in self-referential definitions.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides insufficient detail to enumerate specific free parameters, axioms, or invented entities; the model implicitly assumes a probabilistic blockage process and continuous user mobility trajectory that are not further specified.

pith-pipeline@v0.9.0 · 5549 in / 1352 out tokens · 30874 ms · 2026-05-11T02:03:30.532354+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

12 extracted references · 12 canonical work pages

  1. [1]

    The Road Towards 6G: A Comprehensive Survey,

    W. Jiang, B. Han, M. A. Habibi, and H. D. Schotten, “The Road Towards 6G: A Comprehensive Survey,” IEEE Open Journal of the Communications Society, vol. 2, pp. 334–366, 2021

  2. [2]

    Massive MIMO: An Overview, Recent Challenges, and Future Research Directions,

    T. Abood, I. Hburi, and H. F. Khazaal, “Massive MIMO: An Overview, Recent Challenges, and Future Research Directions,” in 2021 International Conference on Advance of Sustainable Engineering and its Application (ICASEA), 2021, pp. 43–48

  3. [3]

    Pinching Antennas: Principles, Applications and Challenges,

    Z. Yang, N. Wang, Y. Sun, Z. Ding, R. Schober, G. K. Karagiannidis, V. W. Wong, and O. A. Dobre, “Pinching Antennas: Principles, Applications and Challenges,” IEEE Wireless Communications, vol. 33, no. 2, pp. 175–184, 2026

  4. [4]

    Rate Maximization for Downlink Pinching-Antenna Systems,

    Y. Xu, Z. Ding, and G. K. Karagiannidis, “Rate Maximization for Downlink Pinching-Antenna Systems,” IEEE Wireless Communications Letters, vol. 14, no. 5, pp. 1431–1435, 2025

  5. [5]

    Pinching-Antenna Systems (PASS): A Tutorial,

    Y. Liu, H. Jiang, X. Xu, Z. Wang, J. Guo, C. Ouyang, X. Mu, Z. Ding, A. Nallanathan, G. K. Karagiannidis, and R. Schober, “Pinching-Antenna Systems (PASS): A Tutorial,” IEEE Transactions on Communications, vol. 74, pp. 4881–4918, 2026

  6. [6]

    Coverage and Rate Analysis for Millimeter-Wave Cellular Networks,

    T. Bai and R. W. Heath, “Coverage and Rate Analysis for Millimeter-Wave Cellular Networks,” IEEE Transactions on Wireless Communications, vol. 14, no. 2, pp. 1100–1114, 2015

  7. [7]

    DDPG-Based Wireless Resource Allocation for Time-Constrained Applications,

    H. Hu, M. Hernandez, Y. G. Kim, K. J. Ahmed, K. Tsukamoto, and M. J. Lee, “DDPG-Based Wireless Resource Allocation for Time-Constrained Applications,” in 2024 IEEE Wireless Communications and Networking Conference (WCNC), 2024, pp. 1–6

  8. [8]

    Waveguide Division Multiple Access for Pinching-Antenna Systems (PASS),

    J. Zhao, X. Mu, K. Cai, Y. Zhu, and Y. Liu, “Waveguide Division Multiple Access for Pinching-Antenna Systems (PASS),” 2025. [Online]. Available: https://arxiv.org/abs/2502.17781

  9. [9]

    Modeling and Beamforming Optimization for Pinching-Antenna Systems,

    Z. Wang, C. Ouyang, X. Mu, Y. Liu, and Z. Ding, “Modeling and Beamforming Optimization for Pinching-Antenna Systems,” 2025. [Online]. Available: https://arxiv.org/abs/2502.05917

  10. [10]

    Reconfigurable Intelligent Surface Assisted Multiuser MISO Systems Exploiting Deep Reinforcement Learning,

    C. Huang, R. Mo, and C. Yuen, “Reconfigurable Intelligent Surface Assisted Multiuser MISO Systems Exploiting Deep Reinforcement Learning,” IEEE Journal on Selected Areas in Communications, vol. 38, no. 8, pp. 1839–1850, 2020

  11. [11]

    Deep Deterministic Policy Gradient-Based Rate Maximization for RIS-UAV-Assisted Vehicular Communication Networks,

    H. Zhao et al., “Deep Deterministic Policy Gradient-Based Rate Maximization for RIS-UAV-Assisted Vehicular Communication Networks,” IEEE Transactions on Intelligent Transportation Systems, pp. 1–13, 2024

  12. [12]

    Energy Consumption Optimization in RIS-Assisted Cooperative RSMA Cellular Networks,

    S. Khisa, M. Elhattab, C. Assi, and S. Sharafeddine, “Energy Consumption Optimization in RIS-Assisted Cooperative RSMA Cellular Networks,” IEEE Transactions on Communications, vol. 71, no. 7, pp. 4300–4312, 2023