DQN-Driven Adaptive Neighbor Discovery for Directional Aerial Networks
Recognition: 2 Lean theorem links
Pith reviewed 2026-05-14 22:04 UTC · model grok-4.3
The pith
DQN agents let directional aerial nodes adapt probing to balance reachability against privacy using only local observations.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Independent DQN agents, each acting on its own local observations, learn to tune directional probing patterns to maximize a weighted combination of reachability and limited exposure. The resulting policies outperform both random selection and tabular Q-learning baselines: discovery-weighted agents deliver higher probing efficiency and connectivity, while privacy-weighted agents deliver lower exposure at the cost of reduced reachability yet still attain a higher objective value.
What carries the argument
Independent DQN agents that select transceiver configurations using a weighted reward combining reachability and exposure, driven solely by local observations in a mobile directional network.
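As a concrete reading of that machinery, here is a minimal sketch of the weighted per-step objective, assuming the form O(t) = w·O_p(t) + (1-w)·O_c(t) quoted in the Lean-theorem excerpt below, with O_p read as the limited-exposure (privacy) term and O_c as the reachability (connectivity) term. The function and variable names, and the normalizations, are illustrative assumptions, not the authors' code.

```python
# Minimal sketch of the weighted per-step objective, assuming
# O(t) = w * O_p(t) + (1 - w) * O_c(t): O_p rewards limited exposure,
# O_c rewards reachability. Normalizations are illustrative assumptions.

def step_objective(neighbors_reached: int, neighbors_total: int,
                   exposed_probes: int, probes_sent: int, w: float) -> float:
    o_c = neighbors_reached / max(neighbors_total, 1)  # fraction of neighbors reached
    o_p = 1.0 - exposed_probes / max(probes_sent, 1)   # fraction of probes unexposed
    return w * o_p + (1.0 - w) * o_c

# w near 1 prioritizes privacy; w near 0 prioritizes discovery.
print(step_objective(neighbors_reached=3, neighbors_total=5,
                     exposed_probes=1, probes_sent=4, w=0.7))
```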
If this is right
- Discovery-weighted policies increase the fraction of neighbors reached and the efficiency of each probe.
- Privacy-weighted policies reduce the number of exposed transmissions while still producing a higher net objective score.
- The same local-observation framework works under continuous node mobility without requiring synchronized clocks or shared maps (see the mobility sketch after this list).
- Performance gains hold against both non-learning random selection and standard tabular Q-learning.
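The mobility claim above is easiest to test against a standard model; the authors' rebuttal names random waypoint. A minimal sketch follows, in which the area bounds, speed range, and time step are illustrative assumptions.

```python
# Minimal random-waypoint mobility step: each node moves toward a random
# destination at a random speed, then draws a new destination on arrival.
import random

def random_waypoint_step(pos, dest, speed, area=1000.0, dt=1.0,
                         v_range=(5.0, 15.0)):
    dx, dy = dest[0] - pos[0], dest[1] - pos[1]
    dist = (dx * dx + dy * dy) ** 0.5
    if dist <= speed * dt:                       # waypoint reached
        pos = dest
        dest = (random.uniform(0, area), random.uniform(0, area))
        speed = random.uniform(*v_range)
    else:                                        # move toward the waypoint
        pos = (pos[0] + speed * dt * dx / dist,
               pos[1] + speed * dt * dy / dist)
    return pos, dest, speed

# Example: advance one node for 10 time steps.
state = ((0.0, 0.0), (500.0, 300.0), 10.0)
for _ in range(10):
    state = random_waypoint_step(*state)
```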
Where Pith is reading between the lines
- The same independent-agent structure could be tested in terrestrial directional networks where mobility is slower but density is higher.
- Adding a small amount of shared state among nearby agents might further improve convergence speed without violating the local-observation premise.
- Real-world validation would require measuring actual beam-alignment latency and RF exposure in outdoor drone experiments.
- The weighted-objective approach offers a direct way to encode regulatory exposure limits as tunable parameters inside the learning loop.
Load-bearing premise
Each node can learn an effective probing policy from its own local observations without needing global network state or explicit coordination with other nodes.
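To make the premise concrete, here is a minimal sketch of one such agent: a generic epsilon-greedy DQN policy head, not the authors' architecture. The observation encoding, layer sizes, and the six-sector action space are assumptions.

```python
# One independent DQN agent, assuming each node sees only a fixed-size
# local observation (e.g., recent probe outcomes per sector) and picks a
# discrete transceiver/sector configuration.
import random
import torch
import torch.nn as nn

class LocalDQNAgent:
    def __init__(self, obs_dim: int, n_actions: int, eps: float = 0.1):
        self.q = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )
        self.n_actions = n_actions
        self.eps = eps  # epsilon-greedy exploration rate

    def act(self, local_obs: torch.Tensor) -> int:
        # No global state and no messages from other agents: the policy is
        # a function of this node's own observation only.
        if random.random() < self.eps:
            return random.randrange(self.n_actions)
        with torch.no_grad():
            return int(self.q(local_obs).argmax())

agent = LocalDQNAgent(obs_dim=8, n_actions=6)  # e.g., 6 sector choices
action = agent.act(torch.zeros(8))
```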
What would settle it
A controlled simulation or flight test in which DQN agents produce lower objective values than the random or Q-learning baselines across the same mobility traces and weight settings.
Original abstract
Directional antenna systems are gaining substantial traction for aerial networks due to their higher gain, extended transmission range, and enhanced security. However, the requirement of beam alignment makes the task of finding and reaching neighbors challenging, particularly in a mobile setting. For wireless networks, privacy concerns play an equally critical role. However, the problem of ensuring network-wide connectivity while maintaining limited exposure when probing is still unexplored. We address this trade-off by proposing an adaptive transceiver selection protocol based on the Deep Q-Network (DQN) framework. Each node acts as an independent DQN agent and interacts with the environment to learn how to balance the trade-off. Since the directional nodes operate only based on local observations, we adopt a weighted mechanism that guides them in prioritizing either high reachability or privacy by adaptively tuning the probing patterns. Results show that the DQN framework surpasses the Random and Q-Learning baselines. Weights favoring discovery provide higher probing efficiency and reachability, while weights prioritizing privacy ensure limited exposure at the cost of lower reachability, eventually attaining a higher objective value.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a DQN-based adaptive transceiver selection protocol for neighbor discovery in directional aerial networks. Each node functions as an independent DQN agent that learns probing patterns from local observations to balance reachability against privacy via tunable weights; the authors report that the DQN approach outperforms Random and tabular Q-Learning baselines, with discovery-weighted policies yielding higher efficiency and reachability while privacy-weighted policies limit exposure at the cost of lower reachability, ultimately producing higher objective values.
Significance. If the performance claims hold under rigorous multi-agent evaluation, the work would provide a practical learning-based method for managing the connectivity-security trade-off in mobile directional networks, an area of growing relevance for aerial systems. The use of independent agents with weighted objectives is a straightforward extension of DQN to this setting, but the absence of quantitative metrics, simulation parameters, and explicit handling of non-stationarity limits the strength of the contribution at present.
major comments (2)
- [Abstract] The central claim that the DQN framework 'surpasses the Random and Q-Learning baselines' is stated without any numerical results, network sizes, mobility models, episode counts, or statistical tests. Because the performance advantage is the primary evidence offered for the method, this omission prevents verification and must be addressed with concrete tables or figures.
- [Abstract] Multi-agent setup: each node is described as an independent DQN agent learning solely from local observations, yet all nodes adapt their probing patterns concurrently. This renders the environment non-stationary for every agent; standard DQN with experience replay has no built-in correction for this, and the manuscript mentions no centralized training, opponent modeling, or stabilization techniques. The reported gains over baselines could therefore be artifacts of co-adaptation during training rather than robust per-agent policies.
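To pin down the non-stationarity objection, one standard formalization is sketched below in generic multi-agent RL notation; the symbols are not from the manuscript. From agent i's viewpoint, the effective transition kernel marginalizes over the other agents' current policies, so it drifts as those policies are updated.

```latex
\[
P_i^{(t)}\bigl(s' \mid s, a_i\bigr)
  = \sum_{a_{-i}} \pi_{-i}^{(t)}\bigl(a_{-i} \mid s\bigr)\,
    P\bigl(s' \mid s, a_i, a_{-i}\bigr)
\]
% Experience replay draws transitions generated under P_i^{(t')} with t' < t,
% which no longer match P_i^{(t)} once the other agents' policies change.
```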
minor comments (1)
- [Abstract] The abstract refers to a 'weighted mechanism' and an 'objective value' without defining the precise weighting scheme or the mathematical form of the objective; defining these early in the text would improve clarity.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address the major comments point by point below and have revised the abstract and added discussion to strengthen the presentation of results and the multi-agent aspects.
Point-by-point responses
- Referee: [Abstract] The central claim that the DQN framework 'surpasses the Random and Q-Learning baselines' is stated without any numerical results, network sizes, mobility models, episode counts, or statistical tests. Because the performance advantage is the primary evidence offered for the method, this omission prevents verification and must be addressed with concrete tables or figures.
  Authors: We agree that the abstract requires concrete quantitative support to substantiate the performance claims. In the revised manuscript, we have updated the abstract to include key numerical results drawn from our simulations (e.g., relative objective-value improvements, reachability and privacy metrics), along with simulation parameters such as network sizes, the random-waypoint mobility model, and episode counts, and references to the corresponding tables and figures in Section IV that contain the statistical comparisons.
  Revision: yes
- Referee: [Abstract] Multi-agent setup: each node is described as an independent DQN agent learning solely from local observations, yet all nodes adapt their probing patterns concurrently. This renders the environment non-stationary for every agent; standard DQN with experience replay has no built-in correction for this, and the manuscript mentions no centralized training, opponent modeling, or stabilization techniques. The reported gains over baselines could therefore be artifacts of co-adaptation during training rather than robust per-agent policies.
  Authors: This is a valid point about the non-stationarity arising from concurrent independent learning. The original manuscript does not discuss stabilization techniques or centralized training. In the revision we have added a paragraph to the methods and discussion sections that explicitly acknowledges the non-stationary environment, describes the independent DQN training procedure used, and reports that policies converged consistently across random seeds and weight settings while still outperforming the baselines. This empirical robustness supports the practical utility of the approach, though we agree that more advanced multi-agent RL methods could be explored in future work.
  Revision: yes
Circularity Check
No significant circularity; empirical simulation results independent of inputs
Full rationale
The paper describes a DQN-based protocol for balancing reachability and privacy in directional aerial networks via simulation, with each node as an independent agent using local observations and a weighted objective. No equations, derivations, or first-principles claims are present that reduce any result to fitted parameters or self-citations by construction. Performance claims (DQN outperforming Random and tabular Q-Learning) rest on reported simulation outcomes rather than any self-definitional loop, fitted-input prediction, or load-bearing self-citation chain. The approach is self-contained as an application of standard DQN, with results not equivalent to inputs by definition.
Axiom & Free-Parameter Ledger
free parameters (1)
- reachability-vs-privacy weights (see the sweep sketch after this ledger)
axioms (1)
- Domain assumption: Independent DQN agents can learn effective policies from local observations alone in a mobile directional network.
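Since the ledger's single free parameter is the weight itself, the natural check is a sweep. The following toy, self-contained sketch assumes the objective form O = w·O_p + (1-w)·O_c; the per-policy numbers are illustrative, not results from the paper.

```python
# Toy sweep over the single free parameter w; values are illustrative.
def objective(o_p: float, o_c: float, w: float) -> float:
    return w * o_p + (1.0 - w) * o_c

# Hypothetical per-step scores: a privacy-leaning policy (high O_p, low O_c)
# versus a discovery-leaning one (low O_p, high O_c).
policies = {"privacy-leaning": (0.9, 0.4), "discovery-leaning": (0.5, 0.8)}
for w in (0.0, 0.25, 0.5, 0.75, 1.0):
    best = max(policies, key=lambda k: objective(*policies[k], w))
    print(f"w={w:.2f}: best policy = {best}")
```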
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel (unclear)
  Unclear: relation between the paper passage and the cited Recognition theorem.
  Paper passage: "Each node acts as an independent DQN agent... weighted mechanism... O(t) = w·O_p(t) + (1-w)·O_c(t)"
- IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · reality_from_one_distinction (unclear)
  Unclear: relation between the paper passage and the cited Recognition theorem.
  Paper passage: "DQN framework surpasses the Random and Q-Learning baselines"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.