A closer look at invalid action masking in policy gradient algorithms

Shengyi Huang, Santiago Ontañón · 2020 · arXiv 2006.14171

5 Pith papers cite this work. Polarity classification is still being indexed.


citation-role summary

background: 1 · method: 1

citation-polarity summary

background: 1 · use method: 1

representative citing papers

TuniQ: Autotuning Compilation Passes for Quantum Workloads at Scale for Effectiveness and Efficiency

quant-ph · 2026-05-12 · unverdicted · novelty 7.0

TuniQ uses RL with a dual-encoder, shaped rewards, and action masking to autotune quantum compilation passes, improving fidelity and speed over Qiskit while generalizing across backends and scaling to large circuits.

Your Loss is My Gain: Low Stake Attacks on Liquid Staking Pools

cs.GT · 2026-05-01 · unverdicted · novelty 7.0

A low-stake adversary can degrade a liquid staking pool's performance via consensus manipulation and profit from the resulting drop in its LST value through application-layer financial positions.

AlphaTransit: Learning to Design City-scale Transit Routes

cs.AI · 2026-05-27 · unverdicted · novelty 6.0

AlphaTransit pairs MCTS with a learned policy-value network to reach 54.6% and 82.1% service rates on a Bloomington transit benchmark, outperforming plain RL and plain MCTS baselines.

TARMM: Scaling Delay-Critical Edge AI Offloading in 5G O-RAN via Temporal Graph Mobility Management

cs.NI · 2026-04-27 · unverdicted · novelty 5.0

TARMM uses a temporal graph to model RAN dynamics and MARL with action masking for proactive mobility management in 5G O-RAN, reducing tail latency by up to 44% and packet loss by up to 56% on a multi-cell testbed for VR workloads.

Learning Selective Merge Policies for Deadline-Constrained Coded Caching via Deep Reinforcement Learning

cs.IT · 2026-05-13
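Two of the citing papers above (TuniQ and TARMM) apply action masking, the technique this paper analyzes. A minimal sketch of invalid action masking for a discrete policy follows, assuming a boolean validity vector per step. The logits of invalid actions are replaced by a large negative constant (here -1e8, an illustrative choice) before the softmax, so those actions receive zero probability and, because they are replaced by a constant, zero gradient. The names `MASK_VALUE`, `masked_policy` and `masked_logit_grad` are hypothetical and not taken from the paper's code.

```python
import math

# Hypothetical constant: any sufficiently negative value drives the softmax
# weight of a masked action to (numerically) zero.
MASK_VALUE = -1e8


def masked_logits(logits, valid):
    """Replace the logits of invalid actions with MASK_VALUE."""
    if len(logits) != len(valid):
        raise ValueError("logits and valid must have the same length")
    if not any(valid):
        raise ValueError("at least one action must be valid")
    return [x if ok else MASK_VALUE for x, ok in zip(logits, valid)]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]  # exp of a huge negative underflows to 0.0
    total = sum(exps)
    return [e / total for e in exps]


def masked_policy(logits, valid):
    """Action probabilities with invalid actions masked out."""
    return softmax(masked_logits(logits, valid))


def masked_logit_grad(logits, valid, action):
    """Gradient of log pi(action) with respect to the original logits.

    Masked logits are overwritten by a constant, so their gradient is zero;
    for valid logits it is the usual one_hot(action) - pi, where pi is the
    masked policy.
    """
    probs = masked_policy(logits, valid)
    return [
        ((1.0 if i == action else 0.0) - p) if ok else 0.0
        for i, (p, ok) in enumerate(zip(probs, valid))
    ]


if __name__ == "__main__":
    logits = [1.0, 2.0, 3.0]
    valid = [True, False, True]  # action 1 is invalid in this state
    print(masked_policy(logits, valid))
    print(masked_logit_grad(logits, valid, action=2))
```

In this sketch the invalid action never gets sampled and contributes nothing to the policy-gradient update, so learning only redistributes probability among the valid actions.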

citing papers explorer


TuniQ: Autotuning Compilation Passes for Quantum Workloads at Scale for Effectiveness and Efficiency quant-ph · 2026-05-12 · unverdicted · none · ref 28
Your Loss is My Gain: Low Stake Attacks on Liquid Staking Pools cs.GT · 2026-05-01 · unverdicted · none · ref 45
AlphaTransit: Learning to Design City-scale Transit Routes cs.AI · 2026-05-27 · unverdicted · none · ref 26
TARMM: Scaling Delay-Critical Edge AI Offloading in 5G O-RAN via Temporal Graph Mobility Management cs.NI · 2026-04-27 · unverdicted · none · ref 17
Learning Selective Merge Policies for Deadline-Constrained Coded Caching via Deep Reinforcement Learning cs.IT · 2026-05-13 · unreviewed · ref 13
