PE-MAMoE combines sparsely gated mixture-of-experts actors with a non-parametric phase controller in MAPPO to maintain plasticity under dynamic user mobility and traffic, yielding 26.3% higher normalized IQM return in simulations.
Understanding plasticity in neural networks
4 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 4verdicts
UNVERDICTED 4representative citing papers
RPSFT improves the in-domain versus out-of-domain performance trade-off during LLM supervised fine-tuning by penalizing rotations in pretrained singular subspaces as a proxy for loss-sensitive directions.
Higher conservatism in offline DPO training of Qwen3-14B monotonically increases reward-hacking damage (Goodhart gap AUGC) during online adaptation on GSM8K.
The paper reframes agentic safety as an epistemic property defined by teachability—the capacity to preserve future corrective leverage—rather than a behavioral property of the current policy.
citing papers explorer
-
Plasticity-Enhanced Multi-Agent Mixture of Experts for Dynamic Objective Adaptation in UAVs-Assisted Emergency Communication Networks
PE-MAMoE combines sparsely gated mixture-of-experts actors with a non-parametric phase controller in MAPPO to maintain plasticity under dynamic user mobility and traffic, yielding 26.3% higher normalized IQM return in simulations.
-
Rotation-Preserving Supervised Fine-Tuning
RPSFT improves the in-domain versus out-of-domain performance trade-off during LLM supervised fine-tuning by penalizing rotations in pretrained singular subspaces as a proxy for loss-sensitive directions.
-
Pessimism's Paradox: Conservative Offline Training Amplifies Reward Hacking During Online Adaptation in Reasoning Models
Higher conservatism in offline DPO training of Qwen3-14B monotonically increases reward-hacking damage (Goodhart gap AUGC) during online adaptation on GSM8K.
-
Agentic Safety is an Epistemic Property, Not a Behavioral One
The paper reframes agentic safety as an epistemic property defined by teachability—the capacity to preserve future corrective leverage—rather than a behavioral property of the current policy.