A rate-distortion based switching strategy for adaptive state-action abstractions in RL decomposes value error into Bellman residual and bisimulation metric terms to achieve near-optimal performance under lossy compression in tabular settings.
Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
years: 2026 (2)
verdicts: UNVERDICTED (2)
representative citing papers
RMCT matches the rate of target behaviors like bias-following across input perturbations to reduce sycophancy in LLMs while preserving verbalization of bias cues.
citing papers explorer
- Adaptive state-action abstractions via rate-distortion
  A rate-distortion based switching strategy for adaptive state-action abstractions in RL decomposes value error into Bellman residual and bisimulation metric terms to achieve near-optimal performance under lossy compression in tabular settings.
- Consistency Training while Mitigating Obfuscation via Rate Matching
  RMCT matches the rate of target behaviors like bias-following across input perturbations to reduce sycophancy in LLMs while preserving verbalization of bias cues.