and Wong, Catherine and Feng, Jiahai and Wei, Megan and Tenenbaum, Joshua B
3 Pith papers cite this work.
Citing papers:
- Consistency Training while Mitigating Obfuscation via Rate Matching
  RMCT matches the rate of target behaviors, such as bias-following, across input perturbations to reduce sycophancy in LLMs while preserving verbalization of bias cues.
- RoboWits: Unexpected Challenges for Robotic Creative Problem Solving
  The RoboWits benchmark of 238 tasks shows that pre-trained vision-language-action (VLA) models succeed on seed tasks but fail on their mutated variants, highlighting brittle reasoning.
- Verbalized Algorithms: Classical Algorithms are All You Need (Mostly)