A label-free self-supervised RL method derives rewards from instructions via constraint decomposition and binary classification, yielding improvements on in-domain and out-of-domain instruction-following tasks.
In Findings of the Association for Computational Linguistics: ACL 2025, pages 18632–18702
1 Pith paper cites this work.
Fields: cs.CL (1)
Years: 2025 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
Instructions are all you need: Self-supervised Reinforcement Learning for Instruction Following