U-Define improves user control in LLM planning by letting people define hard rules and soft preferences in natural language with matching verification methods, raising usefulness and satisfaction scores.
Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
verdicts
UNVERDICTED 2representative citing papers
A label-free self-supervised RL method derives rewards from instructions via constraint decomposition and binary classification, yielding improvements on in-domain and out-of-domain instruction-following tasks.
citing papers explorer
-
U-Define: Designing User Workflows for Hard and Soft Constraints in LLM-Based Planning
U-Define improves user control in LLM planning by letting people define hard rules and soft preferences in natural language with matching verification methods, raising usefulness and satisfaction scores.
-
Instructions are all you need: Self-supervised Reinforcement Learning for Instruction Following
A label-free self-supervised RL method derives rewards from instructions via constraint decomposition and binary classification, yielding improvements on in-domain and out-of-domain instruction-following tasks.