arxiv: 2605.29076 · v1 · pith:GE64MTT3new · submitted 2026-05-27 · 💻 cs.CL · cs.AI · cs.LG

Structured Prompt Optimization Meets Reinforcement Learning for Global and Local Interpretability over Complex Text

classification 💻 cs.CL cs.AI cs.LG
keywords reasoning, text, learning, optimization, prompt, classification, compact, complex

LLMs have advanced text classification, yet existing paradigms face a trade-off: supervised (label-only) fine-tuning is scalable but offers limited reasoning on complex text and lacks broader model transparency, while discrete prompt optimization yields human-readable instructions but struggles with performance and scalability. We introduce eXTC (eXplainable Text Classifier) with three progressive stages: (1) learning a Standard Operating Procedure (SOP, or rulebook) in natural language via a new Structured Prompt Optimization algorithm; (2) SOP-grounded reasoning distillation from a large teacher LLM into a compact LM; and (3) expanding reasoning capabilities beyond the initial SOP via reinforcement learning. This design enables eXTC to provide (i) fast inference via a compact LM, with (ii) inference-time local reasoning traces, alongside a global, modular explanation of its learned domain rules, while (iii) significantly outperforming existing paradigms across diverse benchmarks in both classification performance and explanation quality, with stage-by-stage gains.
