arxiv: 2512.02240 · v2 · pith:GTM6CRTZnew · submitted 2025-12-01 · 💻 cs.CL

Lightweight Latent Reasoning for Narrative Tasks

classification 💻 cs.CL
keywords reasoning, latent, litereason, model, tasks, comes, generation, lightweight

Large language models (LLMs) tackle complex tasks by generating long chains of thought or "reasoning traces" that act as latent variables in the generation of an output given a query. A model's ability to generate such traces can be optimized with reinforcement learning (RL) to improve their utility in predicting an answer. This optimization comes at a high computational cost, especially for narrative-related tasks that involve retrieving and processing many tokens. To this end, we propose LiteReason, a latent reasoning method that can be interleaved with standard token sampling and easily combined with RL techniques. LiteReason employs a lightweight Reasoning Projector module, trained to produce continuous latent tokens that help the model 'skip' reasoning steps. During RL, the policy model decides when to activate the projector, switching between latent and discrete reasoning as needed. Experimental results on plot hole detection and book chapter generation show that our method outperforms latent reasoning baselines and comes close to matching non-latent RL training, while reducing final reasoning length by 77-92%. Overall, LiteReason guides RL training to a more efficient part of the performance-computation tradeoff curve.
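
As a rough illustration of the interleaving described in the abstract, the toy Python sketch below alternates between discrete token steps and latent projector steps. It is not the authors' implementation: the `<latent>` trigger token, `toy_projector`, `toy_embed`, the scripted policy, and the state-update rule are all hypothetical stand-ins, chosen only so that the control flow runs without a real model.

```python
from typing import Callable, List, Optional, Tuple

Vector = List[float]

LATENT_TRIGGER = "<latent>"  # hypothetical control token (assumption, not from the paper)
END = "<eos>"                # hypothetical end-of-reasoning token (assumption)


def toy_embed(token: str, dim: int) -> Vector:
    """Deterministic toy embedding of a discrete token (illustrative only)."""
    code = sum(ord(ch) for ch in token)
    return [((code * (i + 1)) % 7) / 7.0 for i in range(dim)]


def toy_projector(hidden: Vector) -> Vector:
    """Stand-in for a lightweight projector: hidden state -> continuous latent token."""
    return [0.5 * h + 0.1 for h in hidden]


def interleaved_decode(
    policy: Callable[[Vector, int], str],
    projector: Callable[[Vector], Vector],
    hidden: Vector,
    max_steps: int = 16,
) -> Tuple[List[Tuple[str, Optional[str]]], Vector]:
    """Run a reasoning loop in which the policy picks discrete or latent steps."""
    trace: List[Tuple[str, Optional[str]]] = []
    for step in range(max_steps):
        token = policy(hidden, step)
        if token == END:
            break
        if token == LATENT_TRIGGER:
            # Latent step: no text is emitted; the projector output becomes the state.
            hidden = projector(hidden)
            trace.append(("latent", None))
        else:
            # Discrete step: emit the token and fold its toy embedding into the state.
            emb = toy_embed(token, len(hidden))
            hidden = [0.5 * (h + e) for h, e in zip(hidden, emb)]
            trace.append(("discrete", token))
    return trace, hidden


def make_scripted_policy(script: List[str]) -> Callable[[Vector, int], str]:
    """Policy that replays a fixed script; a trained policy would condition on `hidden`."""
    def policy(hidden: Vector, step: int) -> str:
        return script[step] if step < len(script) else END
    return policy
```

Calling `interleaved_decode(make_scripted_policy(["The", LATENT_TRIGGER, "plot", LATENT_TRIGGER, END]), toy_projector, [0.0] * 4)` yields a trace with two discrete and two latent steps. In this toy setup only the discrete steps would count toward visible reasoning length, which is the quantity the abstract reports as reduced.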

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Bridging the Gap Between Latent and Explicit Reasoning with Looped Transformers

    cs.LG 2026-06 unverdicted novelty 6.0

    LOTUS uses a looped padded Transformer with parallel cross-entropy supervision on gold CoT tokens to match explicit CoT performance at 3B parameters while reducing thought-phase latency 2.5x-6.9x.

  2. TARPO: Token-Wise Latent-Explicit Reasoning via Action-Routing Policy Optimization

    cs.CL 2026-06 unverdicted novelty 5.0

    TARPO is a pure RL framework using a token-wise action router to switch between discrete token generation and latent reasoning in LLMs, with joint optimization showing outperformance on benchmarks.