Pencil: Long thoughts with short memory

Chenxiao Yang, Nathan Srebro, David McAllester, Zhiyuan Li · 2025 · arXiv 2503.14337

8 Pith papers cite this work. Polarity classification is still indexing.

8 Pith papers citing it

representative citing papers

cs.LG · 2026-06-08 · unverdicted · novelty 8.0

Depth-L transformers with W parameters have VC dimension Theta(L W log(T W)), yielding matching O(L W log((T+T')W)) upper and Omega(L W log((T+T')W/L)) lower bounds on sample complexity for chain-of-thought learning.

Rethinking the Role of Positional Encoding: Sliding-Window Transformers without PE Remain Turing Complete

cs.LG · 2026-06-01 · unverdicted · novelty 8.0

Sliding-window transformers without positional encodings are Turing complete because the sliding window breaks permutation symmetry and suffices to simulate Post machines via a constant-size histogram state.

Pseudo-Formalization for Automatic Proof Verification

cs.LO · 2026-05-19 · unverdicted · novelty 7.0 · 2 refs

Pseudo-Formalization decomposes proofs into self-contained natural language modules for independent LLM-based Block Verification, outperforming LLM-as-judge baselines on olympiad and research math benchmarks while releasing ArxivMathGradingBench.

Taming the Thinker: Conditional Entropy Shaping for Adaptive LLM Reasoning

cs.CL · 2026-05-19 · unverdicted · novelty 6.0

CES applies conditional bidirectional entropy control on top of DAPO to improve accuracy and shorten responses on mathematical benchmarks for 7B and 1.5B LLMs.

Stateful Reasoning via Insight Replay

cs.AI · 2026-05-14 · unverdicted · novelty 6.0 · 2 refs

InsightReplay improves long CoT reasoning by extracting critical insights from the trace and replaying them near the active frontier, delivering +1.65 average accuracy gain across 24 model-benchmark settings.

Null Space Constrained Contrastive Visual Forgetting for MLLM Unlearning

cs.AI · 2026-05-07 · unverdicted · novelty 6.0

A contrastive visual forgetting technique constrained to the null space of retained knowledge enables targeted unlearning of visual concepts in MLLMs while preserving non-target visual and all textual knowledge.

MEMENTO: Teaching LLMs to Manage Their Own Context

cs.AI · 2026-04-10 · unverdicted · novelty 6.0

MEMENTO trains LLMs to segment reasoning into blocks, generate mementos as dense summaries, and reason forward using only mementos and KV states, cutting peak KV cache by ~2.5x while preserving benchmark accuracy.

Modularized Reinforcement Learning on LLMs: From MDP Creation to Exploration and Learning

cs.LG · 2026-06-20 · unverdicted · novelty 5.0

Survey mapping RL techniques onto LLM training and highlighting gaps in value-based, off-policy, and bootstrapping methods.

citing papers explorer

Showing 8 of 8 citing papers after filters.

Tight Sample Complexity of Transformers cs.LG · 2026-06-08 · unverdicted · none · ref 53
Depth-L transformers with W parameters have VC dimension Theta(L W log(T W)), yielding matching O(L W log((T+T')W)) upper and Omega(L W log((T+T')W/L)) lower bounds on sample complexity for chain-of-thought learning.
Rethinking the Role of Positional Encoding: Sliding-Window Transformers without PE Remain Turing Complete cs.LG · 2026-06-01 · unverdicted · none · ref 45
Sliding-window transformers without positional encodings are Turing complete because the sliding window breaks permutation symmetry and suffices to simulate Post machines via a constant-size histogram state.
Pseudo-Formalization for Automatic Proof Verification cs.LO · 2026-05-19 · unverdicted · none · ref 33 · 2 links
Pseudo-Formalization decomposes proofs into self-contained natural language modules for independent LLM-based Block Verification, outperforming LLM-as-judge baselines on olympiad and research math benchmarks while releasing ArxivMathGradingBench.
Taming the Thinker: Conditional Entropy Shaping for Adaptive LLM Reasoning cs.CL · 2026-05-19 · unverdicted · none · ref 7
CES applies conditional bidirectional entropy control on top of DAPO to improve accuracy and shorten responses on mathematical benchmarks for 7B and 1.5B LLMs.
Stateful Reasoning via Insight Replay cs.AI · 2026-05-14 · unverdicted · none · ref 15 · 2 links
InsightReplay improves long CoT reasoning by extracting critical insights from the trace and replaying them near the active frontier, delivering +1.65 average accuracy gain across 24 model-benchmark settings.
Null Space Constrained Contrastive Visual Forgetting for MLLM Unlearning cs.AI · 2026-05-07 · unverdicted · none · ref 3
A contrastive visual forgetting technique constrained to the null space of retained knowledge enables targeted unlearning of visual concepts in MLLMs while preserving non-target visual and all textual knowledge.
MEMENTO: Teaching LLMs to Manage Their Own Context cs.AI · 2026-04-10 · unverdicted · none · ref 33
MEMENTO trains LLMs to segment reasoning into blocks, generate mementos as dense summaries, and reason forward using only mementos and KV states, cutting peak KV cache by ~2.5x while preserving benchmark accuracy.
Modularized Reinforcement Learning on LLMs: From MDP Creation to Exploration and Learning cs.LG · 2026-06-20 · unverdicted · none · ref 242
Survey mapping RL techniques onto LLM training and highlighting gaps in value-based, off-policy, and bootstrapping methods.

Pencil: Long thoughts with short memory

fields

years

verdicts

representative citing papers

citing papers explorer