Inference-time computations for LLM reasoning and planning: A benchmark and insights

LongLLMLingua: Accelerating, enhancing LLMs in long context scenarios via prompt compression · 2023 · arXiv 2502.12521

5 Pith papers cite this work.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Do LLMs Build World Models From Text? A Multilingual Diagnostic of Spatial Reasoning

cs.AI · 2026-05-27 · unverdicted · novelty 7.0

MentalMap benchmark identifies a universal L3 reasoning cliff in LLMs' text-based spatial reasoning that persists across languages, scales, and prompting, and is replicated in human evaluations.

CA-SQL: Complexity-Aware Inference Time Reasoning for Text-to-SQL via Exploration and Compute Budget Allocation

cs.CL · 2026-05-08 · unverdicted · novelty 7.0

CA-SQL achieves 51.72% execution accuracy on the challenging tier of the BIRD benchmark using GPT-4o-mini by scaling exploration breadth according to estimated task difficulty, evolutionary prompt seeding, and candidate voting.

Distilling LLM Reasoning into an Interpretable Policy Tree for Human-AI Collaboration

cs.AI · 2026-06-07 · unverdicted · novelty 6.0

Co-pi-tree distills LLM reasoning into a dual policy tree refined via interaction feedback, reporting 35.4% higher rewards, 77.7% fewer LLM queries, and 97.1% lower latency than baselines in Overcooked-AI.

Mitigating False Positives in Static Memory Safety Analysis of Rust Programs via Reinforcement Learning

cs.SE · 2026-05-05 · unverdicted · novelty 6.0 · 2 refs

Reinforcement learning on MIR features combined with cargo-fuzz validation reduces false positives in Rust static memory safety analysis, raising precision from 25.6% to 59.0% and accuracy to 65.2%.

The Cognitive Penalty: Ablating System 1 and System 2 Reasoning in Edge-Native SLMs for Decentralized Consensus

cs.AI · 2026-04-18 · unverdicted · novelty 5.0

System 1 intuition in edge SLMs delivers 100% adversarial robustness and low latency for DAO consensus while System 2 reasoning causes 26.7% cognitive collapse and 17x slowdown.

citing papers explorer

Do LLMs Build World Models From Text? A Multilingual Diagnostic of Spatial Reasoning cs.AI · 2026-05-27 · unverdicted · none · ref 36
CA-SQL: Complexity-Aware Inference Time Reasoning for Text-to-SQL via Exploration and Compute Budget Allocation cs.CL · 2026-05-08 · unverdicted · none · ref 38
Distilling LLM Reasoning into an Interpretable Policy Tree for Human-AI Collaboration cs.AI · 2026-06-07 · unverdicted · none · ref 1
Mitigating False Positives in Static Memory Safety Analysis of Rust Programs via Reinforcement Learning cs.SE · 2026-05-05 · unverdicted · none · ref 42 · 2 links
The Cognitive Penalty: Ablating System 1 and System 2 Reasoning in Edge-Native SLMs for Decentralized Consensus cs.AI · 2026-04-18 · unverdicted · none · ref 24