LightThinker++ adds explicit adaptive memory management and a trajectory synthesis pipeline to LLM reasoning, cutting peak token use by ~70% while gaining accuracy in standard and long-horizon agent tasks.
A comparative study on reasoning patterns of openai’s o1 model
5 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 3polarities
background 3representative citing papers
Search-o1 integrates agentic retrieval-augmented generation and a Reason-in-Documents module into large reasoning models to dynamically supply missing knowledge and improve performance on complex science, math, coding, and QA tasks.
Rulemapping uses expert symbolic scaffolds to constrain LLMs, raising precision on §130(1) German hate-speech classification from 0.34-0.49 to 0.80-0.86 while preserving recall of 0.82-0.89.
The survey organizes the shift of LLMs toward deliberate System 2 reasoning, covering model construction techniques, performance on math and coding benchmarks, and future research directions.
STILL-2 uses imitation of distilled long-form thoughts, multi-rollout exploration on difficult problems, and iterative self-improvement of the dataset to train reasoning models that reach competitive performance on three challenging benchmarks.
citing papers explorer
-
Search-o1: Agentic Search-Enhanced Large Reasoning Models
Search-o1 integrates agentic retrieval-augmented generation and a Reason-in-Documents module into large reasoning models to dynamically supply missing knowledge and improve performance on complex science, math, coding, and QA tasks.
-
From System 1 to System 2: A Survey of Reasoning Large Language Models
The survey organizes the shift of LLMs toward deliberate System 2 reasoning, covering model construction techniques, performance on math and coding benchmarks, and future research directions.
-
Imitate, Explore, and Self-Improve: A Reproduction Report on Slow-thinking Reasoning Systems
STILL-2 uses imitation of distilled long-form thoughts, multi-rollout exploration on difficult problems, and iterative self-improvement of the dataset to train reasoning models that reach competitive performance on three challenging benchmarks.