Tree of thoughts: Deliberate problem solving with large language models
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary: background 1
citation-polarity summary: background 1
verdicts: UNVERDICTED 3
representative citing papers
STILL-2 uses imitation of distilled long-form thoughts, multi-rollout exploration on difficult problems, and iterative self-improvement of the dataset to train reasoning models that reach competitive performance on three challenging benchmarks.
The paper provides the first comprehensive survey of multimodal chain-of-thought reasoning, including foundational concepts, a taxonomy of methodologies, application analyses, challenges, and future directions.
citing papers explorer
- Same Signal, Opposite Meaning: Direction-Informed Adaptive Learning for LLM Agents
  Common gating signals for adaptive LLM compute have unstable directions across settings, and DIAL learns per-setting utility directions from signal-agnostic counterfactuals to outperform fixed-direction baselines.
- Imitate, Explore, and Self-Improve: A Reproduction Report on Slow-thinking Reasoning Systems
  STILL-2 uses imitation of distilled long-form thoughts, multi-rollout exploration on difficult problems, and iterative self-improvement of the dataset to train reasoning models that reach competitive performance on three challenging benchmarks.
- Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
  The paper provides the first comprehensive survey of multimodal chain-of-thought reasoning, including foundational concepts, a taxonomy of methodologies, application analyses, challenges, and future directions.