SkillComposer performs task-conditioned skill sequence prediction with a constrained autoregressive decoder to jointly output skill subset, count, and order, raising pass rates by 23.1 and 18.2 percentage points on two production coding agents over no-skill baselines.
Toolchain*: Efficient action space navigation in large language models with A* search
7 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
roles: background 1
citation-polarity summary
polarities: background 1
representative citing papers
LLM-generated combinatorial solvers achieve highest correctness when the model formalizes problems for verified backends rather than attempting to optimize search, which often causes regressions.
LLM-native figures embed provenance and enable direct LLM interaction with scientific visualizations to accelerate discovery and improve reproducibility.
FitText embeds evolutionary retrieval of tool descriptions into the agent loop, yielding 2.7-10.6 point NDCG@5 gains on ToolRet and 26.7-point pass-rate gains on StableToolBench.
Math reasoning gains in LLMs rarely transfer to general domains; RL tuning generalizes while SFT causes forgetting and representation drift.
NaviAgent decouples task planning from tool execution via a Tool World Navigation Model graph to improve scalability and success rates in LLM agents handling large tool ecosystems.
A survey on LLM-as-a-Judge that reviews reliability strategies, proposes evaluation methods, and introduces a novel benchmark for assessing such systems.
citing papers explorer
- Formalize, Don't Optimize: The Heuristic Trap in LLM-Generated Combinatorial Solvers