Advances in Neural Information Processing Systems

Reflexion: Language agents with verbal reinforcement learning

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Sketch-and-Verify: Structured Inference-Time Scaling via Program Sketching

cs.LG · 2026-05-09 · conditional · novelty 7.0

Sketch-and-Verify improves small-LLM code generation on HumanEval+ by factorizing search into K algorithmic sketches and M fillings each, outperforming flat sampling by up to 32 percentage points at matched budget while remaining cheaper than upgrading model tier.

Semantic Voting: Execution-Grounded Consensus for LLM Code Generation

cs.SE · 2026-05-09 · unverdicted · novelty 6.0

Execution-based selectors for LLM code candidates outperform textual voting by large margins across configurations, with input generation quality mattering more than the specific aggregation rule.

StraTA: Incentivizing Agentic Reinforcement Learning with Strategic Trajectory Abstraction

cs.CL · 2026-05-07 · unverdicted · novelty 5.0

StraTA improves LLM agent success rates to 93.1% on ALFWorld and 84.2% on WebShop by sampling a compact initial strategy and training it jointly with action execution via hierarchical GRPO-style rollouts.

citing papers explorer

Sketch-and-Verify: Structured Inference-Time Scaling via Program Sketching · cs.LG · 2026-05-09 · conditional · none · ref 17
Semantic Voting: Execution-Grounded Consensus for LLM Code Generation · cs.SE · 2026-05-09 · unverdicted · none · ref 23
StraTA: Incentivizing Agentic Reinforcement Learning with Strategic Trajectory Abstraction · cs.CL · 2026-05-07 · unverdicted · none · ref 6
