ReCrit frames critic interaction as a correctness-transition problem and uses quadrant-based RL rewards to improve LLM performance on scientific reasoning benchmarks by rewarding corrections and robustness while penalizing sycophancy.
Eigen-1: Adaptive multi-agent refinement with monitor-based rag for scientific reasoning.arXiv preprint arXiv:2509.21193, 2025
4 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 4roles
background 1polarities
background 1representative citing papers
MementoGUI introduces a modular memory-control framework with working and episodic memory operators that improves long-horizon GUI agent performance over history-replay and text-only baselines.
AceGRPO trains 30B-parameter LLM agents to achieve 100% valid submissions and competitive performance on MLE-Bench-Lite through evolving data buffers and adaptive task sampling.
citing papers explorer
-
ReCrit: Transition-Aware Reinforcement Learning for Scientific Critic Reasoning
ReCrit frames critic interaction as a correctness-transition problem and uses quadrant-based RL rewards to improve LLM performance on scientific reasoning benchmarks by rewarding corrections and robustness while penalizing sycophancy.
-
MementoGUI: Learning Agentic Multimodal Memory Control for Long-Horizon GUI Agents
MementoGUI introduces a modular memory-control framework with working and episodic memory operators that improves long-horizon GUI agent performance over history-replay and text-only baselines.
-
AceGRPO: Adaptive Curriculum Enhanced Group Relative Policy Optimization for Autonomous Machine Learning Engineering
AceGRPO trains 30B-parameter LLM agents to achieve 100% valid submissions and competitive performance on MLE-Bench-Lite through evolving data buffers and adaptive task sampling.
- SciResearcher: Scaling Deep Research Agents for Frontier Scientific Reasoning