Eureka: Human-Level Reward Design via Coding Large Language Models
20 Pith papers cite this work. Polarity classification is still indexing.
abstract
Large Language Models (LLMs) have excelled as high-level semantic planners for sequential decision-making tasks. However, harnessing them to learn complex low-level manipulation tasks, such as dexterous pen spinning, remains an open problem. We bridge this fundamental gap and present Eureka, a human-level reward design algorithm powered by LLMs. Eureka exploits the remarkable zero-shot generation, code-writing, and in-context improvement capabilities of state-of-the-art LLMs, such as GPT-4, to perform evolutionary optimization over reward code. The resulting rewards can then be used to acquire complex skills via reinforcement learning. Without any task-specific prompting or pre-defined reward templates, Eureka generates reward functions that outperform expert human-engineered rewards. In a diverse suite of 29 open-source RL environments that include 10 distinct robot morphologies, Eureka outperforms human experts on 83% of the tasks, leading to an average normalized improvement of 52%. The generality of Eureka also enables a new gradient-free in-context learning approach to reinforcement learning from human feedback (RLHF), readily incorporating human inputs to improve the quality and the safety of the generated rewards without model updating. Finally, using Eureka rewards in a curriculum learning setting, we demonstrate, for the first time, a simulated Shadow Hand capable of performing pen spinning tricks, adeptly manipulating a pen in circles at rapid speed.
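The abstract describes a loop worth seeing concretely: sample reward-function code from an LLM, train an RL policy under each candidate, then reflect the best candidate and its score back into the prompt for the next generation. Below is a minimal sketch of that loop, not the paper's released code; `propose_reward_code` and `train_policy` are hypothetical stubs standing in for the LLM call and the RL training run.

```python
# Minimal sketch of a Eureka-style evolutionary reward-design loop.
# The LLM and RL trainer are stubs; only the outer search is illustrated.
import random


def propose_reward_code(prompt: str, n_samples: int) -> list[str]:
    """Stub: sample n candidate reward functions (as code strings) from an LLM."""
    return [f"def reward(state, action):\n    return 0.0  # candidate {i}"
            for i in range(n_samples)]


def train_policy(reward_code: str) -> float:
    """Stub: train an RL policy under this reward and return a task fitness score."""
    return random.random()


def eureka(task_description: str, iterations: int = 5, n_samples: int = 16) -> str:
    prompt = f"Write a dense reward function for: {task_description}"
    best_code, best_fitness = "", float("-inf")
    for _ in range(iterations):
        candidates = propose_reward_code(prompt, n_samples)
        scored = [(train_policy(code), code) for code in candidates]
        fitness, code = max(scored)  # select the fittest candidate this round
        if fitness > best_fitness:
            best_fitness, best_code = fitness, code
        # Reflection: feed the best candidate and its measured fitness back into
        # the prompt so the next generation can improve on it in-context.
        prompt = (f"Task: {task_description}\n"
                  f"Previous best reward (fitness {fitness:.3f}):\n{code}\n"
                  f"Improve it.")
    return best_code


print(eureka("spin a pen with a simulated Shadow Hand"))
```

The same reflect-and-resample mechanism is what enables the gradient-free RLHF variant mentioned above: a human critique can simply be appended to the prompt in place of (or alongside) the automated fitness feedback.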
hub tools
citation-role summary
citation-polarity summary
roles
background 1
polarities
still indexing
representative citing papers
The AI Scientist framework enables LLMs to independently conduct the full scientific process from idea generation to paper writing and review, demonstrated across three ML subfields with papers costing under $15 each.
LLM-generated combinatorial solvers achieve highest correctness when the model formalizes problems for verified backends rather than attempting to optimize search, which often causes regressions.
Agentick is a new benchmark for sequential decision-making agents that evaluates RL, LLM, VLM, hybrid, and human approaches across 37 tasks and finds no single method dominates.
CoRAL lets LLMs act as adaptive cost designers for motion planners while using VLM priors and online identification to handle unknown physics, achieving over 50% higher success rates than baselines in unseen contact-rich robotic scenarios.
Iterative search over reward functions with ranked feedback in GRPO training improves LLM math reasoning, achieving an F1 of 0.795 on GSM8K versus 0.609 for the baseline.
A2DEPT generates complete algorithms for COPs using LLM-driven evolutionary program trees with hybrid selection and repair, reducing mean normalized optimality gap by 9.8% versus strongest AHD baselines on standard benchmarks.
AutoSOTA uses eight specialized agents to replicate and optimize models from recent AI papers, producing 105 new SOTA results in about five hours per paper on average.
VoxPoser uses LLMs to compose 3D value maps via VLM interaction for model-based synthesis of robust robot trajectories on open-set language-specified manipulation tasks.
EvoNav automates the design of reward functions for RL robot navigation by evolving LLM proposals through a three-stage cheap-to-expensive evaluation process (see the sketch after this list) and claims better policies than hand-crafted or prior automated rewards.
SAGE trains agents in physics-grounded semantic abstractions via RL with asymmetric clipping, achieving 53.21% LLM-Match Success on A-EQA (+9.7% over the baseline) and showing encouraging transfer to physical robots.
LLM-generated coordination graph priors improve multi-agent reinforcement learning performance on MPE benchmarks, with models as small as 1.5B parameters proving effective.
EgoTSR applies a three-stage curriculum on a 46-million-sample dataset to build egocentric spatiotemporal reasoning, reaching 92.4% accuracy on long-horizon tasks and reducing chronological biases.
A text-to-simulation pipeline using LLMs and VLMs generates synthetic pHRI data to train vision-based imitation learning policies that achieve over 80% success in zero-shot sim-to-real transfer on real assistive tasks.
Test-time steering of pre-trained whole-body policies via sample-based planning lets legged robots generalize dynamic loco-manipulation to varied heavy objects and tasks without additional training or tuning.
RoboPlayground reframes robotic manipulation evaluation as a language-driven process over structured physical domains, letting users author varied yet reproducible tasks that reveal policy generalization failures.
DreamTIP adds LLM-identified task-invariant properties as auxiliary targets in Dreamer's world model plus a mixed-replay adaptation step, delivering 28.1% average simulated transfer gains and 100% real-world climb success versus 10% for baselines.
CVEvolve uses LLM agents with lineage-aware search to autonomously discover algorithms that outperform baselines on scientific image tasks including registration, peak detection, and segmentation.
AgenticRecTune deploys five LLM agents (Actor, Critic, Insight, Skill, Online) and a self-evolving Skillhub to handle end-to-end configuration optimization for multi-stage recommendation systems.
The paper surveys agent skills for LLM agents, organizing the literature into a four-stage lifecycle of representation, acquisition, retrieval, and evolution while highlighting their role in system scalability.
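Several of the citing papers above reuse Eureka's pattern of searching over LLM-proposed reward code; the EvoNav summary adds a three-stage cheap-to-expensive filter so that only promising candidates reach full RL training. Here is a minimal sketch of that staged-selection idea under stated assumptions: `static_check`, `short_rollout_score`, and `full_training_score` are hypothetical stand-ins for the three stages, not EvoNav's actual API.

```python
# Minimal sketch of a cheap-to-expensive candidate filter: each stage costs
# more than the last, so the pool shrinks before the expensive evaluation.
import random


def static_check(code: str) -> bool:
    """Stage 1 (cheapest): does the candidate reward code even compile?"""
    try:
        compile(code, "<candidate>", "exec")
        return True
    except SyntaxError:
        return False


def short_rollout_score(code: str) -> float:
    """Stage 2: score the reward on a few short simulated rollouts (stubbed)."""
    return random.random()


def full_training_score(code: str) -> float:
    """Stage 3 (most expensive): a full RL training run (stubbed)."""
    return random.random()


def three_stage_select(candidates: list[str], keep: int = 4) -> str:
    survivors = [c for c in candidates if static_check(c)]
    survivors = sorted(survivors, key=short_rollout_score, reverse=True)[:keep]
    return max(survivors, key=full_training_score)


pool = [f"def reward(s, a):\n    return {i} * 0.1" for i in range(16)]
print(three_stage_select(pool))
```

The design choice being illustrated: because full RL training dominates the cost, spending a trivial amount on compilation checks and short rollouts first lets the search consider far more LLM proposals per unit of compute.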
citing papers explorer
Do LLM-derived graph priors improve multi-agent coordination?
LLM-generated coordination graph priors improve multi-agent reinforcement learning performance on MPE benchmarks, with models as small as 1.5B parameters proving effective.