Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
citing papers explorer
- Discovering Reinforcement Learning Interfaces with Large Language Models
LIMEN discovers effective RL interfaces by using LLMs to evolve observation and reward programs together from raw state, guided by policy training success, outperforming single-component optimization.
- Guiding Multi-Objective Genetic Programming with Description Length Improves Symbolic Regression Solutions
Post-selection with DL or FBF after multi-objective GP search improves test-set performance over AIC/BIC baselines on noisy synthetic and real regression tasks, while using DL directly as fitness often causes premature convergence to overly simple models.