NEOL decouples neuroevolution into outer architecture search and inner online weight adaptation, proving sublinear regret under mild conditions and showing empirical gains over pure NEAT on control benchmarks.
super hub Mixed citations
Title resolution pending
Mixed citation behavior. Most common role is background (45%).
abstract
OpenAI Gym is a toolkit for reinforcement learning research. It includes a growing collection of benchmark problems that expose a common interface, and a website where people can share their results and compare the performance of algorithms. This whitepaper discusses the components of OpenAI Gym and the design decisions that went into the software.
hub tools
citation-role summary
citation-polarity summary
claims ledger
- abstract OpenAI Gym is a toolkit for reinforcement learning research. It includes a growing collection of benchmark problems that expose a common interface, and a website where people can share their results and compare the performance of algorithms. This whitepaper discusses the components of OpenAI Gym and the design decisions that went into the software.
authors
co-cited works
representative citing papers
AIDev is a new open dataset of 456k AI-agent pull requests showing agents submit code faster than humans but with lower acceptance rates and simpler changes.
BEHAVIOR-1K introduces a benchmark of 1,000 human everyday activities in realistic simulated scenes together with the OMNIGIBSON physics simulator to evaluate embodied AI.
Decision Transformer casts RL as autoregressive sequence modeling conditioned on desired returns, past states and actions, matching or exceeding offline RL baselines on Atari, Gym and Key-to-Door tasks.
Placing trainable nonlinear functions on connections in analogue networks enables efficient representation of smooth continuous targets with hardware transfer at projected 30 microwatt power.
ENPIRE supplies four modules (Environment, Policy Improvement, Rollout, Evolution) that turn real-world robot training into an autonomous optimization loop driven by coding agents.
EFE-based planning is formulated as variational free energy minimization with epistemic priors, decomposing into expected plan costs plus a complexity term.
EFE-based active inference planning is characterized as VFE on an augmented model plus entropy and planning corrections, with a derived message-passing implementation and grid-world validation.
In navigation tasks, DQN learns MDP-homomorphism-invariant representations while PPO learns action-symmetric ones despite comparable performance, with effects on transfer and in LLMs.
FedQHD achieves closed-form federated Q-learning via hyperdimensional encoders with linear readouts, formalizes the federation gap under heterogeneous encoders, and reports competitive performance on continuous-state benchmarks with reduced computation.
Proximal State Nudging (PSN) jointly optimizes skill development and task performance in shared autonomy, outperforming baselines in LunarLander simulation and yielding up to 7x larger unassisted skill gains with 50% fewer collisions in human CARLA driving studies.
RAT reformulates regularized natural policy gradients as vanilla gradients with a transformed advantage, computed efficiently via randomized block Kaczmarz iterations on on-policy data.
A taxonomy of SNN training algorithms is presented with the release of NeuroTrain, an open benchmarking framework for reproducible comparisons across datasets and architectures.
DRATS derives a minimax objective from a feasibility formulation of MTRL to adaptively sample tasks with the largest return gaps, leading to better worst-task performance on MetaWorld benchmarks.
MSRL represents trajectory segments as PSD matrices to prove additive composition properties and bootstrap value functions for better transfer, reaching 0.73 AUC versus 0.57-0.65 baselines.
IGT-OMD reduces gradient transport error from quadratic to linear in delay length for delayed bilevel optimization and achieves sublinear regret with adaptive steps.
gym-invmgmt is a new benchmarking framework that evaluates inventory policies across optimization and learning methods, finding stochastic programming strongest among non-oracle approaches and PPO-Transformer best among learned ones in tested scenarios.
A new marginalized reparameterization estimator allows low-variance training of mixture policies in entropy-regularized actor-critic algorithms, matching or exceeding Gaussian policy performance in several continuous control benchmarks.
VPSD-RL discovers exact and approximate value-preserving Lie-group operators in continuous RL to stabilize learning via transition augmentation and consistency regularization.
FLUID is a continuous-time transformer using Liquid Attention Networks to model attention as stable ODE solutions that interpolate between discrete SDPA and CT-RNNs, with an explicit sink gate and liquid hyper-connections for better information flow.
NEO is a probabilistic neural model that induces compositional programs as a learned Language of Thought from non-textual observations and executes them via a shared transition model to enable explanation-driven generalization.
EO-Gym supplies an executable multimodal environment and 9k-trajectory benchmark that turns Earth Observation into a tool-using, multi-step reasoning task, revealing that current VLMs struggle on temporal and cross-sensor workflows while fine-tuning lifts Pass@3 from 0.49 to 0.74.
COSPLAY co-evolves an LLM decision agent with a skill bank agent to improve long-horizon game performance, reporting over 25.1% average reward gains versus frontier LLM baselines on single-player benchmarks.
A hierarchical active inference framework using successor representations learns abstract states and actions to enable efficient planning on navigation and reinforcement learning tasks.
citing papers explorer
-
ENPIRE: Agentic Robot Policy Self-Improvement in the Real World
ENPIRE supplies four modules (Environment, Policy Improvement, Rollout, Evolution) that turn real-world robot training into an autonomous optimization loop driven by coding agents.
-
Expected Free Energy-based Planning as Variational Inference
EFE-based planning is formulated as variational free energy minimization with epistemic priors, decomposing into expected plan costs plus a complexity term.
-
What Type of Inference is Active Inference?
EFE-based active inference planning is characterized as VFE on an augmented model plus entropy and planning corrections, with a derived message-passing implementation and grid-world validation.
-
EO-Gym: A Multimodal, Interactive Environment for Earth Observation Agents
EO-Gym supplies an executable multimodal environment and 9k-trajectory benchmark that turns Earth Observation into a tool-using, multi-step reasoning task, revealing that current VLMs struggle on temporal and cross-sensor workflows while fine-tuning lifts Pass@3 from 0.49 to 0.74.
-
Co-Evolving LLM Decision and Skill Bank Agents for Long-Horizon Tasks
COSPLAY co-evolves an LLM decision agent with a skill bank agent to improve long-horizon game performance, reporting over 25.1% average reward gains versus frontier LLM baselines on single-player benchmarks.
-
A Generalist Agent
Gato is a multi-modal, multi-task, multi-embodiment generalist policy using one transformer network to handle text, vision, games, and robotics tasks.
-
SVoT: State-aware Visualization-of-Thought for Spatial Reasoning via Reinforcement Learning
SVoT uses RL with GRPO to train MLLMs on interleaved textual and visual reasoning chains for multi-hop spatial tasks, achieving up to 65% accuracy gains on new domains with quantitative state verification.
-
BehaviorGuard: Online Backdoor Defense for Deep Reinforcement Learning
BehaviorGuard detects backdoor behaviors in DRL policies via behavioral drift in action distributions and suppresses suspicious actions at runtime, claimed as the first online defense for both single- and multi-agent settings.
-
Policy-Invisible Violations in LLM-Based Agents
LLM agents commit policy-invisible violations when policy facts are hidden from their context; a graph-simulation enforcer reaches 93% accuracy vs 68.8% for content-only baselines on a new 600-trace benchmark.
-
HiMAC: Hierarchical Macro-Micro Learning for Long-Horizon LLM Agents
HiMAC decomposes LLM agent tasks into macro planning and micro execution using critic-free hierarchical RL and iterative co-evolution, outperforming baselines on ALFWorld, WebShop, and Sokoban.
-
Synthetic POMDPs to Challenge Memory-Augmented RL: Memory Demand Structure Modeling
Presents MDS framework, linear-dynamics construction method, and tunable synthetic POMDP suite for controlled testing of memory-augmented reinforcement learning.
-
General Board Game Playing for Education and Research in Generic AI Game Learning
GBG framework standardizes board game AI interfaces and shows a generic TD(λ)-n-tuple agent outperforming MCTS on multiple games for education and research.
-
Certificate-Guided Evaluation of Reinforcement Learning Generalization
A logic-driven framework defines inductive reach-avoid tasks and uses neural certificates to certify RL generalization, with empirical results linking fewer violations to more solved test tasks.
-
From Pixels to Digital Agents: An Empirical Study on the Taxonomy and Technological Trends of Reinforcement Learning Environments
An empirical literature analysis reveals a bifurcation in RL environments into Semantic Prior (LLM-dominated) and Domain-Specific Generalization ecosystems with distinct cognitive fingerprints.
-
A Machine With Human-Like Memory Systems
An agent with both semantic and episodic memory outperforms single-memory agents in the custom 'Room' environment, with added gains from multi-agent or human-AI collaboration.
-
AI Safety Landscape for Large Language Models: Taxonomy, State-of-the-art, and Future Directions
The paper introduces a taxonomy of AI safety for LLMs organized into Trustworthy AI, Responsible AI, and Safe AI perspectives, accompanied by a review of state-of-the-art methods, challenges, and future directions.