Title resolution pending

Greg Brockman, Jie Tang, John Schulman, Jonas Schneider, Ludwig Pettersson, Vicki Cheung · 2016 · cs.LG · arXiv 1606.01540

Mixed citation behavior. Most common role is background (45%).

149 Pith papers citing it

Background 45% of classified citations

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

abstract

OpenAI Gym is a toolkit for reinforcement learning research. It includes a growing collection of benchmark problems that expose a common interface, and a website where people can share their results and compare the performance of algorithms. This whitepaper discusses the components of OpenAI Gym and the design decisions that went into the software.
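
The "common interface" mentioned in the abstract can be illustrated with a toy example. The sketch below assumes the classic reset/step convention, in which `step` returns an observation, a reward, a done flag, and an info dictionary. The environment `CoinFlipEnv` and the helper `run_episode` are hypothetical names invented for this illustration and are not part of OpenAI Gym. Later Gym and Gymnasium releases changed these return signatures, so this shows the idea rather than the current API.

```python
import random


class CoinFlipEnv:
    """Hypothetical toy environment (not part of OpenAI Gym).

    The observation is a hidden coin (0 or 1); the agent earns reward 1.0
    whenever its action matches the coin it was just shown.
    """

    def __init__(self, horizon=5, seed=0):
        self.horizon = horizon
        self.rng = random.Random(seed)
        self.t = 0
        self.coin = 0

    def reset(self):
        """Start a new episode and return the initial observation."""
        self.t = 0
        self.coin = self.rng.randint(0, 1)
        return self.coin

    def step(self, action):
        """Apply an action; return (observation, reward, done, info)."""
        reward = 1.0 if action == self.coin else 0.0
        self.t += 1
        done = self.t >= self.horizon
        self.coin = self.rng.randint(0, 1)  # next observation
        return self.coin, reward, done, {}


def run_episode(env, policy):
    """Run one episode with any object exposing reset() and step()."""
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        obs, reward, done, info = env.step(policy(obs))
        total += reward
    return total


# A policy that echoes the observation is rewarded on every step.
print(run_episode(CoinFlipEnv(horizon=5), lambda obs: obs))  # 5.0
```

Because `run_episode` depends only on `reset` and `step`, the same loop would work unchanged with any other environment that follows this convention, which is the point of a shared interface for benchmark problems.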

citation-role summary

background 10 · dataset 6 · method 2 · baseline 1 · other 1

citation-polarity summary

background 9 · use dataset 5 · unclear 3 · use method 2 · baseline 1
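
The page does not state how the headline 45% figure is derived. One reading consistent with the counts, offered here as an assumption, is that it is the background share of the 20 classified citations in the polarity summary (9 of 20). The role summary would instead give 10 of 20, or 50%. The label boundaries in the dictionaries below (for example "use dataset" as a single label) are also an interpretation of the flattened text. A minimal sketch of the arithmetic:

```python
# Counts copied from the two summaries on this page.
role_counts = {"background": 10, "dataset": 6, "method": 2, "baseline": 1, "other": 1}
polarity_counts = {
    "background": 9,
    "use dataset": 5,
    "unclear": 3,
    "use method": 2,
    "baseline": 1,
}


def shares(counts):
    """Return each label's percentage of the total, rounded to one decimal."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


role_shares = shares(role_counts)
polarity_shares = shares(polarity_counts)

print(role_shares["background"])      # 50.0
print(polarity_shares["background"])  # 45.0
```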

representative citing papers

Provably Sub-Linear Two-Timescale NeuroEvolution with Online Plasticity

cs.NE · 2026-06-18 · unverdicted · novelty 8.0

NEOL decouples neuroevolution into outer architecture search and inner online weight adaptation, proving sublinear regret under mild conditions and showing empirical gains over pure NEAT on control benchmarks.

The Rise of AI Teammates in Software Engineering (SE) 3.0: How Autonomous Coding Agents Are Reshaping Software Engineering

cs.SE · 2025-07-20 · conditional · novelty 8.0

AIDev is a new open dataset of 456k AI-agent pull requests showing agents submit code faster than humans but with lower acceptance rates and simpler changes.

BEHAVIOR-1K: A Human-Centered, Embodied AI Benchmark with 1,000 Everyday Activities and Realistic Simulation

cs.RO · 2024-03-14 · accept · novelty 8.0

BEHAVIOR-1K introduces a benchmark of 1,000 everyday human activities in realistic simulated scenes, together with the OmniGibson physics simulator, to evaluate embodied AI.

Decision Transformer: Reinforcement Learning via Sequence Modeling

cs.LG · 2021-06-02 · accept · novelty 8.0

Decision Transformer casts RL as autoregressive sequence modeling conditioned on desired returns, past states and actions, matching or exceeding offline RL baselines on Atari, Gym and Key-to-Door tasks.

Low-power analogue neural networks with trainable nonlinear connections for continuous control

cs.LG · 2026-06-21 · unverdicted · novelty 7.0

Placing trainable nonlinear functions on connections in analogue networks enables efficient representation of smooth continuous targets with hardware transfer at projected 30 microwatt power.

ENPIRE: Agentic Robot Policy Self-Improvement in the Real World

cs.AI · 2026-06-18 · unverdicted · novelty 7.0

ENPIRE supplies four modules (Environment, Policy Improvement, Rollout, Evolution) that turn real-world robot training into an autonomous optimization loop driven by coding agents.

Expected Free Energy-based Planning as Variational Inference

cs.AI · 2026-06-09 · unverdicted · novelty 7.0

EFE-based planning is formulated as variational free energy minimization with epistemic priors, decomposing into expected plan costs plus a complexity term.

What Type of Inference is Active Inference?

cs.AI · 2026-06-03 · unverdicted · novelty 7.0

EFE-based active inference planning is characterized as VFE on an augmented model plus entropy and planning corrections, with a derived message-passing implementation and grid-world validation.

Task-Induced Representational Invariances Depend on Learning Objective in Deep RL

cs.LG · 2026-06-01 · unverdicted · novelty 7.0

In navigation tasks, DQN learns MDP-homomorphism-invariant representations while PPO learns action-symmetric ones despite comparable performance, with effects on transfer and in LLMs.

FedQHD: Closed-Form Function-Space Federated Reinforcement Learning

cs.LG · 2026-05-27 · unverdicted · novelty 7.0

FedQHD achieves closed-form federated Q-learning via hyperdimensional encoders with linear readouts, formalizes the federation gap under heterogeneous encoders, and reports competitive performance on continuous-state benchmarks with reduced computation.

Proximal State Nudging: Reducing Skill Atrophy from AI Assistance

cs.RO · 2026-05-19 · unverdicted · novelty 7.0

Proximal State Nudging (PSN) jointly optimizes skill development and task performance in shared autonomy, outperforming baselines in LunarLander simulation and yielding up to 7x larger unassisted skill gains with 50% fewer collisions in human CARLA driving studies.

Randomized Advantage Transformation (RAT): Computing Natural Policy Gradients via Direct Backpropagation

cs.LG · 2026-05-18 · unverdicted · novelty 7.0

RAT reformulates regularized natural policy gradients as vanilla gradients with a transformed advantage, computed efficiently via randomized block Kaczmarz iterations on on-policy data.

NeuroTrain: Surveying Local Learning Rules for Spiking Neural Networks with an Open Benchmarking Framework

cs.NE · 2026-05-14 · unverdicted · novelty 7.0

A taxonomy of SNN training algorithms is presented with the release of NeuroTrain, an open benchmarking framework for reproducible comparisons across datasets and architectures.

Distributionally Robust Multi-Task Reinforcement Learning via Adaptive Task Sampling

cs.LG · 2026-05-14 · unverdicted · novelty 7.0

DRATS derives a minimax objective from a feasibility formulation of MTRL to adaptively sample tasks with the largest return gaps, leading to better worst-task performance on MetaWorld benchmarks.

Matrix-Space Reinforcement Learning for Reusing Local Transition Geometry

cs.LG · 2026-05-14 · unverdicted · novelty 7.0

MSRL represents trajectory segments as PSD matrices to prove additive composition properties and bootstrap value functions for better transfer, reaching 0.73 AUC versus 0.57-0.65 baselines.

IGT-OMD: Implicit Gradient Transport for Decision-Focused Learning under Delayed Feedback

cs.LG · 2026-05-12 · unverdicted · novelty 7.0

IGT-OMD reduces gradient transport error from quadratic to linear in delay length for delayed bilevel optimization and achieves sublinear regret with adaptive steps.

gym-invmgmt: An Open Benchmarking Framework for Inventory Management Methods

cs.LG · 2026-05-12 · unverdicted · novelty 7.0

gym-invmgmt is a new benchmarking framework that evaluates inventory policies across optimization and learning methods, finding stochastic programming strongest among non-oracle approaches and PPO-Transformer best among learned ones in tested scenarios.

Revisiting Mixture Policies in Entropy-Regularized Actor-Critic

cs.LG · 2026-05-09 · unverdicted · novelty 7.0

A new marginalized reparameterization estimator allows low-variance training of mixture policies in entropy-regularized actor-critic algorithms, matching or exceeding Gaussian policy performance in several continuous control benchmarks.

Operator-Guided Invariance Learning for Continuous Reinforcement Learning

cs.LG · 2026-05-07 · unverdicted · novelty 7.0

VPSD-RL discovers exact and approximate value-preserving Lie-group operators in continuous RL to stabilize learning via transition augmentation and consistency regularization.

FLUID: Continuous-Time Hyperconnected Sparse Transformer for Sink-Free Learning

cs.LG · 2026-05-06 · unverdicted · novelty 7.0

FLUID is a continuous-time transformer using Liquid Attention Networks to model attention as stable ODE solutions that interpolate between discrete SDPA and CT-RNNs, with an explicit sink gate and liquid hyper-connections for better information flow.

Learning to Theorize the World from Observation

cs.LG · 2026-05-05 · unverdicted · novelty 7.0

NEO is a probabilistic neural model that induces compositional programs as a learned Language of Thought from non-textual observations and executes them via a shared transition model to enable explanation-driven generalization.

EO-Gym: A Multimodal, Interactive Environment for Earth Observation Agents

cs.AI · 2026-05-02 · unverdicted · novelty 7.0

EO-Gym supplies an executable multimodal environment and 9k-trajectory benchmark that turns Earth Observation into a tool-using, multi-step reasoning task, revealing that current VLMs struggle on temporal and cross-sensor workflows while fine-tuning lifts Pass@3 from 0.49 to 0.74.

Co-Evolving LLM Decision and Skill Bank Agents for Long-Horizon Tasks

cs.AI · 2026-04-22 · unverdicted · novelty 7.0

COSPLAY co-evolves an LLM decision agent with a skill bank agent to improve long-horizon game performance, reporting over 25.1% average reward gains versus frontier LLM baselines on single-player benchmarks.

Hierarchical Active Inference using Successor Representations

cs.LG · 2026-04-17 · unverdicted · novelty 7.0

A hierarchical active inference framework using successor representations learns abstract states and actions to enable efficient planning on navigation and reinforcement learning tasks.

citing papers explorer

Showing 16 of 16 citing papers after filters.

ENPIRE: Agentic Robot Policy Self-Improvement in the Real World

cs.AI · 2026-06-18 · unverdicted · none · ref 7 · internal anchor

ENPIRE supplies four modules (Environment, Policy Improvement, Rollout, Evolution) that turn real-world robot training into an autonomous optimization loop driven by coding agents.

Expected Free Energy-based Planning as Variational Inference

cs.AI · 2026-06-09 · unverdicted · none · ref 42 · internal anchor

EFE-based planning is formulated as variational free energy minimization with epistemic priors, decomposing into expected plan costs plus a complexity term.

What Type of Inference is Active Inference?

cs.AI · 2026-06-03 · unverdicted · none · ref 46 · internal anchor

EFE-based active inference planning is characterized as VFE on an augmented model plus entropy and planning corrections, with a derived message-passing implementation and grid-world validation.

EO-Gym: A Multimodal, Interactive Environment for Earth Observation Agents

cs.AI · 2026-05-02 · unverdicted · none · ref 20 · internal anchor

EO-Gym supplies an executable multimodal environment and 9k-trajectory benchmark that turns Earth Observation into a tool-using, multi-step reasoning task, revealing that current VLMs struggle on temporal and cross-sensor workflows while fine-tuning lifts Pass@3 from 0.49 to 0.74.

Co-Evolving LLM Decision and Skill Bank Agents for Long-Horizon Tasks

cs.AI · 2026-04-22 · unverdicted · none · ref 2 · internal anchor

COSPLAY co-evolves an LLM decision agent with a skill bank agent to improve long-horizon game performance, reporting over 25.1% average reward gains versus frontier LLM baselines on single-player benchmarks.

A Generalist Agent

cs.AI · 2022-05-12 · accept · none · ref 12 · internal anchor

Gato is a multi-modal, multi-task, multi-embodiment generalist policy using one transformer network to handle text, vision, games, and robotics tasks.

SVoT: State-aware Visualization-of-Thought for Spatial Reasoning via Reinforcement Learning

cs.AI · 2026-06-10 · unverdicted · none · ref 15 · internal anchor

SVoT uses RL with GRPO to train MLLMs on interleaved textual and visual reasoning chains for multi-hop spatial tasks, achieving up to 65% accuracy gains on new domains with quantitative state verification.

BehaviorGuard: Online Backdoor Defense for Deep Reinforcement Learning

cs.AI · 2026-05-07 · unverdicted · none · ref 6 · internal anchor

BehaviorGuard detects backdoor behaviors in DRL policies via behavioral drift in action distributions and suppresses suspicious actions at runtime, claimed as the first online defense for both single- and multi-agent settings.

Policy-Invisible Violations in LLM-Based Agents

cs.AI · 2026-04-14 · unverdicted · none · ref 4 · internal anchor

LLM agents commit policy-invisible violations when policy facts are hidden from their context; a graph-simulation enforcer reaches 93% accuracy vs 68.8% for content-only baselines on a new 600-trace benchmark.

HiMAC: Hierarchical Macro-Micro Learning for Long-Horizon LLM Agents

cs.AI · 2026-03-01 · unverdicted · none · ref 9 · internal anchor

HiMAC decomposes LLM agent tasks into macro planning and micro execution using critic-free hierarchical RL and iterative co-evolution, outperforming baselines on ALFWorld, WebShop, and Sokoban.

Synthetic POMDPs to Challenge Memory-Augmented RL: Memory Demand Structure Modeling

cs.AI · 2025-08-06 · unverdicted · none · ref 1 · internal anchor

Presents MDS framework, linear-dynamics construction method, and tunable synthetic POMDP suite for controlled testing of memory-augmented reinforcement learning.

General Board Game Playing for Education and Research in Generic AI Game Learning

cs.AI · 2019-07-11 · unverdicted · none · ref 16 · internal anchor

GBG framework standardizes board game AI interfaces and shows a generic TD(λ)-n-tuple agent outperforming MCTS on multiple games for education and research.

Certificate-Guided Evaluation of Reinforcement Learning Generalization

cs.AI · 2026-05-30 · unverdicted · none · ref 5 · internal anchor

A logic-driven framework defines inductive reach-avoid tasks and uses neural certificates to certify RL generalization, with empirical results linking fewer violations to more solved test tasks.

From Pixels to Digital Agents: An Empirical Study on the Taxonomy and Technological Trends of Reinforcement Learning Environments

cs.AI · 2026-03-25 · unverdicted · none · ref 36 · internal anchor

An empirical literature analysis reveals a bifurcation in RL environments into Semantic Prior (LLM-dominated) and Domain-Specific Generalization ecosystems with distinct cognitive fingerprints.

A Machine With Human-Like Memory Systems

cs.AI · 2022-04-04 · unverdicted · none · ref 4 · internal anchor

An agent with both semantic and episodic memory outperforms single-memory agents in the custom 'Room' environment, with added gains from multi-agent or human-AI collaboration.

AI Safety Landscape for Large Language Models: Taxonomy, State-of-the-art, and Future Directions

cs.AI · 2024-08-23 · unverdicted · none · ref 79 · internal anchor

The paper introduces a taxonomy of AI safety for LLMs organized into Trustworthy AI, Responsible AI, and Safe AI perspectives, accompanied by a review of state-of-the-art methods, challenges, and future directions.