OpenAI Gym

Mixed citation behavior. Most common role is background (45%).

Cited by 154 Pith papers.
abstract

OpenAI Gym is a toolkit for reinforcement learning research. It includes a growing collection of benchmark problems that expose a common interface, and a website where people can share their results and compare the performance of algorithms. This whitepaper discusses the components of OpenAI Gym and the design decisions that went into the software.
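
The "common interface" in the abstract is what lets a single agent loop run unchanged across benchmark problems. Below is a minimal Python sketch of that pattern. It assumes the classic API of the original Gym release, where `reset()` returns an initial observation and `step(action)` returns `(observation, reward, done, info)`. It does not import the `gym` package. `CorridorEnv` and `run_episode` are hypothetical names used only for illustration, and the corridor task is a toy, not one of Gym's benchmarks.

```python
import random


class CorridorEnv:
    """Hypothetical 1-D corridor task with a Gym-style interface.

    Not part of OpenAI Gym; it only mirrors the classic API in which
    reset() returns an observation and step(action) returns
    (observation, reward, done, info).
    """

    def __init__(self, length=5, max_steps=50):
        self.length = length        # the goal sits at position `length`
        self.max_steps = max_steps  # time limit per episode
        self.pos = 0
        self.steps = 0

    def reset(self):
        """Start a new episode and return the initial observation."""
        self.pos = 0
        self.steps = 0
        return self.pos

    def step(self, action):
        """Apply action 0 (left) or 1 (right); return (obs, reward, done, info)."""
        if action not in (0, 1):
            raise ValueError("action must be 0 or 1")
        # Move one cell, but never past the left wall at position 0.
        self.pos = max(0, self.pos + (1 if action == 1 else -1))
        self.steps += 1
        reached_goal = self.pos == self.length
        reward = 1.0 if reached_goal else 0.0
        done = reached_goal or self.steps >= self.max_steps
        info = {"steps": self.steps}
        return self.pos, reward, done, info


def run_episode(env, policy):
    """Generic agent loop: works with any object exposing reset() and step()."""
    obs = env.reset()
    total_reward, done = 0.0, False
    while not done:
        obs, reward, done, info = env.step(policy(obs))
        total_reward += reward
    return total_reward, info["steps"]


if __name__ == "__main__":
    env = CorridorEnv(length=5)
    print(run_episode(env, lambda obs: 1))  # always move right: (1.0, 5)
    rng = random.Random(0)
    print(run_episode(env, lambda obs: rng.randint(0, 1)))  # random agent
```

Because `run_episode` only calls `reset()` and `step()`, any other environment with those methods can be swapped in without changing the agent loop. Later Gym releases and their successor, Gymnasium, changed these signatures: `reset()` also returns an info dict, and `done` is split into `terminated` and `truncated`. The loop would need adjusting for those versions.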

citation-role summary

  • background: 10
  • dataset: 6
  • method: 2
  • baseline: 1
  • other: 1

representative citing papers

What Type of Inference is Active Inference?

cs.AI · 2026-06-03 · unverdicted · novelty 7.0

Active inference planning based on expected free energy (EFE) is characterized as variational free energy (VFE) minimization on an augmented model, plus entropy and planning correction terms. The paper derives a message-passing implementation and validates it in a grid world.

FedQHD: Closed-Form Function-Space Federated Reinforcement Learning

cs.LG · 2026-05-27 · unverdicted · novelty 7.0

FedQHD achieves closed-form federated Q-learning via hyperdimensional encoders with linear readouts. It formalizes the federation gap under heterogeneous encoders and reports competitive performance on continuous-state benchmarks at reduced computational cost.

Proximal State Nudging: Reducing Skill Atrophy from AI Assistance

cs.RO · 2026-05-19 · unverdicted · novelty 7.0

Proximal State Nudging (PSN) jointly optimizes skill development and task performance in shared autonomy, outperforming baselines in LunarLander simulation and yielding up to 7x larger unassisted skill gains with 50% fewer collisions in human CARLA driving studies.

gym-invmgmt: An Open Benchmarking Framework for Inventory Management Methods

cs.LG · 2026-05-12 · unverdicted · novelty 7.0

gym-invmgmt is a new benchmarking framework that evaluates inventory policies across optimization and learning methods. In the tested scenarios, stochastic programming is strongest among non-oracle approaches, and PPO-Transformer is best among learned ones.

Revisiting Mixture Policies in Entropy-Regularized Actor-Critic

cs.LG · 2026-05-09 · unverdicted · novelty 7.0

A new marginalized reparameterization estimator allows low-variance training of mixture policies in entropy-regularized actor-critic algorithms, matching or exceeding Gaussian policy performance in several continuous control benchmarks.

Learning to Theorize the World from Observation

cs.LG · 2026-05-05 · unverdicted · novelty 7.0

NEO is a probabilistic neural model that induces compositional programs as a learned Language of Thought from non-textual observations and executes them via a shared transition model to enable explanation-driven generalization.

EO-Gym: A Multimodal, Interactive Environment for Earth Observation Agents

cs.AI · 2026-05-02 · unverdicted · novelty 7.0

EO-Gym supplies an executable multimodal environment and 9k-trajectory benchmark that turns Earth Observation into a tool-using, multi-step reasoning task, revealing that current VLMs struggle on temporal and cross-sensor workflows while fine-tuning lifts Pass@3 from 0.49 to 0.74.

citing papers explorer

  • Deep reinforcement learning from human preferences stat.ML · 2017-06-12 · accept · none · ref 4

    Reinforcement learning agents solve complex tasks without access to the reward function by training a reward predictor from human comparisons of trajectory segments, requiring feedback on less than 1% of interactions.

  • Distributional Off-Policy Evaluation with Deep Quantile Process Regression stat.ML · 2026-04-20 · unverdicted · none · ref 31

    DQPOPE estimates the entire return distribution in off-policy evaluation via deep quantile process regression. For the same sample size, it offers statistical advantages over standard methods that estimate only a single value.

  • Nonparametric Sparse Online Learning of the Koopman Operator stat.ML · 2024-05-13 · unverdicted · none · ref 10

    The authors develop a nonparametric, sparse online algorithm that learns the Koopman operator iteratively by stochastic approximation. It provides explicit complexity control and convergence guarantees in misspecified RKHS settings, derived via conditional mean embeddings.

  • Towards A Rigorous Science of Interpretable Machine Learning stat.ML · 2017-02-28 · unverdicted · none · ref 6

    The authors define interpretability for machine learning, specify when it is required, and propose a taxonomy for its rigorous evaluation while identifying open research questions.

  • Middle-mile logistics through the lens of goal-conditioned reinforcement learning stat.ML · 2026-05-04 · unverdicted · none · ref 7

    Middle-mile logistics is cast as a multi-object goal-conditioned MDP and solved by combining graph neural networks with model-free RL via extraction of small feature graphs.