Conservative safety critics for exploration

2021 · arXiv 2010.14497

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

An Agency-Transferring Model-Free Policy Enhancement Technique

cs.LG · 2026-06-08 · unverdicted · novelty 5.0

A model-free RL method arbitrates between a functional baseline policy and a learning policy, transferring agency over time to yield a standalone policy with high goal-reaching rates and competitive returns on continuous-control tasks.

Safe-Support Q-Learning: Learning without Unsafe Exploration

cs.LG · 2026-04-28 · unverdicted · novelty 5.0

Safe-Support Q-Learning trains Q-functions and policies in reinforcement learning without ever visiting unsafe states by constraining the behavior policy to a safe set and using KL-regularized Bellman targets in a two-stage framework.

Pre-VLA: Preemptive Runtime Verification for Reliable Vision-Language-Action and World-Model Rollouts

cs.CV · 2026-05-21 · unverdicted · novelty 4.0

Pre-VLA is a multimodal runtime verifier that predicts safety confidence and advantage scores for action chunks, raising closed-loop success rates on the LIBERO benchmark from 30.79% to 37.62%.

citing papers explorer

Showing 3 of 3 citing papers.

An Agency-Transferring Model-Free Policy Enhancement Technique · cs.LG · 2026-06-08 · unverdicted · none · ref 28
Safe-Support Q-Learning: Learning without Unsafe Exploration · cs.LG · 2026-04-28 · unverdicted · none · ref 1
Pre-VLA: Preemptive Runtime Verification for Reliable Vision-Language-Action and World-Model Rollouts · cs.CV · 2026-05-21 · unverdicted · none · ref 35
