arXiv preprint arXiv:2503.01333

Xu Liang · arXiv 2503.01333

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

CHASE: Adversarial Red-Blue Teaming for Improving LLM Safety using Reinforcement Learning

cs.CL · 2026-06-04 · unverdicted · novelty 6.0

CHASE uses co-evolutionary RL with GRPO to harden LLMs against black-box prompt-rewriting attacks, cutting mean StrongREJECT scores by 43.2% on held-out families while keeping zero false refusals on benign prompts.

BV-Blend: Uncertainty-Weighted Historical Baselines for Stable Critic-Free RL with Verifiable Rewards

cs.AI · 2026-06-27 · unverdicted · novelty 4.0

BV-Blend blends prompt-local and semantic-cluster historical reward statistics via SEM-derived weights to stabilize critic-free RL advantage estimation.

citing papers explorer

CHASE: Adversarial Red-Blue Teaming for Improving LLM Safety using Reinforcement Learning · cs.CL · 2026-06-04 · unverdicted · none · ref 16
