For the reinforcement learning (PPO) phase, we utilize the TRL library (Havrilla et al., 2023) with standard hyperparameter configurations unless otherwise specified.

(v4 · 2023)

1 Pith paper cites this work.

representative citing papers

QuantumQA: Enhancing Scientific Reasoning via Physics-Consistent Dataset and Verification-Aware Reinforcement Learning

cs.AI · 2026-04-20 · unverdicted · novelty 6.0 · ref 10

QuantumQA dataset and verification-aware RL with adaptive reward fusion enable an 8B LLM to achieve performance competitive with proprietary models on quantum mechanics tasks.

