Step 4: Adaptive CoT Injection (L + Q + A → Chain-of-Thought). We implement an adaptive mechanism to determine the necessity of explicit reasoning traces…

Answer Derivation & Profiling (Q → A + L): The models produce a structured standard solution modeled on a textbook answer key, alongside a difficulty label (L) that serves…
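
The two excerpts describe a data-construction flow. Each question Q yields a reference answer A and a difficulty label L, and Step 4 then uses (L, Q, A) to decide whether an explicit chain-of-thought trace is needed. The excerpts are truncated and do not state the actual decision rule, so the following is only a minimal sketch under stated assumptions. The three-level difficulty scale, the threshold-based gate, and the `generate_cot` callable (a stand-in for whatever model produces the trace) are all hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional

# Hypothetical difficulty scale: the source does not specify the label values.
DIFFICULTY_LEVELS = ("easy", "medium", "hard")


@dataclass(frozen=True)
class Sample:
    question: str               # Q
    answer: str                 # A: textbook-style reference solution
    level: str                  # L: difficulty label
    cot: Optional[str] = None   # chain-of-thought trace, attached only when needed


def needs_cot(level: str, threshold: str = "medium") -> bool:
    """Assumed gating rule: request a trace only at or above `threshold`."""
    return DIFFICULTY_LEVELS.index(level) >= DIFFICULTY_LEVELS.index(threshold)


def inject_cot(sample: Sample,
               generate_cot: Callable[[str, str, str], str],
               threshold: str = "medium") -> Sample:
    """Step 4 sketch: (L, Q, A) -> optional CoT; returns a new Sample."""
    if needs_cot(sample.level, threshold):
        trace = generate_cot(sample.level, sample.question, sample.answer)
        return replace(sample, cot=trace)
    return sample  # easy items keep the answer-only form


# Usage with a dummy generator in place of a real model call (hypothetical).
def fake_generator(level: str, question: str, answer: str) -> str:
    return f"[{level}] reasoning for: {question}"


easy = inject_cot(Sample("What is hbar?", "h / (2*pi)", "easy"), fake_generator)
hard = inject_cot(
    Sample("Derive the hydrogen spectrum.", "E_n = -13.6 eV / n^2", "hard"),
    fake_generator,
)
print(easy.cot)  # None
print(hard.cot)  # [hard] reasoning for: Derive the hydrogen spectrum.
```

Using a frozen dataclass with `replace` keeps the Q → A + L record unchanged and makes the CoT attachment an explicit, separate step, which mirrors the two-stage ordering in the excerpts.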

1 Pith paper cites this work.

representative citing papers

QuantumQA: Enhancing Scientific Reasoning via Physics-Consistent Dataset and Verification-Aware Reinforcement Learning

cs.AI · 2026-04-20 · unverdicted · novelty 6.0 · ref 12

QuantumQA dataset and verification-aware RL with adaptive reward fusion enable an 8B LLM to achieve performance competitive with proprietary models on quantum mechanics tasks.