
Megascience: Pushing the frontiers of post-training datasets for science reasoning. arXiv preprint

10 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

dataset: 2 · background: 1

citation-polarity summary

years

2026: 8 · 2025: 2

representative citing papers

How Post-Training Shapes Biological Reasoning Models

cs.LG · 2026-06-15 · unverdicted · novelty 6.0

Each post-training stage reshapes generalization in biological reasoning models differently: CPT aligns the model with biological language, SFT boosts ID performance but causes OOD performance to peak early and then decline, and RL on strong SFT checkpoints can recover OOD generalization.

Reward Hacking in Rubric-Based Reinforcement Learning

cs.AI · 2026-05-12 · unverdicted · novelty 6.0

Rubric-based RL verifiers can be gamed via partial criterion satisfaction and implicit-to-explicit tricks, yielding proxy gains that do not improve quality under rubric-free judges; stronger verifiers reduce but do not eliminate the mismatch.
