pith. machine review for the scientific record.

DR Tulu: Reinforcement learning with evolving rubrics for deep research

13 Pith papers cite this work. Citation polarity classification is still being indexed.

citing papers by year: 2026 (13)

representative citing papers

Rubric-based On-policy Distillation

cs.LG · 2026-05-08 · unverdicted · novelty 7.0

Rubric-based on-policy distillation trains student models using only teacher responses: it generates scoring rubrics from contrasts and uses them for on-policy optimization, achieving better performance and up to 10x better sample efficiency than logit-based approaches.

Reward Hacking in Rubric-Based Reinforcement Learning

cs.AI · 2026-05-12 · unverdicted · novelty 6.0

Rubric-based RL verifiers can be gamed through partial criterion satisfaction and implicit-to-explicit tricks, yielding proxy gains that do not translate into quality improvements under rubric-free judges; stronger verifiers reduce but do not eliminate the mismatch.

Olmo Hybrid: From Theory to Practice and Back

cs.LG · 2026-04-03 · conditional · novelty 6.0

A 7B hybrid attention-recurrent model outperforms its pure-transformer counterpart on pretraining metrics and scales more efficiently, supported by a proof that hybrids are strictly more expressive than either transformers or linear RNNs.
