Megascience: Pushing the frontiers of post-training datasets for science reasoning. arXiv preprint

Run-Ze Fan, Zengzhi Wang, Pengfei Liu · 2025 · arXiv 2507.16812

11 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

dataset: 2 · background: 1

citation-polarity summary

use dataset: 2 · background: 1

representative citing papers

BioMatrix: Towards a Comprehensive Biological Foundation Model Spanning the Modality Matrix of Sequences, Structures, and Language

cs.CL · 2026-06-20 · unverdicted · novelty 7.0

BioMatrix unifies sequences, structures, and language for molecules and proteins inside one decoder-only foundation model via shared discrete tokens and achieves SOTA or competitive results on 77 of 80 downstream tasks.

Tracing the Roots: A Multi-Agent Framework for Uncovering Data Lineage in Post-Training LLMs

cs.AI · 2026-04-12 · unverdicted · novelty 7.0

A multi-agent framework reconstructs the evolutionary graph of post-training LLM datasets, revealing domain patterns like vertical refinement in math data and systemic issues like redundancy and benchmark contamination, then applies it to create a more diverse lineage-aware dataset.

How Post-Training Shapes Biological Reasoning Models

cs.LG · 2026-06-15 · unverdicted · novelty 6.0

Post-training stages reshape generalization in biological reasoning models distinctly: CPT aligns with biological language, SFT boosts ID performance but causes OOD to peak early and decline, while RL on strong SFT checkpoints can recover OOD generalization.

Efficient Agentic Reasoning Through Self-Regulated Simulative Planning

cs.AI · 2026-05-21 · unverdicted · novelty 6.0

SR²AM achieves competitive Pass@1 accuracy on diverse tasks with 25.8-95.3% fewer reasoning tokens than much larger models by using self-regulated simulative planning trained via supervised learning and RL.

CrystalReasoner: Reasoning and RL for Property-Conditioned Crystal Structure Generation

cs.AI · 2026-05-14 · unverdicted · novelty 6.0

CrystalReasoner combines LLM reasoning traces with physical priors and multi-objective RL to generate valid, stable, and property-conditioned crystal structures.

Reward Hacking in Rubric-Based Reinforcement Learning

cs.AI · 2026-05-12 · unverdicted · novelty 6.0

Rubric-based RL verifiers can be gamed via partial criterion satisfaction and implicit-to-explicit tricks, yielding proxy gains that do not improve quality under rubric-free judges; stronger verifiers reduce but do not eliminate the mismatch.

SOD: Step-wise On-policy Distillation for Small Language Model Agents

cs.CL · 2026-05-08 · unverdicted · novelty 6.0

SOD reweights on-policy distillation strength step-by-step using divergence to stabilize tool use in small language model agents, yielding up to 20.86% gains and 26.13% on AIME 2025 for a 0.6B model.

MiroThinker: Pushing the Performance Boundaries of Open-Source Research Agents via Model, Context, and Interactive Scaling

cs.CL · 2025-11-14 · unverdicted · novelty 6.0

MiroThinker shows that scaling agent-environment interactions via reinforcement learning lets a 72B open-source model reach up to 81.9% on GAIA and approach commercial performance on research benchmarks.

SlimSearcher: Training Efficiency-Aware Web Agents via Adaptive Reward Gating

cs.LG · 2026-06-05 · unverdicted · novelty 5.0

SlimSearcher reduces tool-call rounds by 17-58% on GAIA, BrowseComp and XBenchDeepSearch while maintaining accuracy via Pareto filtration in SFT and Adaptive Reward Gating in RL.

Enhancing Fitness Intelligence through Domain-Specific LLM Post-Training

cs.AI · 2026-07-02 · unverdicted · novelty 3.0

FitOne-8B/32B models improve average scores on ACSM-EP and NSCA-CSCS certification exams by up to 12.73% over base Qwen3 while retaining general capabilities.

A Survey of Reinforcement Learning for Large Reasoning Models

cs.CL · 2025-09-10 · accept · novelty 3.0

A survey compiling RL methods, challenges, data resources, and applications for enhancing reasoning in large language models and large reasoning models since DeepSeek-R1.

citing papers explorer

Showing 5 of 5 citing papers after filters.

Tracing the Roots: A Multi-Agent Framework for Uncovering Data Lineage in Post-Training LLMs

cs.AI · 2026-04-12 · unverdicted · none · ref 13

A multi-agent framework reconstructs the evolutionary graph of post-training LLM datasets, revealing domain patterns like vertical refinement in math data and systemic issues like redundancy and benchmark contamination, then applies it to create a more diverse lineage-aware dataset.

Efficient Agentic Reasoning Through Self-Regulated Simulative Planning

cs.AI · 2026-05-21 · unverdicted · none · ref 22

SR²AM achieves competitive Pass@1 accuracy on diverse tasks with 25.8-95.3% fewer reasoning tokens than much larger models by using self-regulated simulative planning trained via supervised learning and RL.

CrystalReasoner: Reasoning and RL for Property-Conditioned Crystal Structure Generation

cs.AI · 2026-05-14 · unverdicted · none · ref 2

CrystalReasoner combines LLM reasoning traces with physical priors and multi-objective RL to generate valid, stable, and property-conditioned crystal structures.

Reward Hacking in Rubric-Based Reinforcement Learning

cs.AI · 2026-05-12 · unverdicted · none · ref 8

Rubric-based RL verifiers can be gamed via partial criterion satisfaction and implicit-to-explicit tricks, yielding proxy gains that do not improve quality under rubric-free judges; stronger verifiers reduce but do not eliminate the mismatch.

Enhancing Fitness Intelligence through Domain-Specific LLM Post-Training

cs.AI · 2026-07-02 · unverdicted · none · ref 30

FitOne-8B/32B models improve average scores on ACSM-EP and NSCA-CSCS certification exams by up to 12.73% over base Qwen3 while retaining general capabilities.
