The only intended difference is the per-token loss weighting rule

14 A CPT Corpus Construction Details All reported standard-CPT andEXACTcomparisons are constructed as paired runs: within a backbone, CPT stage, the compared models share the same packed stream, document-boundary mask, train/validation s · 2024

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

browse 1 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Where Does Long-Context Supervision Actually Go? Effective-Context Exposure Balancing

cs.CL · 2026-05-11 · conditional · novelty 6.0

EXACT re-allocates training supervision by inverse frequency of long effective-context targets, improving NoLiMa and RULER scores by 5-18 points on Qwen and LLaMA models without degrading standard QA or reasoning.

citing papers explorer

Showing 1 of 1 citing paper.

Where Does Long-Context Supervision Actually Go? Effective-Context Exposure Balancing cs.CL · 2026-05-11 · conditional · none · ref 31
EXACT re-allocates training supervision by inverse frequency of long effective-context targets, improving NoLiMa and RULER scores by 5-18 points on Qwen and LLaMA models without degrading standard QA or reasoning.

The only intended difference is the per-token loss weighting rule

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer