pith. sign in

Title resolution pending

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

years

2026 4

verdicts

UNVERDICTED 4

representative citing papers

Forecasting Downstream Performance of LLMs With Proxy Metrics

cs.CL · 2026-05-18 · unverdicted · novelty 6.0

Proxy metrics from next-token distributions over expert solutions outperform loss and compute baselines for ranking LLMs, selecting pretraining data, and extrapolating performance across compute scales.

Scaling Laws for Mixture Pretraining Under Data Constraints

cs.LG · 2026-05-12 · unverdicted · novelty 6.0

Empirical study shows mixture pretraining tolerates higher target data repetition than single-source training, with a new repetition-aware scaling law enabling principled mixture selection based on data size, compute, and model scale.

ZAYA1-8B Technical Report

cs.AI · 2026-05-06 · unverdicted · novelty 6.0

ZAYA1-8B is a reasoning MoE model with 700M active parameters that matches larger models on math and coding benchmarks and reaches 91.9% on AIME'25 via Markovian RSA test-time compute.

citing papers explorer

Showing 4 of 4 citing papers.

  • Forecasting Downstream Performance of LLMs With Proxy Metrics cs.CL · 2026-05-18 · unverdicted · none · ref 88

    Proxy metrics from next-token distributions over expert solutions outperform loss and compute baselines for ranking LLMs, selecting pretraining data, and extrapolating performance across compute scales.

  • Scaling Laws for Mixture Pretraining Under Data Constraints cs.LG · 2026-05-12 · unverdicted · none · ref 24

    Empirical study shows mixture pretraining tolerates higher target data repetition than single-source training, with a new repetition-aware scaling law enabling principled mixture selection based on data size, compute, and model scale.

  • ZAYA1-8B Technical Report cs.AI · 2026-05-06 · unverdicted · none · ref 215

    ZAYA1-8B is a reasoning MoE model with 700M active parameters that matches larger models on math and coding benchmarks and reaches 91.9% on AIME'25 via Markovian RSA test-time compute.

  • Long-Context Reasoning Through Proxy-Based Chain-of-Thought Tuning cs.CL · 2026-04-06 · unverdicted · none · ref 5 · 2 links

    ProxyCoT transfers CoT reasoning from proxy short contexts to full long contexts through RL/distillation followed by SFT, outperforming baselines with lower overhead and generalizing out-of-domain.