W., Owen, S., and Frankle, J

Blakeney, C · 2024 · arXiv 2406.03476

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

representative citing papers

cs.AI · 2026-05-06 · unverdicted · novelty 6.0

ZAYA1-8B is a reasoning MoE model with 700M active parameters that matches larger models on math and coding benchmarks and reaches 91.9% on AIME'25 via Markovian RSA test-time compute.

Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters

cs.LG · 2024-08-06 · unverdicted · novelty 6.0

An adaptive compute-optimal strategy for scaling LLM test-time compute achieves over 4x efficiency gains versus best-of-N and lets smaller models outperform 14x larger ones on some problems.

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

cs.CL · 2025-02-04 · unverdicted · novelty 5.0

SmolLM2 is a 1.7B-parameter language model that outperforms Qwen2.5-1.5B and Llama3.2-1B after overtraining on 11 trillion tokens using custom FineMath, Stack-Edu, and SmolTalk datasets in a multi-stage pipeline.

XekRung Technical Report

cs.CR · 2026-04-30 · unverdicted · novelty 3.0

XekRung achieves state-of-the-art performance on cybersecurity benchmarks among same-scale models via tailored data synthesis and multi-stage training while retaining strong general capabilities.

citing papers explorer

Showing 4 of 4 citing papers.

ZAYA1-8B Technical Report cs.AI · 2026-05-06 · unverdicted · none · ref 17
ZAYA1-8B is a reasoning MoE model with 700M active parameters that matches larger models on math and coding benchmarks and reaches 91.9% on AIME'25 via Markovian RSA test-time compute.
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters cs.LG · 2024-08-06 · unverdicted · none · ref 5
An adaptive compute-optimal strategy for scaling LLM test-time compute achieves over 4x efficiency gains versus best-of-N and lets smaller models outperform 14x larger ones on some problems.
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model cs.CL · 2025-02-04 · unverdicted · none · ref 153
SmolLM2 is a 1.7B-parameter language model that outperforms Qwen2.5-1.5B and Llama3.2-1B after overtraining on 11 trillion tokens using custom FineMath, Stack-Edu, and SmolTalk datasets in a multi-stage pipeline.
XekRung Technical Report cs.CR · 2026-04-30 · unverdicted · none · ref 131
XekRung achieves state-of-the-art performance on cybersecurity benchmarks among same-scale models via tailored data synthesis and multi-stage training while retaining strong general capabilities.

W., Owen, S., and Frankle, J

fields

years

verdicts

representative citing papers

citing papers explorer