pith. machine review for the scientific record. sign in

Effective long-context scaling of foundation models

8 Pith papers cite this work. Polarity classification is still indexing.

8 Pith papers citing it

clear filters

representative citing papers

PolicyLong: Towards On-Policy Context Extension

cs.LG · 2026-04-09 · unverdicted · novelty 6.0

PolicyLong shifts long-context data synthesis to an on-policy loop that re-screens contexts using the evolving model's entropy landscape, producing a self-curriculum that outperforms static offline baselines with larger gains at longer lengths.

CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer

cs.CV · 2024-08-12 · unverdicted · novelty 6.0

CogVideoX generates coherent 10-second text-to-video outputs at high resolution using a 3D VAE, expert adaptive LayerNorm transformer, progressive training, and a custom data pipeline, claiming state-of-the-art results.

HieraSparse: Hierarchical Semi-Structured Sparse KV Attention

cs.DC · 2026-04-18 · unverdicted · novelty 5.0

HieraSparse delivers a hierarchical semi-structured sparse KV attention system that achieves 1.2x KV compression and 4.57x decode attention speedup versus prior unstructured sparsity methods at equivalent sparsity, plus up to 1.85x prefill speedup and 1.37x/1.77x speedups with magnitude pruning and

Qwen3 Technical Report

cs.CL · 2025-05-14 · unverdicted · novelty 5.0

Pith review generated a malformed one-line summary.

Yi: Open Foundation Models by 01.AI

cs.CL · 2024-03-07 · unverdicted · novelty 4.0

Yi models are 6B and 34B open foundation models pretrained on 3.1T curated tokens that achieve strong benchmark results through data quality and targeted extensions like long context and vision alignment.

citing papers explorer

Showing 1 of 1 citing paper after filters.

  • CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer cs.CV · 2024-08-12 · unverdicted · none · ref 11

    CogVideoX generates coherent 10-second text-to-video outputs at high resolution using a 3D VAE, expert adaptive LayerNorm transformer, progressive training, and a custom data pipeline, claiming state-of-the-art results.