pith. machine review for the scientific record.

citation dossier

One Billion Word Benchmark for Measuring Progress in Statistical Language Modeling

arXiv preprint arXiv:1312.3005

Chelba, C · 2014 · arXiv 1312.3005

20 Pith papers citing it
21 reference links
cs.CL · top field · 12 papers
UNVERDICTED · top verdict bucket · 16 papers

This arXiv-backed work is queued for a full Pith review once it crosses the high-inbound sweep. That review runs the stages reader · skeptic · desk-editor · referee · rebuttal · circularity · lean confirmation · RS check · pith extraction.

read on arXiv (PDF)

why this work matters in Pith

Pith has found this work cited in 20 reviewed papers. Its strongest current cluster is cs.CL (12 papers), and the largest review-status bucket among citing papers is UNVERDICTED (16 papers). For highly cited works, this page shows a dossier first and a bounded explorer second; it never tries to render every citing paper at once.

representative citing papers

Infinite Mask Diffusion for Few-Step Distillation

cs.CL · 2026-05-11 · unverdicted · novelty 7.0

Infinite Mask Diffusion Models use stochastic infinite-state masks to overcome the factorization error lower bound in standard masked diffusion, achieving superior few-step performance on language tasks via distillation.
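
For context on the bound that summary mentions: in standard masked diffusion, every position revealed in one reverse step is sampled independently, so the joint over those tokens factorizes into a product of marginals. A minimal sketch of that baseline step, which is what the paper's infinite-state masks target; all names here, including denoiser, are illustrative assumptions, not the paper's code:

```python
import numpy as np

MASK = -1  # sentinel id for a masked position

def reverse_step(tokens, denoiser, frac_to_unmask, rng):
    # `denoiser` is a hypothetical stand-in for any model mapping a
    # partially masked sequence to per-position token probabilities.
    probs = denoiser(tokens)                  # (seq_len, vocab_size)
    masked = np.flatnonzero(tokens == MASK)
    if masked.size == 0:
        return tokens
    k = max(1, int(frac_to_unmask * masked.size))
    chosen = rng.choice(masked, size=k, replace=False)
    for i in chosen:
        # Each revealed token is drawn from its own marginal, independently
        # of the others revealed this step: the factorization error source.
        tokens[i] = rng.choice(probs.shape[1], p=probs[i])
    return tokens
```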

Spherical Flows for Sampling Categorical Data

stat.ML · 2026-05-07 · unverdicted · novelty 7.0 · 2 refs

Spherical vMF flows reduce the continuity equation on the sphere to a scalar ODE in cosine similarity, enabling posterior-weighted sampling of categorical sequences via cross-entropy-trained posteriors.

OPT: Open Pre-trained Transformer Language Models

cs.CL · 2022-05-02 · unverdicted · novelty 7.0

OPT releases open decoder-only transformers up to 175B parameters that match GPT-3 performance at one-seventh the carbon cost, along with code and training logs.

Deep Learning Scaling is Predictable, Empirically

cs.LG · 2017-12-01 · unverdicted · novelty 7.0

Deep learning generalization error follows power-law scaling with training set size across multiple domains, with model size scaling sublinearly with data size.
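
The claim above is concrete enough to sketch: a power law error(N) ≈ a · N^(−b) is a straight line in log-log coordinates, so it can be fit by ordinary least squares. The data and function name below are made up for illustration, not taken from the paper:

```python
import numpy as np

def fit_power_law(train_sizes, errors):
    """Fit error(N) ~ a * N**(-b); returns (a, b)."""
    logN = np.log(np.asarray(train_sizes, dtype=float))
    logE = np.log(np.asarray(errors, dtype=float))
    # log(error) = log(a) - b * log(N): slope gives -b, intercept log(a).
    slope, intercept = np.polyfit(logN, logE, deg=1)
    return np.exp(intercept), -slope

sizes = [1e4, 1e5, 1e6, 1e7]      # hypothetical training-set sizes
errs = [0.40, 0.25, 0.16, 0.10]   # hypothetical generalization errors
a, b = fit_power_law(sizes, errs)
print(f"error(N) ~ {a:.2f} * N^(-{b:.2f})")
```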

Pointer Sentinel Mixture Models

cs.CL · 2016-09-26 · conditional · novelty 7.0

Pointer sentinel-LSTM mixes context copying with softmax prediction to reach 70.9 perplexity on Penn Treebank using fewer parameters than standard LSTMs.
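
The mixture in that summary has a simple closed form: one softmax is taken over the pointer scores plus a sentinel score, the sentinel's share g gates the ordinary vocabulary softmax, and the remaining mass is scattered onto the words actually present in the context. A hedged numpy sketch; names and shapes are assumptions, not the paper's code:

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def pointer_sentinel_mixture(vocab_logits, ptr_scores, sentinel_score, context_ids):
    # One softmax over [pointer scores; sentinel]; the sentinel's share g
    # is the gate on the standard vocabulary softmax.
    joint = softmax(np.append(ptr_scores, sentinel_score))
    ptr_mass, g = joint[:-1], joint[-1]
    p = g * softmax(vocab_logits)
    # Scatter pointer mass onto the vocabulary ids seen in the context, so
    # copying a recent word needs no vocabulary-softmax support.
    np.add.at(p, context_ids, ptr_mass)
    return p  # sums to 1, since g + ptr_mass.sum() = 1
```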

Language Models (Mostly) Know What They Know

cs.CL · 2022-07-11 · unverdicted · novelty 6.0

Language models show good calibration when asked to estimate the probability that their own answers are correct, with performance improving as models get larger.
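
Calibration here means the model's self-reported probability of being correct should track its empirical accuracy. One standard way to quantify that is expected calibration error over confidence bins; the sketch below is a generic ECE computation, not the paper's exact evaluation:

```python
import numpy as np

def expected_calibration_error(self_probs, was_correct, n_bins=10):
    """Weighted mean |accuracy - mean confidence| over equal-width bins."""
    p = np.asarray(self_probs, dtype=float)   # model's P(answer is correct)
    y = np.asarray(was_correct, dtype=float)  # 1.0 if the answer was correct
    bins = np.minimum((p * n_bins).astype(int), n_bins - 1)  # 1.0 -> last bin
    ece = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            ece += mask.mean() * abs(y[mask].mean() - p[mask].mean())
    return ece
```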

citing papers explorer

Showing 20 of 20 citing papers.