Sparse Transformers factorize the attention matrix to model sequences tens of thousands of steps long, achieving new state-of-the-art density modeling results on Enwik8, CIFAR-10, and ImageNet-64.
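The sketch below illustrates the strided factorization described in the paper: one attention head covers a local window of the previous `stride` positions, and a second covers every `stride`-th earlier position, so with a stride near sqrt(n) each row of either mask has on the order of sqrt(n) nonzeros instead of n. The function name and NumPy formulation here are illustrative assumptions, not the paper's code.

```python
import numpy as np

def strided_attention_masks(seq_len: int, stride: int):
    """Build the two boolean masks of a strided sparse-attention factorization.

    mask[i, j] == True means position i may attend to position j.
    Head A: the previous `stride` positions (local sliding window).
    Head B: positions j with (i - j) % stride == 0 (strided "summary" positions).
    Both masks are causal (j <= i).
    """
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    causal = j <= i
    local = causal & (i - j < stride)           # head A: sliding window
    strided = causal & ((i - j) % stride == 0)  # head B: every stride-th position
    return local, strided

# Example: 16 positions, stride 4. Each mask has roughly seq_len * stride
# nonzeros rather than the seq_len**2 of dense causal attention.
local, strided = strided_attention_masks(16, 4)
print(local.sum(), strided.sum())
```

Because the strided head reaches the summary positions and each summary position sees its own local window, information can flow between any two positions within two attention steps, which is what keeps the factorized pattern expressive despite its sparsity.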
citation dossier
Rewon Child, Scott Gray, Alec Radford, and Ilya Sutskever
why this work matters in Pith
Pith has found this work cited in 6 reviewed papers. Its strongest current cluster is cs.CL (3 papers). The largest review-status bucket among citing papers is UNVERDICTED (6 papers). For highly cited works, this page shows a dossier first and a bounded explorer second; it never tries to render every citing paper at once.
verdicts
UNVERDICTED: 6
representative citing papers
K-STEMIT reduces RMSE by 21% for subsurface stratigraphy thickness estimation from radar data via a knowledge-informed spatio-temporal GNN with adaptive feature fusion and physical priors from the MAR weather model.
YaRN extends the context window of RoPE-based LLMs like LLaMA more efficiently than prior methods, using 10x fewer tokens and 2.5x fewer training steps, while surpassing state-of-the-art performance and enabling extrapolation beyond fine-tuning lengths (a simplified rotary-scaling sketch follows this list).
Universal Transformers combine the parallelism of Transformers with recurrent updates and dynamic halting, achieving Turing completeness under certain assumptions and outperforming standard Transformers on algorithmic and language tasks.
A visual transformer model trained on IRIS inversions predicts chromospheric temperature and density from SDO data with correlations around 0.8 on 80% of test cases.
Pith review generated a malformed one-line summary.
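As referenced in the YaRN entry above, the mechanism being extended is the rotary position embedding's phase schedule. The sketch below shows plain position interpolation, a simpler predecessor: rescaling positions so a longer context reuses the phase range seen during training. YaRN itself refines this with per-frequency ("NTK-by-parts") interpolation and an attention temperature adjustment; the function names and NumPy code here are illustrative assumptions, not YaRN's implementation.

```python
import numpy as np

def rope_angles(positions, dim, base=10000.0, scale=1.0):
    """Rotary-embedding phase angles with a simple position-scaling knob.

    scale > 1 compresses positions so a context `scale` times longer than the
    training length reuses the phase range seen in training. This is plain
    position interpolation, shown only to illustrate the mechanism that YaRN
    refines with per-frequency interpolation and attention rescaling.
    """
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)  # one frequency per feature pair
    return np.outer(positions / scale, inv_freq)      # shape: (n_positions, dim // 2)

def apply_rope(x, angles):
    """Rotate consecutive (even, odd) feature pairs of x by the given angles."""
    x1, x2 = x[..., 0::2], x[..., 1::2]
    cos, sin = np.cos(angles), np.sin(angles)
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

# Example: queries for an 8k-token context from a model trained on 2k -> scale = 4.
q = np.random.randn(8192, 64)
q_rot = apply_rope(q, rope_angles(np.arange(8192), dim=64, scale=4.0))
```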
citing papers explorer
Universal Transformers
Universal Transformers combine the parallelism of Transformers with recurrent updates and dynamic halting, achieving Turing completeness under certain assumptions and outperforming standard Transformers on algorithmic and language tasks.
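For the Universal Transformers entry above, the sketch below shows the recurrent-update-with-dynamic-halting idea in simplified form: one shared block is applied repeatedly, and each position stops being refined once its accumulated halting probability crosses a threshold. The block and halting unit are placeholder functions, and the real model forms a probability-weighted average of intermediate states rather than simply freezing them.

```python
import numpy as np

def universal_transformer_step(state):
    """Placeholder for the shared block (self-attention + transition)."""
    return np.tanh(state)  # stand-in transformation, not the real block

def halting_prob(state):
    """Placeholder per-position halting unit (sigmoid of a pooled feature)."""
    return 1.0 / (1.0 + np.exp(-state.mean(axis=-1)))

def run_with_dynamic_halting(state, max_steps=8, threshold=0.99):
    """Apply one shared block repeatedly, letting each position stop early.

    Simplified form of the adaptive-computation-time idea: each position
    accumulates a halting probability and stops being updated once that
    accumulator crosses `threshold` or `max_steps` is reached.
    """
    accum = np.zeros(state.shape[0])                # accumulated halting probability
    for _ in range(max_steps):
        running = accum < threshold                 # positions still computing
        if not running.any():
            break
        updated = universal_transformer_step(state)
        state = np.where(running[:, None], updated, state)  # halted rows freeze
        accum = accum + halting_prob(state) * running
    return state

out = run_with_dynamic_halting(np.random.randn(10, 16))
```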