Generating Long Sequences with Sparse Transformers
Rewon Child, Scott Gray, Alec Radford, and Ilya Sutskever
Sparse Transformers factorize attention to handle sequences tens of thousands of timesteps long, achieving new state-of-the-art density modeling results on Enwik8, CIFAR-10, and ImageNet-64.
6 Pith papers cite this work. Polarity classification is still indexing, so the representative citing papers below are not yet assigned verdicts.
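To make the factorized-attention idea in the summary above concrete, here is a minimal NumPy sketch of one strided sparsity pattern in that spirit. It is an illustration only: the function and parameter names (build_strided_masks, stride) are ours, and the paper's own strided and fixed patterns differ in their exact index sets.

```python
import numpy as np

def build_strided_masks(seq_len: int, stride: int):
    """Two boolean [seq_len, seq_len] causal masks: a local window and a strided
    'summary column' pattern. Their union lets information reach any earlier
    position in at most two attention hops, while each query touches roughly
    stride + seq_len/stride keys instead of all earlier positions."""
    i = np.arange(seq_len)[:, None]  # query positions
    j = np.arange(seq_len)[None, :]  # key positions
    causal = j <= i
    local = causal & (i - j < stride)              # attend to the previous `stride` tokens
    summary = causal & (j % stride == stride - 1)  # attend to every stride-th "summary" column
    return local, summary

local, summary = build_strided_masks(seq_len=16, stride=4)
# With stride ~ sqrt(seq_len), per-query cost is O(sqrt(n)),
# so full attention's O(n^2) total drops toward O(n * sqrt(n)).
print(int(local.sum()), int(summary.sum()))
```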
Citing papers explorer
- K-STEMIT: Knowledge-Informed Spatio-Temporal Efficient Multi-Branch Graph Neural Network for Subsurface Stratigraphy Thickness Estimation from Radar Data
  K-STEMIT reduces RMSE by 21% for subsurface stratigraphy thickness estimation from radar data via a knowledge-informed spatio-temporal GNN with adaptive feature fusion and physical priors from the MAR weather model.
- YaRN: Efficient Context Window Extension of Large Language Models
  YaRN extends the context window of RoPE-based LLMs like LLaMA more efficiently than prior methods, using 10x fewer tokens and 2.5x fewer training steps while surpassing state-of-the-art performance and enabling extrapolation beyond fine-tuning lengths (a minimal RoPE sketch follows after this list).
- Universal Transformers
  Universal Transformers combine Transformer parallelism with recurrent updates and dynamic halting, achieving Turing-completeness under certain assumptions and outperforming standard Transformers on algorithmic and language tasks.
- Predicting the thermodynamics in the chromosphere from the translation of SDO data into the IRIS$^{2}$ inversion results using a visual transformer model
  A visual transformer model trained on IRIS inversions predicts chromospheric temperature and density from SDO data with correlations around 0.8 on 80% of test cases.
- Attention Is All You Need
  Pith review generated a malformed one-line summary.
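For context on the YaRN entry above, here is a minimal sketch of rotary position embeddings (RoPE) with plain linear position interpolation, the simpler context-extension baseline that YaRN improves on. This is not YaRN's actual scheme, and the names here (apply_rope, train_len, target_len) are illustrative assumptions.

```python
import numpy as np

def apply_rope(x: np.ndarray, positions: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """Rotate channel pairs of x (shape [seq, dim], dim even) by position-dependent angles."""
    seq, dim = x.shape
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)  # one frequency per channel pair
    angles = positions[:, None] * inv_freq[None, :]   # [seq, dim/2]
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

# Linear position interpolation: squeeze positions from a longer target context
# back into the trained range by a constant factor (train_len / target_len).
train_len, target_len = 2048, 8192
q = np.random.randn(target_len, 64)
scaled_positions = np.arange(target_len) * (train_len / target_len)
q_rope = apply_rope(q, scaled_positions)
```

The intuition behind interpolation-style context extension is that scaling positions by train_len / target_len keeps every rotation angle within the range seen during training; methods like YaRN refine how that rescaling is applied across frequencies.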