citation dossier

Alex Graves

Graves, A · 2013 · arXiv 1308.0850

16Pith papers citing it

16reference links

cs.CLtop field · 6 papers

UNVERDICTEDtop verdict bucket · 9 papers

This arXiv-backed work is queued for full Pith review when it crosses the high-inbound sweep. That review runs reader · skeptic · desk-editor · referee · rebuttal · circularity · lean confirmation · RS check · pith extraction.

read on arXiv PDF

why this work matters in Pith

Pith has found this work in 16 reviewed papers. Its strongest current cluster is cs.CL (6 papers). The largest review-status bucket among citing papers is UNVERDICTED (9 papers). For highly cited works, this page shows a dossier first and a bounded explorer second; it never tries to render every citing paper at once.

representative citing papers

Adaptive Computation Time for Recurrent Neural Networks

cs.NE · 2016-03-29 · accept · novelty 8.0

ACT lets RNNs dynamically adapt computation depth per input via a differentiable halting unit, yielding large gains on synthetic tasks and structural insights on language data.

Neural Turing Machines

cs.NE · 2014-10-20 · unverdicted · novelty 8.0

Neural Turing Machines augment neural networks with differentiable external memory to learn algorithmic tasks such as copying, sorting, and associative recall from examples.

Neural Machine Translation by Jointly Learning to Align and Translate

cs.CL · 2014-09-01 · accept · novelty 8.0

An attention-based encoder-decoder model achieves English-to-French translation performance comparable to phrase-based systems by automatically learning soft alignments.

Adam: A Method for Stochastic Optimization

cs.LG · 2014-12-22 · accept · novelty 7.5

A first-order stochastic optimizer that maintains bias-corrected exponential moving averages of the gradient and its square, dividing the former by the square root of the latter to set per-parameter step sizes.

Scratchpad Patching: Decoupling Compute from Patch Size in Byte-Level Language Models

cs.CL · 2026-05-10 · conditional · novelty 7.0

Scratchpad Patching decouples compute from patch size in byte-level language models by inserting entropy-triggered scratchpads to update patch context dynamically.

Online Reasoning Video Object Segmentation

cs.CV · 2026-04-13 · unverdicted · novelty 7.0

The work introduces the ORVOS task, the ORVOSB benchmark with causal annotations across 210 videos, and a baseline using updated prompts plus a temporal token reservoir.

Unified Vector Floorplan Generation via Markup Representation

cs.CV · 2026-04-06 · unverdicted · novelty 7.0

A single transformer model using a new markup representation generates functional floorplans from diverse conditions and outperforms prior task-specific methods on the RPLAN dataset.

Flamingo: a Visual Language Model for Few-Shot Learning

cs.CV · 2022-04-29 · unverdicted · novelty 7.0

Flamingo models reach new state-of-the-art few-shot results on image and video tasks by bridging frozen vision and language models with cross-attention layers trained on interleaved web-scale data.

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

cs.LG · 2019-10-23 · unverdicted · novelty 7.0

T5 casts all NLP tasks as text-to-text generation, systematically explores pre-training choices, and reaches strong performance on summarization, QA, classification and other tasks via large-scale training on the Colossal Clean Crawled Corpus.

Anon: Extrapolating Adaptivity Beyond SGD and Adam

cs.AI · 2026-05-04 · unverdicted · novelty 6.0

Anon optimizer uses tunable adaptivity and incremental delay update to achieve convergence guarantees and outperform existing methods on image classification, diffusion, and language modeling tasks.

CASHG: Context-Aware Stylized Online Handwriting Generation

cs.CV · 2026-04-02 · conditional · novelty 6.0

CASHG explicitly models inter-character connectivity with a Character Context Encoder and bigram-aware Transformer decoder to produce style-consistent sentence trajectories, plus a new CSM evaluation metric.

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

cs.CL · 2022-11-09 · unverdicted · novelty 6.0

BLOOM is a 176B-parameter open-access multilingual language model trained on the ROOTS corpus that achieves competitive performance on benchmarks, with improved results after multitask prompted finetuning.

Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges

cs.LG · 2021-04-27 · accept · novelty 6.0

Geometric deep learning provides a unified mathematical framework based on grids, groups, graphs, geodesics, and gauges to explain and extend neural network architectures by incorporating physical regularities.

Universal Transformers

cs.CL · 2018-07-10 · unverdicted · novelty 6.0

Universal Transformers combine Transformer parallelism with recurrent updates and dynamic halting to achieve Turing-completeness under assumptions and outperform standard Transformers on algorithmic and language tasks.

Attention Is All You Need

cs.CL · 2017-06-12 · unverdicted · novelty 5.0

Pith review generated a malformed one-line summary.

Large Language Models: A Survey

cs.CL · 2024-02-09 · accept · novelty 3.0

The paper surveys key large language models, their training methods, datasets, evaluation benchmarks, and future research directions in the field.

citing papers explorer

Showing 16 of 16 citing papers.

Adaptive Computation Time for Recurrent Neural Networks cs.NE · 2016-03-29 · accept · none · ref 8
ACT lets RNNs dynamically adapt computation depth per input via a differentiable halting unit, yielding large gains on synthetic tasks and structural insights on language data.
Neural Turing Machines cs.NE · 2014-10-20 · unverdicted · none · ref 13
Neural Turing Machines augment neural networks with differentiable external memory to learn algorithmic tasks such as copying, sorting, and associative recall from examples.
Neural Machine Translation by Jointly Learning to Align and Translate cs.CL · 2014-09-01 · accept · none · ref 13
An attention-based encoder-decoder model achieves English-to-French translation performance comparable to phrase-based systems by automatically learning soft alignments.
Adam: A Method for Stochastic Optimization cs.LG · 2014-12-22 · accept · none · ref 4
A first-order stochastic optimizer that maintains bias-corrected exponential moving averages of the gradient and its square, dividing the former by the square root of the latter to set per-parameter step sizes.
Scratchpad Patching: Decoupling Compute from Patch Size in Byte-Level Language Models cs.CL · 2026-05-10 · conditional · none · ref 35
Scratchpad Patching decouples compute from patch size in byte-level language models by inserting entropy-triggered scratchpads to update patch context dynamically.
Online Reasoning Video Object Segmentation cs.CV · 2026-04-13 · unverdicted · none · ref 16
The work introduces the ORVOS task, the ORVOSB benchmark with causal annotations across 210 videos, and a baseline using updated prompts plus a temporal token reservoir.
Unified Vector Floorplan Generation via Markup Representation cs.CV · 2026-04-06 · unverdicted · none · ref 7
A single transformer model using a new markup representation generates functional floorplans from diverse conditions and outperforms prior task-specific methods on the RPLAN dataset.
Flamingo: a Visual Language Model for Few-Shot Learning cs.CV · 2022-04-29 · unverdicted · none · ref 32
Flamingo models reach new state-of-the-art few-shot results on image and video tasks by bridging frozen vision and language models with cross-attention layers trained on interleaved web-scale data.
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer cs.LG · 2019-10-23 · unverdicted · none · ref 21
T5 casts all NLP tasks as text-to-text generation, systematically explores pre-training choices, and reaches strong performance on summarization, QA, classification and other tasks via large-scale training on the Colossal Clean Crawled Corpus.
Anon: Extrapolating Adaptivity Beyond SGD and Adam cs.AI · 2026-05-04 · unverdicted · none · ref 5
Anon optimizer uses tunable adaptivity and incremental delay update to achieve convergence guarantees and outperform existing methods on image classification, diffusion, and language modeling tasks.
CASHG: Context-Aware Stylized Online Handwriting Generation cs.CV · 2026-04-02 · conditional · none · ref 13
CASHG explicitly models inter-character connectivity with a Character Context Encoder and bigram-aware Transformer decoder to produce style-consistent sentence trajectories, plus a new CSM evaluation metric.
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model cs.CL · 2022-11-09 · unverdicted · none · ref 243
BLOOM is a 176B-parameter open-access multilingual language model trained on the ROOTS corpus that achieves competitive performance on benchmarks, with improved results after multitask prompted finetuning.
Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges cs.LG · 2021-04-27 · accept · none · ref 33
Geometric deep learning provides a unified mathematical framework based on grids, groups, graphs, geodesics, and gauges to explain and extend neural network architectures by incorporating physical regularities.
Universal Transformers cs.CL · 2018-07-10 · unverdicted · none · ref 11
Universal Transformers combine Transformer parallelism with recurrent updates and dynamic halting to achieve Turing-completeness under assumptions and outperform standard Transformers on algorithmic and language tasks.
Attention Is All You Need cs.CL · 2017-06-12 · unverdicted · none · ref 10
Pith review generated a malformed one-line summary.
Large Language Models: A Survey cs.CL · 2024-02-09 · accept · none · ref 16
The paper surveys key large language models, their training methods, datasets, evaluation benchmarks, and future research directions in the field.

Alex Graves

why this work matters in Pith

fields

years

verdicts

representative citing papers

citing papers explorer