SpanBERT: Improving pre-training by representing and predicting spans

arXiv: 1907.10529

9 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators

cs.CL · 2020-03-23 · conditional · novelty 8.0

ELECTRA replaces masked language modeling with replaced token detection, yielding contextual representations that outperform BERT at equal compute and match larger models like RoBERTa with far less compute.

REALM: Retrieval-Augmented Language Model Pre-Training

cs.CL · 2020-02-10 · accept · novelty 8.0

REALM augments language-model pre-training with an unsupervised retriever over Wikipedia documents and reports 4-16% absolute gains on open-domain QA benchmarks over prior implicit and explicit knowledge methods.

BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension

cs.CL · 2019-10-29 · accept · novelty 7.0

BART introduces a denoising pretraining method for seq2seq models that matches RoBERTa on GLUE and SQuAD while setting new state-of-the-art results on abstractive summarization, dialogue, and QA, with gains of up to 6 ROUGE.

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

cs.LG · 2019-10-23 · unverdicted · novelty 7.0

T5 casts all NLP tasks as text-to-text generation, systematically explores pre-training choices, and reaches strong performance on summarization, QA, classification and other tasks via large-scale training on the Colossal Clean Crawled Corpus.

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

cs.CL · 2019-09-26 · accept · novelty 7.0

ALBERT reduces BERT parameters via embedding factorization and layer sharing, adds inter-sentence coherence pretraining, and reaches SOTA on GLUE, RACE, and SQuAD with fewer parameters than BERT-large.

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism

cs.CL · 2019-09-17 · unverdicted · novelty 7.0

Intra-layer model parallelism in PyTorch enables training of 8.3B-parameter transformers, achieving SOTA perplexity of 10.8 on WikiText103 and 66.5% accuracy on LAMBADA.

HuggingFace's Transformers: State-of-the-art Natural Language Processing

cs.CL · 2019-10-09 · accept · novelty 6.0

Hugging Face releases an open-source Python library that supplies a unified API and pretrained weights for major Transformer architectures used in natural language processing.

RoBERTa: A Robustly Optimized BERT Pretraining Approach

cs.CL · 2019-07-26 · accept · novelty 5.0

With better hyperparameters, more data, and longer training, an unchanged BERT-Large architecture matches or exceeds XLNet and other successors on GLUE, SQuAD, and RACE.

PortBERT: Navigating the Depths of Portuguese Language Models

cs.CL · 2026-06-01 · unverdicted · novelty 3.0

PortBERT releases two RoBERTa models for Portuguese that match or beat prior monolingual and multilingual models on translated GLUE/SuperGLUE tasks while reporting training and inference times.

citing papers explorer

Showing 9 of 9 citing papers.

ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators cs.CL · 2020-03-23 · conditional · none · ref 4
REALM: Retrieval-Augmented Language Model Pre-Training cs.CL · 2020-02-10 · accept · none · ref 6
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension cs.CL · 2019-10-29 · accept · none · ref 9
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer cs.LG · 2019-10-23 · unverdicted · none · ref 32
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations cs.CL · 2019-09-26 · accept · none · ref 19
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism cs.CL · 2019-09-17 · unverdicted · none · ref 12
HuggingFace's Transformers: State-of-the-art Natural Language Processing cs.CL · 2019-10-09 · accept · none · ref 158
RoBERTa: A Robustly Optimized BERT Pretraining Approach cs.CL · 2019-07-26 · accept · none · ref 19
PortBERT: Navigating the Depths of Portuguese Language Models cs.CL · 2026-06-01 · unverdicted · none · ref 26
