RoBERTa: A Robustly Optimized BERT Pretraining Approach
Representative citing papers
- LoRA: Low-Rank Adaptation of Large Language Models
  Adapting large language models by training only a low-rank decomposition BA added to frozen weight matrices matches full fine-tuning while cutting trainable parameters by orders of magnitude and adding no inference latency. (A minimal code sketch of the low-rank update appears after this list.)
- Computer Science Conferences Should Require Nonrepudiable Experimental Results
  CS conferences should require nonrepudiable experimental results via signed attestations that prevent authors from later altering or denying reported numbers. (A hedged signing sketch appears after this list.)
- Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation
  ALiBi enables transformers trained on length-1024 sequences to extrapolate to length-2048 with the same perplexity as a sinusoidal model trained on 2048, while training 11% faster and using 11% less memory. (A sketch of the linear attention bias appears after this list.)
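
The LoRA entry above describes training only a low-rank update BA on top of a frozen weight matrix. Below is a minimal PyTorch sketch of that idea; the `LoRALinear` class name, the rank of 8, and the `alpha` scaling are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal LoRA-style layer: keep the pretrained weight W frozen and train only the
# low-rank factors A and B, so the adapted layer computes W x + (alpha / rank) * B A x.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_features: int, out_features: int, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        # Frozen pretrained weight (would be loaded from the base model in practice).
        self.weight = nn.Parameter(torch.empty(out_features, in_features), requires_grad=False)
        nn.init.normal_(self.weight, std=0.02)
        # Trainable low-rank factors: A projects down to `rank`, B projects back up.
        self.lora_A = nn.Parameter(torch.zeros(rank, in_features))
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))
        nn.init.normal_(self.lora_A, std=0.02)  # B stays zero, so BA = 0 at initialization
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen path plus the low-rank correction; only A and B receive gradients.
        base = x @ self.weight.T
        update = (x @ self.lora_A.T) @ self.lora_B.T
        return base + self.scaling * update

layer = LoRALinear(768, 768, rank=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable}")  # 2 * 8 * 768 instead of 768 * 768
```

Because BA can be merged into W once training is done, the adapted layer computes a single matrix multiply at inference time, which is where the no-added-latency claim comes from.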
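The nonrepudiation entry above proposes signed attestations over reported numbers. The sketch below assumes an Ed25519 signature over a hash of a results record would serve as such an attestation; the result fields, key handling, and use of the third-party `cryptography` package are illustrative, not the paper's actual protocol.

```python
# Sign a digest of the reported results so any later change to the numbers breaks
# verification, and the author cannot plausibly deny having attested to them.
import hashlib
import json

from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# Example experimental results as reported in a submission (illustrative values).
results = {"model": "roberta-base", "task": "MNLI", "accuracy": 87.6, "seed": 42}
payload = json.dumps(results, sort_keys=True).encode()
digest = hashlib.sha256(payload).hexdigest()

# In practice the key would be bound to the author's identity (e.g. through the
# conference or an institutional registry); here we simply generate one.
private_key = Ed25519PrivateKey.generate()
signature = private_key.sign(payload)
public_key = private_key.public_key()

# Anyone holding the public key can later check that the published numbers match
# what was attested at submission time; altering the payload raises InvalidSignature.
public_key.verify(signature, payload)
print(f"sha256={digest}")
print(f"signature={signature.hex()[:32]}...")
```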
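The ALiBi entry above attributes length extrapolation to a static, distance-proportional penalty added to attention scores instead of positional embeddings. The sketch below assumes a power-of-two head count and the geometric slope schedule; the `alibi_bias` and `attention_with_alibi` helpers are illustrative names, not the authors' released code.

```python
# Linear attention biases: each head subtracts slope * (query_pos - key_pos) from its
# attention logits, penalizing distant keys without any learned position embedding.
import torch

def alibi_bias(num_heads: int, seq_len: int) -> torch.Tensor:
    # Head-specific slopes form a geometric sequence, e.g. 1/2, 1/4, ..., 1/256 for 8 heads.
    slopes = torch.tensor([2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)])
    # distance[i, j] = i - j, i.e. how far key j lies in the past of query i.
    pos = torch.arange(seq_len)
    distance = (pos[:, None] - pos[None, :]).clamp(min=0)
    # Shape (num_heads, seq_len, seq_len); added to attention logits before softmax.
    return -slopes[:, None, None] * distance

def attention_with_alibi(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    # q, k, v: (batch, heads, seq, head_dim), causal self-attention.
    num_heads, seq_len = q.shape[1], q.shape[2]
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    scores = scores + alibi_bias(num_heads, seq_len).to(q.dtype)
    causal = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(causal, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(1, 8, 16, 64)
print(attention_with_alibi(q, k, v).shape)  # torch.Size([1, 8, 16, 64])
```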