ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators

Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning · 2020 · cs.CL · arXiv 2003.10555

Mixed citation behavior. The most common role is background (45% of classified citations).

32 Pith papers cite it.

abstract

Masked language modeling (MLM) pre-training methods such as BERT corrupt the input by replacing some tokens with [MASK] and then train a model to reconstruct the original tokens. While they produce good results when transferred to downstream NLP tasks, they generally require large amounts of compute to be effective. As an alternative, we propose a more sample-efficient pre-training task called replaced token detection. Instead of masking the input, our approach corrupts it by replacing some tokens with plausible alternatives sampled from a small generator network. Then, instead of training a model that predicts the original identities of the corrupted tokens, we train a discriminative model that predicts whether each token in the corrupted input was replaced by a generator sample or not. Thorough experiments demonstrate this new pre-training task is more efficient than MLM because the task is defined over all input tokens rather than just the small subset that was masked out. As a result, the contextual representations learned by our approach substantially outperform the ones learned by BERT given the same model size, data, and compute. The gains are particularly strong for small models; for example, we train a model on one GPU for 4 days that outperforms GPT (trained using 30x more compute) on the GLUE natural language understanding benchmark. Our approach also works well at scale, where it performs comparably to RoBERTa and XLNet while using less than 1/4 of their compute and outperforms them when using the same amount of compute.
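
The corruption and labeling step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `corrupt_for_rtd`, the toy vocabulary, and the `replace_prob` and `seed` parameters are all hypothetical, and a uniform random draw stands in for the small generator network.

```python
import random


def corrupt_for_rtd(tokens, vocab, replace_prob=0.15, seed=0):
    """Build a toy replaced-token-detection example.

    Returns (corrupted_tokens, labels), where labels[i] is 1 if the token
    at position i differs from the original and 0 otherwise.
    """
    rng = random.Random(seed)
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < replace_prob:
            # Assumption: a uniform draw from the vocabulary stands in for
            # a sample from the small generator network.
            sample = rng.choice(vocab)
        else:
            sample = tok
        corrupted.append(sample)
        # A sample that happens to equal the original counts as "not replaced".
        labels.append(int(sample != tok))
    return corrupted, labels


tokens = "the chef cooked the meal".split()
vocab = ["the", "chef", "cooked", "ate", "meal", "a", "dog"]
corrupted, labels = corrupt_for_rtd(tokens, vocab, replace_prob=0.5, seed=1)
```

Every position receives a label, whether or not it was corrupted, so a discriminator trained on `labels` gets a training signal from all input tokens rather than only the corrupted subset. This is the property the abstract credits for the efficiency gain over MLM.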

citation-role summary

background 7 · method 3 · baseline 1

citation-polarity summary

background 5 · use method 3 · unclear 2 · baseline 1

representative citing papers

Time-RA: Towards Time Series Reasoning for Anomaly Diagnosis with LLM Feedback

cs.LG · 2025-07-20 · conditional · novelty 7.0

Time-RA reformulates time series anomaly detection as a reasoning-intensive generative task and provides the RATs40K multimodal benchmark to evaluate and improve LLM-based diagnosis.

Hopfield Networks is All You Need

cs.NE · 2020-07-16 · unverdicted · novelty 7.0

Modern Hopfield networks store exponentially many patterns, retrieve them in one update, and have an update rule equivalent to transformer attention, enabling new Hopfield layers that improve results on multiple instance learning and drug design tasks.

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

cs.LG · 2019-10-23 · unverdicted · novelty 7.0

T5 casts all NLP tasks as text-to-text generation, systematically explores pre-training choices, and reaches strong performance on summarization, QA, classification and other tasks via large-scale training on the Colossal Clean Crawled Corpus.

Protein Fold Classification at Scale: Benchmarking and Pretraining

cs.LG · 2026-05-18 · unverdicted · novelty 6.0

Introduces the TEDBench benchmark and the MiAE self-supervised framework, which outperforms baselines on large-scale protein fold classification.

On the (In-)Security of the Shuffling Defense in the Transformer Secure Inference

cs.CR · 2026-05-06 · conditional · novelty 6.0

An attack aligns differently shuffled intermediate activations from secure Transformer inference queries to recover model weights with low error using roughly one dollar of queries.

ADE: Adaptive Dictionary Embeddings -- Scaling Multi-Anchor Representations to Large Language Models

cs.CL · 2026-04-27 · unverdicted · novelty 6.0

ADE scales multi-anchor word representations to transformers via Vocabulary Projection, Grouped Positional Encoding, and context-aware reweighting, achieving 98.7% fewer trainable parameters than DeBERTa-v3-base while matching or exceeding it on two text-classification benchmarks and compressing the

Empirical Insights of Test Selection Metrics under Multiple Testing Objectives and Distribution Shifts

cs.SE · 2026-04-25 · unverdicted · novelty 6.0

A broad empirical benchmark shows how 15 existing test selection metrics perform for fault detection, performance estimation, and retraining under corrupted, adversarial, temporal, natural, and label shifts across image, text, and Android data.

Bangla Key2Text: Text Generation from Keywords for a Low Resource Language

cs.CL · 2026-04-21 · conditional · novelty 6.0

Bangla Key2Text releases 2.6M keyword-text pairs and demonstrates that fine-tuned mT5 and BanglaT5 outperform zero-shot LLMs on keyword-conditioned Bangla text generation.

Entities as Retrieval Signals: A Systematic Study of Coverage, Supervision, and Evaluation in Entity-Oriented Ranking

cs.IR · 2026-04-06 · conditional · novelty 6.0

Entity signals cover only 19.7% of relevant documents on Robust04 and no configuration among 443 systems improves MAP by more than 0.05 in open-world evaluation, despite gains when entities are pre-restricted.

Compiling Code LLMs into Lightweight Executables

cs.SE · 2026-03-31 · conditional · novelty 6.0

Ditto quantizes Code LLMs with K-Means codebooks and compiles inference via LLVM-BLAS replacement to deliver up to 10.5x faster, 6.4x smaller, and 10.5x lower-energy execution on commodity hardware while losing only 0.27% pass@1 accuracy.

Discrepancies are Virtue: Weak-to-Strong Generalization through Lens of Intrinsic Dimension

cs.LG · 2025-02-07 · unverdicted · novelty 6.0

In ridgeless regression with low intrinsic dimension, discrepancy between weak and strong models reduces W2S generalization variance by dim(V_s)/N in the discrepant subspace while inheriting it in the overlap.

Secret Leak Detection in Software Issue Reports using LLMs: A Comprehensive Evaluation

cs.SE · 2024-10-31 · accept · novelty 6.0

Creates a 54k-instance benchmark of GitHub issue secrets and shows fine-tuned LLMs reach 94.49% F1 with 81.6% on 178 real repositories.

Demystifying CLIP Data

cs.CV · 2023-09-28 · accept · novelty 6.0

MetaCLIP curates balanced 400M-pair subsets from CommonCrawl that outperform CLIP data, reaching 70.8% zero-shot ImageNet accuracy on ViT-B versus CLIP's 68.3%.

EVA-CLIP: Improved Training Techniques for CLIP at Scale

cs.CV · 2023-03-27 · conditional · novelty 6.0

EVA-CLIP delivers improved CLIP training recipes that yield 82.0% zero-shot ImageNet-1K accuracy for a 5B-parameter model after only 9 billion samples.

HuggingFace's Transformers: State-of-the-art Natural Language Processing

cs.CL · 2019-10-09 · accept · novelty 6.0

Hugging Face releases an open-source Python library that supplies a unified API and pretrained weights for major Transformer architectures used in natural language processing.

Modular Multimodal Classification Without Fine-Tuning: A Simple Compositional Approach

cs.LG · 2026-05-20 · unverdicted · novelty 5.0

CoMET achieves strong multimodal classification performance by composing frozen modality encoders, PCA compression, and tabular foundation models without any training, reaching state-of-the-art on diverse benchmarks including large-scale hierarchical tasks.

Automatic Reflection Level Classification in Hungarian Student Essays

cs.CL · 2026-05-04 · unverdicted · novelty 5.0

Classical machine learning models slightly outperform Hungarian transformers overall (71% vs 68% average score) at classifying reflection levels in student essays, though transformers handle rare classes better.

ESsEN: Training Compact Discriminative Vision-Language Transformers in a Low-Resource Setting

cs.CV · 2026-04-20 · unverdicted · novelty 5.0

ESsEN is a parameter-efficient two-tower vision-language transformer that matches larger models on discriminative tasks after training end-to-end with limited data and resources.

Case-Grounded Evidence Verification: A Framework for Constructing Evidence-Sensitive Supervision

cs.CL · 2026-04-10 · unverdicted · novelty 5.0

A supervision construction procedure generates explicit support and controlled non-support examples (counterfactual and topic-related negatives) without manual annotation, producing verifiers that demonstrate genuine evidence dependence in radiology tasks.

A Transformer-Based Cross-Platform Analysis of Public Discourse on the 15-Minute City Paradigm

cs.CL · 2025-09-14 · unverdicted · novelty 5.0

Benchmarks five compressed transformer models for multi-platform sentiment classification on 15-minute city discourse, reporting DistilRoBERTa highest F1 of 0.8292 and platform-specific performance differences.

Social media polarization during conflict: Insights from an ideological stance dataset on Israel-Palestine Reddit comments

cs.CL · 2025-02-01 · unverdicted · novelty 5.0

A new labeled dataset of 9,969 Israel-Palestine Reddit comments is created and used to compare stance classification methods, with a specific Mixtral prompt achieving the highest performance.

Mitigating Extrinsic Gender Bias for Bangla Classification Tasks

cs.CL · 2024-11-16 · unverdicted · novelty 5.0

Constructs gender-perturbed Bangla classification benchmarks and proposes RandSymKL debiasing that reduces extrinsic gender bias in pretrained models.

On the Power of Foundation Models

cs.AI · 2022-11-29 · unverdicted · novelty 5.0

Category theory proves prompt-based learning on perfect foundation models works only for representable tasks, fine-tuning solves tasks in the pretext category, and models can represent unseen target-category objects using source-category structure.

Explaining the Explainers in Graph Neural Networks: a Comparative Study

cs.LG · 2022-10-27 · unverdicted · novelty 5.0

Benchmark study of ten GNN explainers on eight architectures and six datasets that isolates usable components and issues practical recommendations.

citing papers explorer

Showing 2 of 2 citing papers after filters.

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

cs.LG · 2019-10-23 · unverdicted · none · ref 14 · internal anchor

T5 casts all NLP tasks as text-to-text generation, systematically explores pre-training choices, and reaches strong performance on summarization, QA, classification and other tasks via large-scale training on the Colossal Clean Crawled Corpus.

HuggingFace's Transformers: State-of-the-art Natural Language Processing

cs.CL · 2019-10-09 · accept · none · ref 150 · internal anchor

Hugging Face releases an open-source Python library that supplies a unified API and pretrained weights for major Transformer architectures used in natural language processing.
