EDA: Easy data augmentation techniques for boosting performance on text classification tasks

Jason Wei, Kai Zou · 2019 · DOI 10.18653/v1/d19-1670

11 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background: 1 · method: 1

citation-polarity summary

background: 1 · use method: 1

representative citing papers

Behavioral and Representational Evidence of Binomial Ordering Preferences in Large Language Models

cs.CL · 2026-06-19 · unverdicted · novelty 6.0

LLMs recover dominant binomial orders from corpora but align less closely with exact preference distributions, with preference strength partially encoded in middle-to-late layers and manipulable via steering.

Multilingual Knowledge Transfer under Data Constraints via Lexical Interventions

cs.CL · 2026-05-22 · unverdicted · novelty 6.0

LINK improves cross-lingual knowledge transfer via lexical substitutions in English pretraining data, yielding notable downstream gains and up to 2x training speedup across eight languages and five model sizes.

Learning Perturbations to Extrapolate Your LLM

stat.ML · 2026-05-13 · unverdicted · novelty 6.0

A learnable continuous perturbation framework for LLM token prefixes via latent vector transformations, optimized through unbiased estimating equations, yields gains in out-of-domain performance.

Perturbation is All You Need for Extrapolating Language Models

stat.ML · 2026-05-05 · unverdicted · novelty 6.0

Perturbing prefixes to semantic neighbors during training creates a hierarchical noise model that improves language model predictions on token sequences outside the training corpus support.

Controlled Paraphrase Geometry in Sentence Embedding Space: Local Manifold Modeling and Latent Probing

cs.CL · 2026-05-01 · unverdicted · novelty 6.0

Nonlinear polynomial models fit local paraphrase embedding clouds more accurately than linear ones and support geometrically consistent synthetic point generation, yet this geometric fidelity does not improve classification performance.

Aligning Implied Statements for Implicit Hate Speech Generalizability with Context-Bounded Semi-hard Negative Mining

cs.CL · 2026-06-17 · unverdicted · novelty 5.0

ImpSH improves cross-domain generalization in implicit hate speech classification by aligning posts with implied statements and applying context-bounded semi-hard negative mining within a triplet learning setup.

Learning Generalizable Multimodal Representations for Software Vulnerability Detection

cs.SE · 2026-04-28 · unverdicted · novelty 5.0

MultiVul uses multimodal contrastive learning to align code and comment representations, yielding up to 27% F1 gains on vulnerability detection benchmarks over prompting and code-only baselines.

Long Live Fine-Tuning: Task-Specific Transformers Outperform Zero-Shot LLMs for Misinformation Response Classification on Reddit

cs.CL · 2026-06-02 · unverdicted · novelty 4.0

Fine-tuned RoBERTa achieves 0.62 macro-F1 on 900 Reddit comments, outperforming the best zero-shot LLM at 0.50, with the largest gap on detecting belief propagation.

Mitigating Data Scarcity in Psychological Defense Classification with Context-Aware Synthetic Augmentation

cs.CL · 2026-05-14 · unverdicted · novelty 4.0

A context-aware synthetic augmentation framework with a hybrid clinical-language model improves psychological defense mechanism classification to 58.26% accuracy and 24.62% macro-F1 in low-resource conditions, outperforming the DMRS Co-Pilot baseline.

Human-Machine Co-Boosted Bug Report Identification with Mutualistic Neural Active Learning

cs.SE · 2026-04-20 · unverdicted · novelty 4.0

MNAL reduces human effort in bug report labeling by up to 95.8% for readability and 196% for identifiability while improving identification performance and working with various neural models.

LinguIUTics at PsyDefDetect: Iterative Imbalance-Aware Fine-tuning of Qwen3-8B for Psychological Defense Mechanism Classification

cs.CL · 2026-05-30 · unverdicted · novelty 2.0

LinguIUTics team applies QLoRA fine-tuning of Qwen3-8B plus stratified CV, minority lexical augmentation, logit bias tuning and ensemble blending to achieve 0.3917 macro F1 (7.7 points above Ministral-8B baseline) on PsyDefDetect 2026.

citing papers explorer

Showing 11 of 11 citing papers.

Behavioral and Representational Evidence of Binomial Ordering Preferences in Large Language Models

cs.CL · 2026-06-19 · unverdicted · none · ref 180

Multilingual Knowledge Transfer under Data Constraints via Lexical Interventions

cs.CL · 2026-05-22 · unverdicted · none · ref 37

Learning Perturbations to Extrapolate Your LLM

stat.ML · 2026-05-13 · unverdicted · none · ref 22

Perturbation is All You Need for Extrapolating Language Models

stat.ML · 2026-05-05 · unverdicted · none · ref 53

Controlled Paraphrase Geometry in Sentence Embedding Space: Local Manifold Modeling and Latent Probing

cs.CL · 2026-05-01 · unverdicted · none · ref 16

Aligning Implied Statements for Implicit Hate Speech Generalizability with Context-Bounded Semi-hard Negative Mining

cs.CL · 2026-06-17 · unverdicted · none · ref 46

Learning Generalizable Multimodal Representations for Software Vulnerability Detection

cs.SE · 2026-04-28 · unverdicted · none · ref 68

Long Live Fine-Tuning: Task-Specific Transformers Outperform Zero-Shot LLMs for Misinformation Response Classification on Reddit

cs.CL · 2026-06-02 · unverdicted · none · ref 38

Mitigating Data Scarcity in Psychological Defense Classification with Context-Aware Synthetic Augmentation

cs.CL · 2026-05-14 · unverdicted · none · ref 10

Human-Machine Co-Boosted Bug Report Identification with Mutualistic Neural Active Learning

cs.SE · 2026-04-20 · unverdicted · none · ref 92

LinguIUTics at PsyDefDetect: Iterative Imbalance-Aware Fine-tuning of Qwen3-8B for Psychological Defense Mechanism Classification

cs.CL · 2026-05-30 · unverdicted · none · ref 8
