hub
EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks
11 Pith papers cite this work. Polarity classification is still indexing.
years
2026: 11
verdicts
UNVERDICTED: 11
citing papers explorer
- Behavioral and Representational Evidence of Binomial Ordering Preferences in Large Language Models
LLMs recover dominant binomial orders from corpora but align less closely with exact preference distributions, with preference strength partially encoded in middle-to-late layers and manipulable via steering.
- Multilingual Knowledge Transfer under Data Constraints via Lexical Interventions
LINK improves cross-lingual knowledge transfer via lexical substitutions in English pretraining data, yielding notable downstream gains and up to 2x training speedup across eight languages and five model sizes.
- Learning Perturbations to Extrapolate Your LLM
A learnable continuous perturbation framework for LLM token prefixes via latent vector transformations, optimized through unbiased estimating equations, yields gains in out-of-domain performance.
- Perturbation is All You Need for Extrapolating Language Models
Perturbing prefixes to semantic neighbors during training creates a hierarchical noise model that improves language model predictions on token sequences outside the training corpus support.
- Controlled Paraphrase Geometry in Sentence Embedding Space: Local Manifold Modeling and Latent Probing
Nonlinear polynomial models fit local paraphrase embedding clouds more accurately than linear ones and support geometrically consistent synthetic point generation, yet this geometric fidelity does not improve classification performance.
- Aligning Implied Statements for Implicit Hate Speech Generalizability with Context-Bounded Semi-hard Negative Mining
ImpSH improves cross-domain generalization in implicit hate speech classification by aligning posts with implied statements and applying context-bounded semi-hard negative mining within a triplet learning setup.
- Learning Generalizable Multimodal Representations for Software Vulnerability Detection
MultiVul uses multimodal contrastive learning to align code and comment representations, yielding up to 27% F1 gains on vulnerability detection benchmarks over prompting and code-only baselines.
- Long Live Fine-Tuning: Task-Specific Transformers Outperform Zero-Shot LLMs for Misinformation Response Classification on Reddit
Fine-tuned RoBERTa achieves 0.62 macro-F1 on 900 Reddit comments, outperforming the best zero-shot LLM at 0.50, with the largest gap on detecting belief propagation.
- Mitigating Data Scarcity in Psychological Defense Classification with Context-Aware Synthetic Augmentation
A context-aware synthetic augmentation framework with a hybrid clinical-language model improves psychological defense mechanism classification to 58.26% accuracy and 24.62% macro-F1 in low-resource conditions, outperforming the DMRS Co-Pilot baseline.
- Human-Machine Co-Boosted Bug Report Identification with Mutualistic Neural Active Learning
MNAL reduces human effort in bug report labeling by up to 95.8% for readability and 196% for identifiability while improving identification performance and working with various neural models.
- LinguIUTics at PsyDefDetect: Iterative Imbalance-Aware Fine-tuning of Qwen3-8B for Psychological Defense Mechanism Classification
The LinguIUTics team applies QLoRA fine-tuning of Qwen3-8B plus stratified CV, minority lexical augmentation, logit bias tuning, and ensemble blending to achieve 0.3917 macro F1 (7.7 points above the Ministral-8B baseline) on PsyDefDetect 2026.