super hub Mixed citations

write newline

" write newline "" before

Mixed citation behavior. Most common role is background (50%).

192 Pith papers citing it

Background 50% of classified citations


citation-role summary

background 9 · other 5 · method 2

citation-polarity summary

background 8 · unclear 6 · use method 2
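
The two summaries above give different background shares, and only one of them matches the 50% quoted at the top of the page. The sketch below works through the arithmetic, assuming the share is a simple fraction of the listed counts (the page does not state how it is computed). The helper name `shares` is hypothetical.

```python
def shares(counts):
    """Return each label's share of the total, in percent."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no classified citations")
    return {label: 100.0 * n / total for label, n in counts.items()}


# Counts copied from the two summaries on this page.
role_counts = {"background": 9, "other": 5, "method": 2}
polarity_counts = {"background": 8, "unclear": 6, "use method": 2}

role_shares = shares(role_counts)
polarity_shares = shares(polarity_counts)

print(f"role background share:     {role_shares['background']:.2f}%")
print(f"polarity background share: {polarity_shares['background']:.2f}%")
```

Under this assumption the polarity counts give 8 of 16, which is 50%, while the role counts give 9 of 16, so the quoted 50% appears to line up with the polarity summary rather than the role summary.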

claims ledger

background
Flesch-Kincaid Grade Level | 8.97 | 9.08 | -0.11 | -0.1673 | -0.1528
Table 5: Textual complexity metrics and their correlation with frequency. Corr. denotes correlation. We use nlp = spacy.load("en_core_web_sm") for calculation.
Bin Range | N | BLEU(HF) | BLEU(LF) | ∆BLEU(HF-LF) | chrF(HF) | chrF(LF) | ∆chrF(HF-LF)
Strict Depth Match | 144 | 20.82 | 16.04 | +4.78 | 48.73 | 43.86 | +4.87
[0%,5%) | 144 | 20.82 | 16.04 | +4.78 | 48.73 | 43.86 | +4.87
[5%,10%) | 6 | 22.45 | 14.79 | +7.65 | 49.76 | 49.19 | +0.57
[10%,15%) | 71 | 19.12 | 15.38 | +3.74 | 46.19 | 44.71 | +1.47
[15%,2


representative citing papers

SwissGov-RSD: A Human-annotated, Cross-lingual Benchmark for Token-level Recognition of Semantic Differences Between Related Documents

cs.CL · 2025-12-08 · accept · novelty 8.0

SwissGov-RSD is the first naturalistic cross-lingual document-level benchmark with human token-level semantic difference annotations, on which both LLMs and encoders show a large performance gap relative to simpler settings.

BEAVER: An Enterprise Benchmark for Text-to-SQL

cs.CL · 2024-09-03 · unverdicted · novelty 8.0

BEAVER is the first text-to-SQL benchmark from private enterprise data warehouses, revealing SOTA agentic frameworks achieve only 10.8% accuracy on complex real-world queries.

Investigating More Explainable and Partition-Free Compositionality Estimation for LLMs: A Rule-Generation Perspective

cs.AI · 2026-04-30 · unverdicted · novelty 7.0

A rule-generation perspective lets LLMs write programs as rules for data mapping and applies complexity theory to estimate their compositionality, tested on string-to-grid tasks.

OCR-Memory: Optical Context Retrieval for Long-Horizon Agent Memory

cs.CL · 2026-04-29 · unverdicted · novelty 7.0

OCR-Memory encodes agent trajectories as images with visual anchors and retrieves verbatim text via locate-and-transcribe, yielding gains on long-horizon benchmarks under strict context limits.

From Chatbots to Confidants: A Cross-Cultural Study of LLM Adoption for Emotional Support

cs.CL · 2026-04-28 · unverdicted · novelty 7.0 · 2 refs

Cross-cultural survey of 4,641 participants shows LLM emotional support adoption varies widely by country and demographics, with socioeconomic status as strongest predictor of trust and use, and English-speaking nations more accepting than others in Europe.

PushupBench: Your VLM is not good at counting pushups

cs.CV · 2026-04-25 · unverdicted · novelty 7.0

VLMs reach only 42.1% exact accuracy on counting pushups in videos, with weaker models exploiting modal counts, and 1k-sample fine-tuning transfers gains to MVBench, PerceptionTest, and TVBench.

Fine-tuning vs. In-context Learning in Large Language Models: A Formal Language Learning Perspective

cs.CL · 2026-04-25 · conditional · novelty 7.0 · 2 refs

A controlled formal language task reveals fine-tuning outperforms in-context learning on in-distribution generalization but equals it on out-of-distribution, with ICL showing greater sensitivity to model size and tokenization.

StoryTR: Narrative-Centric Video Temporal Retrieval with Theory of Mind Reasoning

cs.AI · 2026-04-25 · unverdicted · novelty 7.0

StoryTR is a new benchmark and agentic data pipeline that adds explicit Theory of Mind reasoning chains to train smaller video retrieval models, yielding a 15% relative IoU gain over larger baselines on narrative content.

Evaluating Temporal Consistency in Multi-Turn Language Models

cs.CL · 2026-04-24 · unverdicted · novelty 7.0

Language models frequently violate temporal scope stability in multi-turn dialogues by drifting toward present-day assumptions even when they possess the correct facts.

BERAG: Bayesian Ensemble Retrieval-Augmented Generation for Knowledge-based Visual Question Answering

cs.CL · 2026-04-24 · unverdicted · novelty 7.0

BERAG applies Bayesian ensemble weighting of individual documents via token-by-token posterior updates in retrieval-augmented generation, yielding gains on knowledge-based visual QA tasks.

How Tokenization Limits Phonological Knowledge Representation in Language Models and How to Improve Them

cs.CL · 2026-04-18 · unverdicted · novelty 7.0

Subword tokenization impairs phonological knowledge encoding in LMs, but an IPA-based fine-tuning method restores it with minimal impact on other capabilities.

BIASEDTALES-ML: A Multilingual Dataset for Analyzing Narrative Attribute Distributions in LLM-Generated Stories

cs.CL · 2026-04-18 · unverdicted · novelty 7.0

BiasedTales-ML provides a parallel multilingual corpus of LLM-generated children's stories that reveals substantial cross-lingual differences in narrative attributes not captured by English-centric analyses.

Conjunctive Prompt Attacks in Multi-Agent LLM Systems

cs.MA · 2026-04-17 · unverdicted · novelty 7.0

Conjunctive prompt attacks split adversarial elements across agents and routing paths in multi-agent LLM systems, evading isolated defenses and succeeding through topology-aware optimization.

VisPCO: Visual Token Pruning Configuration Optimization via Budget-Aware Pareto-Frontier Learning for Vision-Language Models

cs.CV · 2026-04-16 · unverdicted · novelty 7.0

VisPCO uses continuous relaxation, straight-through estimators, and budget-aware Pareto-frontier learning to automatically discover optimal visual token pruning configurations that approximate grid-search results across VLMs and benchmarks.

HintPilot: LLM-based Compiler Hint Synthesis for Code Optimization

cs.SE · 2026-04-16 · unverdicted · novelty 7.0

HintPilot synthesizes semantics-preserving compiler hints via retrieval-augmented LLM generation and profiling-guided refinement, delivering up to 6.88x geometric mean speedup over -Ofast on PolyBench and HumanEval-CPP while preserving correctness.

Route to Rome Attack: Directing LLM Routers to Expensive Models via Adversarial Suffix Optimization

cs.CR · 2026-04-16 · unverdicted · novelty 7.0

R²A uses a hybrid ensemble surrogate router and suffix optimization to significantly increase black-box LLM router selection of expensive models across query distributions.

ADAPT: Benchmarking Commonsense Planning under Unspecified Affordance Constraints

cs.AI · 2026-04-16 · unverdicted · novelty 7.0

ADAPT augments planners with affordance reasoning to raise task success in environments with unspecified and time-varying object affordances, and a LoRA-finetuned VLM backend beats GPT-4o on the new DynAfford benchmark.

Schema Key Wording as an Instruction Channel in Structured Generation under Constrained Decoding

cs.CL · 2026-04-16 · unverdicted · novelty 7.0

Schema-key wording functions as an implicit instruction channel under constrained decoding, with experiments showing that rephrasing only the keys can substantially change accuracy on math benchmarks while prompt, model, structure, and decoding remain unchanged.

SPAGBias: Uncovering and Tracing Structured Spatial Gender Bias in Large Language Models

cs.CL · 2026-04-16 · unverdicted · novelty 7.0

SPAGBias reveals that LLMs form nuanced gender associations with specific urban micro-spaces that exceed real-world distributions and produce failures in planning and descriptive tasks.

Controlling Authority Retrieval: A Missing Retrieval Objective for Authority-Governed Knowledge

cs.IR · 2026-04-15 · unverdicted · novelty 7.0

CAR is a new retrieval objective that targets the currently active authority set rather than most-similar documents, with theorems on coverage conditions and evaluations showing two-stage methods outperform dense retrieval on authority-governed datasets.

Why Multimodal In-Context Learning Lags Behind? Unveiling the Inner Mechanisms and Bottlenecks

cs.CV · 2026-04-15 · unverdicted · novelty 7.0

Multimodal ICL lags text-only ICL in few-shot settings due to weak cross-modal reasoning alignment and unreliable task mapping transfer, with an inference-stage method proposed to strengthen transfer.

Teaching LLMs Human-Like Editing of Inappropriate Argumentation via Reinforcement Learning

cs.CL · 2026-04-14 · unverdicted · novelty 7.0

Reinforcement learning with a multi-part reward teaches LLMs to output independent, meaning-preserving sentence edits that raise argument appropriateness close to full rewriting.

Calibrated Confidence Estimation for Tabular Question Answering

cs.CL · 2026-04-14 · unverdicted · novelty 7.0

Tabular QA LLMs are overconfident, but Multi-Format Agreement using Markdown/HTML/JSON/CSV variants improves AUROC to 0.80 and cuts calibration error by 44-63% at lower cost than sampling.

EgoEsportsQA: An Egocentric Video Benchmark for Perception and Reasoning in Esports

cs.CV · 2026-04-14 · unverdicted · novelty 7.0

EgoEsportsQA is a new egocentric video QA benchmark from esports matches that shows state-of-the-art Video-LLMs reach only 71.58% accuracy and struggle more with tactical reasoning than basic perception.

citing papers explorer

Showing 50 of 192 citing papers.

Bridging Reasoning and Action: Hybrid LLM-RL Framework for Efficient Cross-Domain Task-Oriented Dialogue cs.CL · 2026-04-25 · unverdicted · none · ref 30
VLK-RL verifies LLM-derived constraints and maps them into structured state representations to improve RL performance on long-horizon cross-domain dialogue tasks.
Mixture of Heterogeneous Grouped Experts for Language Modeling cs.CL · 2026-04-25 · unverdicted · none · ref 28
MoHGE achieves standard MoE performance with 20% fewer parameters and balanced GPU utilization via grouped heterogeneous experts, two-level routing, and specialized auxiliary losses.
AutoPyVerifier: Learning Compact Executable Verifiers for Large Language Model Outputs cs.CL · 2026-04-24 · unverdicted · none · ref 2
AutoPyVerifier learns compact sets of executable Python verifiers from labeled LLM outputs via LLM synthesis and DAG search, improving objective prediction by up to 55 F1 points and downstream LLM accuracy by up to 17 points.
SOLAR-RL: Semi-Online Long-horizon Assignment Reinforcement Learning cs.LG · 2026-04-24 · unverdicted · none · ref 27
SOLAR-RL assigns dense step-level rewards from static trajectory data by detecting first failure points and applying target-aligned shaping to improve long-horizon GUI task completion without full online interactions.
RouteGuard: Internal-Signal Detection of Skill Poisoning in LLM Agents cs.CR · 2026-04-24 · unverdicted · none · ref 2
RouteGuard uses response-conditioned attention and hidden-state alignment to detect skill poisoning in LLM agents, achieving 0.8834 F1 on Skill-Inject benchmarks and recovering 90.51% of attacks missed by lexical screening.
Fine-Grained Analysis of Shared Syntactic Mechanisms in Language Models cs.CL · 2026-04-24 · unverdicted · none · ref 39
Language models employ a highly localized shared mechanism for filler-gap dependencies but no unified mechanism for NPI licensing, and activation patching generalizes better than supervised alignment search.
SPS: Steering Probability Squeezing for Better Exploration in Reinforcement Learning for Large Language Models cs.CL · 2026-04-18 · unverdicted · none · ref 2
SPS interleaves RL and IRL to counteract probability squeezing in LLM reasoning trajectories, improving Pass@k on five benchmarks while identifying an empirical upper bound on multi-sample performance.
Beyond Text-Dominance: Understanding Modality Preference of Omni-modal Large Language Models cs.AI · 2026-04-18 · unverdicted · none · ref 2
Omni-modal LLMs exhibit visual preference that emerges in mid-to-late layers, enabling hallucination detection without task-specific training.
No-Worse Context-Aware Decoding: Preventing Neutral Regression in Context-Conditioned Generation cs.CL · 2026-04-17 · unverdicted · none · ref 28
NWCAD uses a two-stream setup with a two-stage gate to prevent accuracy drops on baseline-correct items under non-informative contexts while retaining gains from helpful contexts.
How Hypocritical Is Your LLM judge? Listener-Speaker Asymmetries in the Pragmatic Competence of Large Language Models cs.CL · 2026-04-17 · unverdicted · none · ref 51
LLMs perform substantially better as pragmatic listeners judging language than as speakers generating it, revealing weak alignment between the two roles.
CiPO: Counterfactual Unlearning for Large Reasoning Models through Iterative Preference Optimization cs.CL · 2026-04-17 · unverdicted · none · ref 46
CiPO removes undesired knowledge from both intermediate reasoning steps and final answers in large reasoning models by iteratively optimizing preferences toward valid counterfactual traces while keeping overall reasoning performance intact.
GroupDPO: Memory efficient Group-wise Direct Preference Optimization cs.CL · 2026-04-17 · unverdicted · none · ref 2
GroupDPO decouples group-wise preference optimization during backpropagation to cut peak memory while keeping the same gradients, allowing larger groups and consistent gains over single-pair DPO plus an NLL term on positives.
Think in Latent Thoughts: A New Paradigm for Gloss-Free Sign Language Translation cs.CV · 2026-04-16 · unverdicted · none · ref 76
A new SLT framework uses latent thoughts as a middle reasoning layer and plan-then-ground decoding to improve coherence and faithfulness in gloss-free sign language translation.
CAMO: An Agentic Framework for Automated Causal Discovery from Micro Behaviors to Macro Emergence in LLM Agent Simulations cs.AI · 2026-04-16 · unverdicted · none · ref 2
CAMO automates causal discovery in LLM agent simulations by converting hypotheses to computable factors, learning minimal causal subgraphs around an emergent target, and using internal counterfactual probing to orient edges.
MARS²: Scaling Multi-Agent Tree Search via Reinforcement Learning for Code Generation cs.AI · 2026-04-16 · unverdicted · none · ref 38
MARS² integrates multi-agent collaboration with tree-structured search in RL to boost code generation by increasing exploratory diversity and using path-level group advantages for credit assignment.
The Autocorrelation Blind Spot: Why 42% of Turn-Level Findings in LLM Conversation Analysis May Be Spurious cs.CL · 2026-04-15 · accept · none · ref 2
42% of significant turn-level associations in LLM conversation analysis are spurious due to unaccounted autocorrelation, with a validated two-stage correction framework improving replication.
MedRCube: A Multidimensional Framework for Fine-Grained and In-Depth Evaluation of MLLMs in Medical Imaging cs.CL · 2026-04-15 · unverdicted · none · ref 87
MedRCube is a new fine-grained evaluation framework that benchmarks 33 MLLMs on medical imaging, ranks Lingshu-32B highest, and finds a significant positive link between shortcut behaviors and diagnostic performance.
Enhancing Reinforcement Learning for Radiology Report Generation with Evidence-aware Rewards and Self-correcting Preference Learning cs.LG · 2026-04-15 · unverdicted · none · ref 2
ESC-RL improves RL for radiology reports via group-wise evidence-aware rewards (GEAR) and LLM-driven self-correcting preference learning (SPL), reaching state-of-the-art on two chest X-ray datasets.
MetFuse: Figurative Fusion between Metonymy and Metaphor cs.CL · 2026-04-14 · unverdicted · none · ref 2
MetFuse provides the first dataset of 1,000 meaning-aligned quadruplets fusing literal, metonymic, metaphoric, and hybrid sentences, which augments training to boost metonymy and metaphor classification performance on benchmarks.
Calibration-Aware Policy Optimization for Reasoning LLMs cs.LG · 2026-04-14 · unverdicted · none · ref 51
CAPO improves LLM calibration by up to 15% while matching or exceeding GRPO accuracy through logistic AUC loss and noise masking, enabling better abstention and scaling performance.
Preventing Safety Drift in Large Language Models via Coupled Weight and Activation Constraints cs.AI · 2026-04-14 · unverdicted · none · ref 2
Coupled constraints on weight updates in a safety subspace and regularization of SAE-identified safety features preserve LLM refusal behaviors during fine-tuning better than weight-only or activation-only methods.
GAM: Hierarchical Graph-based Agentic Memory for LLM Agents cs.AI · 2026-04-14 · unverdicted · none · ref 2
GAM decouples event-level memory encoding from topic-level consolidation in LLM agents using hierarchical graphs to reduce interference and improve long-term coherence and retrieval.
BlasBench: An Open Benchmark for Irish Speech Recognition cs.CL · 2026-04-12 · conditional · none · ref 34
BlasBench supplies an Irish-aware normalizer and scoring harness that enables reproducible ASR comparisons and exposes a 33-43 point generalization gap for fine-tuned models versus 7-10 points for massively multilingual ones.
Expect the Unexpected? Testing the Surprisal of Salient Entities cs.CL · 2026-04-12 · unverdicted · none · ref 35
Globally salient entities exhibit higher surprisal and reduce surprisal in surrounding text, refining the UID hypothesis by adding entity salience as a shaping factor.
FACT-E: Causality-Inspired Evaluation for Trustworthy Chain-of-Thought Reasoning cs.AI · 2026-04-12 · unverdicted · none · ref 58
FACT-E uses controlled perturbations as an instrumental signal to measure intra-chain faithfulness in CoT reasoning and combines it with answer consistency to select trustworthy trajectories.
NOSE: Neural Olfactory-Semantic Embedding with Tri-Modal Orthogonal Contrastive Learning cs.CL · 2026-04-12 · unverdicted · none · ref 74
NOSE aligns molecular, receptor, and linguistic modalities in a shared embedding space via tri-modal orthogonal contrastive learning and weak positive samples, achieving SOTA performance and zero-shot generalization on olfactory tasks.
Radiology Report Generation for Low-Quality X-Ray Images cs.CV · 2026-04-11 · unverdicted · none · ref 2
A dual-loop training strategy with gradient consistency lets vision-language models generate radiology reports from low-quality X-ray images without severe performance loss.
Regime-Conditional Retrieval: Theory and a Transferable Router for Two-Hop QA cs.IR · 2026-04-10 · conditional · none · ref 2
Two-hop QA retrieval performance depends on whether the hop-2 entity is in the question or bridge passage, and a simple predicate-based router trained on one dataset transfers to improve R@5 on others.
What Drives Representation Steering? A Mechanistic Case Study on Steering Refusal cs.LG · 2026-04-09 · unverdicted · none · ref 54
Steering vectors for refusal primarily modify the OV circuit in attention, ignore most of the QK circuit, and can be sparsified to 1-10% of dimensions while retaining performance.
SeLaR: Selective Latent Reasoning in Large Language Models cs.CL · 2026-04-09 · unverdicted · none · ref 2
SeLaR selectively applies latent soft reasoning in LLMs via entropy gating and contrastive regularization, outperforming standard CoT on five benchmarks without training.
Initialisation Determines the Basin: Efficient Codebook Optimisation for Extreme LLM Quantization cs.CL · 2026-04-09 · unverdicted · none · ref 34
Output-aware EM initialization for codebooks in additive quantization avoids poor optimization basins and yields better 2-bit compressed LLMs across Llama and Qwen models.
MONETA: Multimodal Industry Classification through Geographic Information with Multi Agent Systems cs.AI · 2026-04-09 · unverdicted · none · ref 50
MONETA is the first multimodal benchmark for industry classification using text and geographic sources, with MLLM baselines at 62-74% accuracy and up to 22.8% gains from multi-turn context enrichment and explanations.
TSUBASA: Improving Long-Horizon Personalization via Evolving Memory and Self-Learning with Context Distillation cs.CL · 2026-04-09 · unverdicted · none · ref 88
TSUBASA improves long-horizon personalization in LLMs via dynamic memory evolution for writing and context-distillation self-learning for reading, outperforming Mem0 and Memory-R1 on Qwen-3 benchmarks while reducing token use.
Bit-by-Bit: Progressive QAT Strategy with Outlier Channel Splitting for Stable Low-Bit LLMs cs.LG · 2026-04-09 · unverdicted · none · ref 3
Bit-by-Bit achieves stable 2-bit quantization of Llama models via block-wise progressive training and outlier channel splitting, reporting only 2.25 WikiText2 PPL degradation versus full precision while outperforming prior QAT baselines.
ReRec: Reasoning-Augmented LLM-based Recommendation Assistant via Reinforcement Fine-tuning cs.IR · 2026-04-09 · unverdicted · none · ref 2
ReRec uses reinforcement fine-tuning with dual-graph reward shaping, reasoning-aware advantage estimation, and online curriculum scheduling to improve LLM reasoning and performance in recommendation tasks.
TrajGuard: Streaming Hidden-state Trajectory Detection for Decoding-time Jailbreak Defense cs.CR · 2026-04-09 · unverdicted · none · ref 52
TrajGuard detects jailbreaks by tracking how hidden-state trajectories move toward high-risk regions during decoding, achieving 95% defense rate with 5.2 ms/token latency across tested attacks.
Reasoning-Based Refinement of Unsupervised Text Clusters with LLMs cs.CL · 2026-04-08 · unverdicted · none · ref 2
LLM reasoning refines unsupervised text clusters via coherence checks, redundancy removal, and label grounding, yielding better coherence and human-aligned labels on social media data.
Beyond End-to-End: Dynamic Chain Optimization for Private LLM Adaptation on the Edge cs.DC · 2026-04-08 · unverdicted · none · ref 2
ChainFed achieves memory-efficient private LLM fine-tuning on edge devices through sequential layer-by-layer adapter training with dynamic co-tuning, perceptive optimization, and adaptive starting point selection, improving accuracy by up to 46.46%.
StructKV: Preserving the Structural Skeleton for Scalable Long-Context Inference cs.CL · 2026-04-08 · unverdicted · none · ref 25
StructKV compresses LLM KV caches by tracking global in-degree centrality across network depth and dynamically selecting compression layers to preserve long-range dependencies better than local pruning methods.
To Lie or Not to Lie? Investigating The Biased Spread of Global Lies by LLMs cs.CL · 2026-04-08 · unverdicted · none · ref 2
LLMs propagate misinformation more in lower-resource languages and lower-HDI countries, with input safety classifiers and retrieval-augmented fact-checking showing cross-lingual and regional gaps.
TalkLoRA: Communication-Aware Mixture of Low-Rank Adaptation for Large Language Models cs.LG · 2026-04-07 · unverdicted · none · ref 45
TalkLoRA equips MoE-LoRA experts with a communication module that smooths routing dynamics and improves performance on language tasks under similar parameter budgets.
AICA-Bench: Holistically Examining the Capabilities of VLMs in Affective Image Content Analysis cs.CV · 2026-04-07 · unverdicted · none · ref 2
AICA-Bench evaluates 23 VLMs on affective image analysis, identifies weak intensity calibration and shallow descriptions as limitations, and proposes training-free Grounded Affective Tree Prompting to improve performance.
WikiSeeker: Rethinking the Role of Vision-Language Models in Knowledge-Based Visual Question Answering cs.CV · 2026-04-07 · unverdicted · none · ref 2
WikiSeeker boosts KB-VQA performance by using VLMs to rewrite image-informed queries for better retrieval and to decide when to route to external LLM or rely on internal VLM knowledge.
Controlling Distributional Bias in Multi-Round LLM Generation via KL-Optimized Fine-Tuning cs.CL · 2026-04-07 · unverdicted · none · ref 50
A hybrid fine-tuning objective using KL divergence for token calibration and Kahneman-Tversky optimization for semantic binding enables LLMs to produce outputs that match desired attribute distributions across repeated prompts.
Label Effects: Shared Heuristic Reliance in Trust Assessment by Humans and LLM-as-a-Judge cs.AI · 2026-04-07 · unverdicted · none · ref 55
Both humans and LLMs trust content more when labeled human-authored than AI-generated, with LLMs showing denser attention to labels and higher uncertainty under AI labels, mirroring human heuristic patterns.
Content Fuzzing for Escaping Information Cocoons on Digital Social Media cs.CL · 2026-04-07 · unverdicted · none · ref 2
ContentFuzz rewrites posts with LLM guidance from stance model confidence to flip machine labels without altering human intent, tested across four models and three datasets in two languages.
DQA: Diagnostic Question Answering for IT Support cs.CL · 2026-04-07 · unverdicted · none · ref 2
DQA maintains persistent diagnostic state and aggregates retrievals at the root-cause level to reach 78.7% success on 150 enterprise IT scenarios versus 41.3% for standard multi-turn RAG while cutting average turns from 8.4 to 3.9.
BridgeRAG: Training-Free Bridge-Conditioned Retrieval for Multi-Hop Question Answering cs.IR · 2026-04-03 · conditional · none · ref 2
BridgeRAG improves training-free multi-hop retrieval by using a bridge-conditioned LLM scorer to rank evidence chains, achieving new best R@5 scores on MuSiQue, 2WikiMultiHopQA, and HotpotQA.
A Taxonomy of Programming Languages for Code Generation cs.CL · 2026-03-31 · accept · none · ref 27
The researchers provide a systematic 4-tier classification of 646 programming languages, quantifying the extreme data scarcity facing over 70% of the world's programming languages in the age of LLMs.
Do Hallucination Neurons Generalize? Evidence from Cross-Domain Transfer in LLMs cs.CL · 2026-03-27 · unverdicted · none · ref 2
Hallucination neurons in LLMs are domain-specific, with cross-domain classifiers dropping from AUROC 0.783 within-domain to 0.563 across domains.
