super hub Mixed citations

" write newline "" before

Mixed citation behavior. The most common citation role is "unclear" (62% of classified citations).

301 Pith papers cite this work.

citation-role summary

background 8 · other 4 · method 1

citation-polarity summary

unclear 8 · background 4 · use method 1

claims ledger

background: Table A1: Comparison of BAS for frontier models across tasks when varying the risk-prior w(t). Higher scores indicate better alignment with expressed uncertainty. The standard BAS (Uniform: w(t) = 1) serves as the baseline, while Linear and Quadratic weights simulate increasingly safety-critical environments. Identical ECE, different BAS. Consider two models evaluated on four examples with correctness labels Z = [1, 1, 0, 0]. The models produce the following confidence values: Example 1 2 3 4 Z1 1 0

authors

" write newline "" before

co-cited works

representative citing papers

Turning the TIDE: Cross-Architecture Distillation for Diffusion Large Language Models

cs.CL · 2026-04-29 · unverdicted · novelty 8.0

TIDE enables the first cross-architecture distillation of dLLMs, improving a 0.6B student by 1.53 average points over baselines when trained from 8B dense and 16B MoE teachers.

JumpLoRA: Sparse Adapters for Continual Learning in Large Language Models

cs.LG · 2026-04-17 · unverdicted · novelty 8.0

JumpLoRA uses JumpReLU gating to induce adaptive sparsity in LoRA blocks, achieving dynamic parameter isolation that prevents task interference and improves continual learning performance over IncLoRA and ELLA.

Context Over Content: Exposing Evaluation Faking in Automated Judges

cs.AI · 2026-04-16 · conditional · novelty 8.0

LLM judges exhibit a leniency bias of up to 9.8 percentage points when prompts signal stakes, and the bias acts implicitly, without being mentioned in the chain-of-thought.

InfiniteScienceGym: An Unbounded, Procedurally-Generated Benchmark for Scientific Analysis

cs.CL · 2026-04-14 · unverdicted · novelty 8.0

InfiniteScienceGym procedurally generates unbounded scientific repositories with exact ground-truth QA pairs to benchmark LLMs on data reasoning, abstention, and tool use without static datasets.

Exact Certification of Neural Networks and Partition Aggregation Ensembles against Label Poisoning

cs.LG · 2026-04-13 · unverdicted · novelty 8.0

EnsembleCert and ScaLabelCert enable tighter and exact certificates for neural network robustness against label-flipping attacks by leveraging white-box information and neural tangent kernel equivalence.

Steered LLM Activations are Non-Surjective

cs.AI · 2026-04-10 · unverdicted · novelty 8.0 · 2 refs

Steered LLM activations are non-surjective: under practical assumptions, they lie outside the set of states reachable from any discrete prompt.

AgentSocialBench: Evaluating Privacy Risks in Human-Centered Agentic Social Networks

cs.AI · 2026-04-01 · unverdicted · novelty 8.0

AgentSocialBench demonstrates that privacy preservation is fundamentally harder in human-centered agentic social networks than in single-agent cases due to cross-domain coordination pressures and an abstraction paradox where privacy instructions increase discussion of sensitive information.

Adaptive Stopping for Multi-Turn LLM Reasoning

cs.CL · 2026-04-01 · unverdicted · novelty 8.0

MiCP is the first conformal prediction method for multi-turn LLM pipelines that allocates per-turn error budgets to enable adaptive stopping with an overall coverage guarantee, shown to reduce turns and cost on RAG and ReAct benchmarks.

Parameterized Hardness of Zonotope Containment and Neural Network Verification

cs.CC · 2025-09-26 · unverdicted · novelty 8.0

The paper proves W[1]-hardness parameterized by dimension d for positivity, zonotope containment, max approximation, and L_p-Lipschitz constants in 2- and 3-layer ReLU networks, showing enumeration methods are optimal under ETH.

RLCracker: Evaluating the Worst-Case Vulnerability of LLM Watermarks with Adaptive RL Attacks

cs.CR · 2025-09-25 · conditional · novelty 8.0

RLCracker is a reinforcement learning attack that erases LLM watermarks at 98.5% success rate with minimal data and generalizes across ten schemes and multiple model sizes.

The Coding Limits of Robust Watermarking for Generative Models

cs.CR · 2025-09-11 · accept · novelty 8.0

Establishes an unconditional robustness threshold of 1-1/q for zero-bit tamper-detection codes in watermarking, with matching constructions and experimental confirmation on image models.

ErrorRadar: Benchmarking Complex Mathematical Reasoning of Multimodal Large Language Models Via Error Detection

cs.CL · 2024-10-06 · unverdicted · novelty 8.0

ErrorRadar is a new benchmark of 2,500 multimodal K-12 math problems for MLLM error step identification and categorization, where GPT-4o trails human experts by ~10%.

BEAVER: An Enterprise Benchmark for Text-to-SQL

cs.CL · 2024-09-03 · unverdicted · novelty 8.0

BEAVER is the first text-to-SQL benchmark from private enterprise data warehouses, revealing SOTA agentic frameworks achieve only 10.8% accuracy on complex real-world queries.

Score-Based Generative Modeling through Stochastic Differential Equations

cs.LG · 2020-11-26 · unverdicted · novelty 8.0

Introduces an SDE-based framework for score-based generative modeling that unifies prior methods, enables predictor-corrector sampling and neural ODE likelihoods, and achieves SOTA unconditional image generation on CIFAR-10.

Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer

cs.LG · 2017-01-23 · accept · novelty 8.0

A noisy top-k gated mixture-of-experts layer between LSTMs scales neural networks to 137B parameters with sub-linear compute, beating SOTA on language modeling and machine translation.

Adam: A Method for Stochastic Optimization

cs.LG · 2014-12-22 · accept · novelty 7.5

A first-order stochastic optimizer that maintains bias-corrected exponential moving averages of the gradient and its square, dividing the former by the square root of the latter to set per-parameter step sizes.

AutoSP: Unlocking Long-Context LLM Training Via Compiler-Based Sequence Parallelism

cs.LG · 2026-04-29 · unverdicted · novelty 7.0

AutoSP automates sequence parallelism and long-context activation checkpointing via compilation, enabling up to 2.7x longer training contexts on NVIDIA hardware with negligible throughput loss.

Cooperate to Compete: Strategic Coordination in Multi-Agent Conquest

cs.AI · 2026-04-28 · conditional · novelty 7.0

C2C is a new testbed where LM agents negotiate differently from humans and targeted prompting raises their win rate from 22.2% to 32.7% across 1,100+ games.

XGRAG: A Graph-Native Framework for Explaining KG-based Retrieval-Augmented Generation

cs.AI · 2026-04-27 · unverdicted · novelty 7.0

XGRAG uses graph perturbations to quantify component contributions in GraphRAG and achieves 14.81% better explanation quality than text-based baselines on QA datasets, with correlations to graph centrality.

GraphPlanner: Graph Memory-Augmented Agentic Routing for Multi-Agent LLMs

cs.CL · 2026-04-26 · unverdicted · novelty 7.0

GraphPlanner augments multi-agent LLM routing with a heterogeneous graph memory and RL-optimized MDP workflow generation, delivering up to 9.3% higher accuracy and over 99% lower GPU cost than prior routers while supporting zero-shot generalization.

MMEB-V3: Measuring the Performance Gaps of Omni-Modality Embedding Models

cs.IR · 2026-04-25 · unverdicted · novelty 7.0

MMEB-V3 benchmark shows omni-modality embedding models fail to enforce instruction-specified modality constraints and exhibit asymmetric, query-biased retrieval.

Preserving Long-Tailed Expert Information in Mixture-of-Experts Tuning

cs.LG · 2026-04-24 · unverdicted · novelty 7.0

A new SFT framework for MoE models combines bias-driven sparsification with gated condenser experts to retain long-tailed expert information, outperforming DenseMixer and ESFT by over 2.5% on math reasoning and commonsense QA benchmarks.

Pliable rejection sampling

stat.ML · 2026-04-24 · unverdicted · novelty 7.0

Pliable rejection sampling learns a kernel-based proposal to enable efficient i.i.d. sampling from target distributions f with high-probability correctness and a guarantee on accepted samples.

Modulating Cross-Modal Convergence with Single-Stimulus, Intra-Modal Dispersion

q-bio.NC · 2026-04-23 · unverdicted · novelty 7.0

Stimuli with low intra-modal dispersion among vision models elicit up to twice the cross-modal alignment with language models compared to high-dispersion stimuli.

citing papers explorer

Showing 50 of 301 citing papers.

Semi-Supervised Treatment Effect Estimation with Unlabeled Covariates for Prediction-Powered Causal Inference · stat.ML · 2025-11-11 · unverdicted · none · ref 54
Incorporating unlabeled auxiliary covariates lowers the efficiency bound for treatment effect estimation and produces estimators with smaller asymptotic variance than those without the auxiliary data.
XR-1: Towards Versatile Vision-Language-Action Models via Learning Unified Vision-Motion Representations · cs.RO · 2025-11-04 · unverdicted · none · ref 1
XR-1 introduces Unified Vision-Motion Codes learned by dual-branch VQ-VAE and applies them in a three-stage training pipeline to outperform prior VLA models on 120+ real-world manipulation tasks across six robot embodiments.
Semantic-Aware Logical Reasoning via a Semiotic Framework · cs.AI · 2025-09-29 · conditional · none · ref 1
LogicAgent uses a semiotic-square-guided approach to enhance logical reasoning in LLMs on the new RepublicQA benchmark and others, reporting average gains of 6.25% and 7.05% respectively.
G-reasoner: Foundation Models for Unified Reasoning over Graph-structured Knowledge · cs.AI · 2025-09-29 · unverdicted · none · ref 1
G-reasoner uses QuadGraph abstraction and a 34M-parameter graph foundation model integrated with LLMs to enable scalable reasoning over diverse graph-structured knowledge, outperforming baselines on six benchmarks.
TusoAI: Agentic Optimization for Scientific Methods · cs.AI · 2025-09-28 · unverdicted · none · ref 44
TusoAI is an LLM-based agent that builds and iteratively optimizes domain-specific computational methods for scientific data analysis, outperforming expert baselines on RNA-seq denoising and earth monitoring while reporting new genetic associations.
PartCo: Part-Level Correspondence Priors Enhance Category Discovery · cs.CV · 2025-09-26 · unverdicted · none · ref 1
PartCo improves generalized category discovery by incorporating part-level correspondence priors that capture finer semantic structures and integrate with existing GCD methods.
We Think, Therefore We Align LLMs to Helpful, Harmless and Honest Before They Go Wrong · cs.CL · 2025-09-26 · unverdicted · none · ref 36
AMBS is a 1-to-N Transformer steering framework that shares a base representation across HHH objectives and restricts divergence during inference to produce consistent multi-objective responses in one forward pass.
What Is The Political Content in LLMs' Pre- and Post-Training Data? · cs.CL · 2025-09-26 · unverdicted · none · ref 50
Training data for open LLMs is systematically left-leaning, with pre-training corpora containing more political material than post-training data and model stances aligning with data distributions.
Mixture-of-Visual-Thoughts: Exploring Context-Adaptive Reasoning Mode Selection for General Visual Reasoning · cs.AI · 2025-09-26 · unverdicted · none · ref 1
MoVT unifies different visual reasoning modes in a single model and uses the AdaVaR two-stage framework with supervised cold-start and RL via AdaGRPO to enable context-adaptive mode selection, yielding consistent gains on visual reasoning tasks.
Smoothing Binary Optimization: A Primal-Dual Perspective · math.OC · 2025-09-25 · unverdicted · none · ref 25
A primal-dual smoothing reformulation converts discrete binary optimization into a continuous minimax problem solved by a convergent simultaneous gradient descent-ascent algorithm.
Failure Modes of Maximum Entropy RLHF · cs.LG · 2025-09-24 · unverdicted · none · ref 65
Derives SimPO from MaxEnt RL and reports that MaxEnt RL in online RLHF exhibits frequent overoptimization and unstable KL dynamics across scales, unlike stable KL-constrained baselines.
Incomplete Data, Complete Dynamics: A Diffusion Approach · cs.LG · 2025-09-24 · unverdicted · none · ref 53
A conditional diffusion model trained on partitioned incomplete samples for physical dynamics achieves asymptotic convergence to the true generative process under mild conditions and outperforms baselines in imputation.
Position: AI Evaluations Should be Grounded on a Theory of Capability · cs.AI · 2025-09-23 · conditional · none · ref 1
AI evaluations should be reframed as inference tasks grounded in an explicit theory of capability, with an empirical demonstration that results depend on modeling assumptions and a proposed Evaluation Card for transparency.
Do Activation Verbalization Methods Convey Privileged Information? · cs.CL · 2025-09-16 · unverdicted · none · ref 58
Activation verbalization methods for LLMs largely reflect the verbalizer model's parametric knowledge rather than privileged information from the target model's activations.
Self-Aligned Reward: Towards Effective and Efficient Reasoners · cs.LG · 2025-09-05 · unverdicted · none · ref 56
Self-aligned reward uses relative perplexity differences to encourage concise, query-specific reasoning in LLMs, yielding 4% accuracy gains and 30% lower inference cost when added to PPO or GRPO.
Fine-tuning Large Language Model for Automated Algorithm Design · cs.LG · 2025-07-13 · unverdicted · none · ref 1
Fine-tuned LLMs with DAR sampling and DPO outperform off-the-shelf versions on algorithm design tasks and generalize to related settings.
Large Language Models Can Help Mitigate Barren Plateaus in Quantum Neural Networks · quant-ph · 2025-02-17 · unverdicted · none · ref 41
AdaInit uses LLMs with submartingale properties to iteratively synthesize QNN initial parameters that maintain non-negligible gradient variance and mitigate barren plateaus, with claimed theoretical convergence guarantees and empirical outperformance.
Test-Time Alignment via Hypothesis Reweighting · cs.LG · 2024-12-11 · unverdicted · none · ref 1
HyRe personalizes reward models at test time by reweighting an ensemble of heads trained on aggregate preferences, using few target examples to outperform uniform averaging and prior methods on RewardBench and 32 tasks.
Enhancing Trust in Large Language Models via Uncertainty-Calibrated Fine-Tuning · cs.CL · 2024-12-03 · unverdicted · none · ref 67
Uncertainty-aware fine-tuning with a decision-theory-based loss produces better-calibrated uncertainty estimates than standard training on free-form QA tasks.
TOAST: Transformer Optimization using Adaptive and Simple Transformations · cs.LG · 2024-10-07 · unverdicted · none · ref 50
TOAST approximates full transformer blocks in pretrained models via lightweight closed-form mappings to cut parameters and FLOPs without retraining or finetuning.
WildFeedback: Aligning LLMs With In-situ User Interactions And Feedback · cs.CL · 2024-08-28 · unverdicted · none · ref 45
WildFeedback extracts preference pairs from in-situ user feedback in LLM conversations to fine-tune models for better alignment with real user preferences.
Instruction-Following Evaluation for Large Language Models · cs.CL · 2023-11-14 · unverdicted · none · ref 1
IFEval is a new benchmark of 25 verifiable instruction types and ~500 prompts for objective, reproducible evaluation of LLMs' instruction-following abilities.
Junk DNA Hypothesis: Pruning Small Pre-Trained Weights Irreversibly and Monotonically Impairs "Difficult" Downstream Tasks in LLMs · cs.LG · 2023-09-29 · unverdicted · none · ref 1
Pruning small-magnitude weights from pre-trained LLMs causes monotonic irreversible performance degradation on difficult downstream tasks, supporting the Junk DNA Hypothesis that these weights hold essential knowledge.
LoRA-FA: Efficient and Effective Low Rank Representation Fine-tuning · cs.CL · 2023-08-07 · unverdicted · none · ref 1
LoRA-FA freezes LoRA's A matrix and trains only B with gradient corrections to approximate full fine-tuning gradients more closely.
A Study of State Aliasing in Structured Prediction with RNNs · cs.LG · 2019-06-21 · unverdicted · none · ref 1
RNN agents trained via policy gradient exhibit state aliasing when multiple states share the same optimal action, unlike value-based methods.
Learning Illumination Control in Diffusion Models · cs.CV · 2026-04-27 · unverdicted · none · ref 1
An open-source data engine creates illumination control triplets to fine-tune diffusion models, yielding better perceptual, structural, and identity preservation than SD 1.5, SDXL, and FLUX.1-dev baselines.
Semantic Richness or Geometric Reasoning? The Fragility of VLM's Visual Invariance · cs.CV · 2026-04-02 · unverdicted · none · ref 1
VLMs show systematic fragility in visual invariance under geometric transformations, with sharp performance drops as semantic content thins across sketches, photos, and art.
Hermes: A Multi-Scale Spatial-Temporal Hypergraph Network for Stock Time Series Forecasting · cs.LG · 2025-09-28 · unverdicted · none · ref 79
Hermes is a multi-scale spatial-temporal hypergraph network that improves stock forecasting accuracy by capturing inter-industry lead-lag dependencies and fusing information across scales.
Every Subtlety Counts: Fine-grained Person Independence Micro-Action Recognition via Distributionally Robust Optimization · cs.CV · 2025-09-25 · unverdicted · none · ref 43
A Person Independence Universal Micro-action Recognition Framework combines Distributionally Robust Optimization with temporal-frequency alignment at the feature level and group-invariant regularization at the loss level to improve generalization across persons on the MA-52 dataset.
HiCoLoRA: Addressing Context-Prompt Misalignment via Hierarchical Collaborative LoRA for Zero-Shot DST · cs.CL · 2025-09-24 · unverdicted · none · ref 48
HiCoLoRA uses hierarchical LoRA with spectral domain-slot clustering, adaptive fusion, and semantic SVD initialization to achieve SOTA zero-shot DST on MultiWOZ and SGD.
Time Series Forecasting Through the Lens of Dynamics · cs.LG · 2025-07-21 · unverdicted · none · ref 1
Proposes dynamics-based analysis of time series models showing partial dynamics learning and end-positioning as key to performance, plus a plug-and-play improvement method.
AI Realtor: Towards Grounded Persuasive Language Generation for Automated Copywriting · cs.AI · 2025-02-24 · unverdicted · none · ref 58
An LLM agent with grounding, personalization, and marketing modules generates real estate descriptions that human buyers prefer over expert-written ones while matching factual accuracy.
Leveraging Ensemble-Based Semi-Supervised Learning for Illicit Account Detection in Ethereum DeFi Transactions · cs.SI · 2024-12-03 · unverdicted · none · ref 49
SLEID combines Isolation Forest and iterative self-training to detect illicit accounts in large-scale Ethereum DeFi transactions, achieving better precision and F1 than baselines while using less labeled data.
Detecting Popular Social Events through Limited Observation with Deep Survival Analysis · cs.SI · 2024-10-02 · unverdicted · none · ref 13
A deep survival analysis method predicts popular social media cascades using limited early observation data tested on Twitter, Weibo, and Digg datasets.
ANGOFA: Leveraging OFA Embedding Initialization and Synthetic Data for Angolan Language Model · cs.CL · 2024-04-03 · unverdicted · none · ref 30
Four MAFT-based PLMs for Angolan languages report 12.3-point gains over AfroXLMR-base and 3.8-point gains over OFA baselines on downstream tasks.
Training on test data: Removing near duplicates in Fashion-MNIST · cs.LG · 2019-06-19 · unverdicted · none · ref 7
Near-duplicates exist between Fashion-MNIST train and test sets and a cleaned dataset without them is proposed to reduce artificial accuracy inflation.
Bootstrapped Mixed Rewards for RL Post-Training: Injecting Canonical Action Order · cs.LG · 2025-12-03 · unverdicted · none · ref 1
Mixed rewards with bootstrapped scaling in GRPO post-training outperform task-only optimization on Zebra puzzles by injecting canonical action order signals.
A Python Library For Empirical Calibration · stat.CO · 2019-06-27 · unverdicted · none · ref 30
EC is a Python library that formulates empirical calibration as convex optimization solved in dual form, with added support for multiple objectives, weight clipping, and inexact solutions.
Rethinking the Comparison Unit in Sequence-Level Reinforcement Learning: An Equal-Length Paired Training Framework from Loss Correction to Sample Construction · cs.LG · 2026-04-19 · unreviewed · ref 1
Document Optimization for Black-Box Retrieval via Reinforcement Learning · cs.CL · 2026-04-06 · unreviewed · ref 1
HumorRank: A Tournament-Based Leaderboard for Evaluating Humor Generation in Large Language Models · cs.CL · 2026-03-31 · unreviewed · ref 1
LiveClawBench: Benchmarking LLM Agents on Complex, Real-World Assistant Tasks · cs.CL · 2026-03-20 · unreviewed · ref 1
Feature Learning Dynamics in Infinite-Depth Neural Networks · cs.LG · 2025-12-24 · unreviewed · ref 1
House of Dextra: Cross-embodied Co-design for Dexterous Hands · cs.RO · 2025-12-03 · unreviewed · ref 69
SkillWrapper: Generative Predicate Invention for Task-level Robot Planning · cs.RO · 2025-11-22 · unreviewed · ref 1
Selective Rotary Position Embedding · cs.CL · 2025-11-21 · unreviewed · ref 70
Frictional Q-Learning · cs.LG · 2025-09-24 · unreviewed · ref 28
Similarity-Distance-Magnitude Activations · cs.LG · 2025-09-16 · unreviewed · ref 31
From Fragments to Facts: A Curriculum-Driven DPO Approach for Generating Hindi News Veracity Explanations · cs.CL · 2025-07-07 · unreviewed · ref 59
Safety Must Precede the Deployment of Open-Ended AI · cs.AI · 2025-02-06 · unreviewed · ref 2
