super hub Mixed citations

write newline

" write newline "" before

Mixed citation behavior. Most common role is unclear (62%).

301 Pith papers citing it

unclear 62% of classified citations

citation-role summary

background 8 · other 4 · method 1

citation-polarity summary

unclear 8 · background 4 · use method 1

claims ledger

background · Table A1: Comparison of BAS for frontier models across tasks when varying the risk-prior w(t). Higher scores indicate better alignment with expressed uncertainty. The standard BAS (Uniform: w(t) = 1) serves as the baseline, while Linear and Quadratic weights simulate increasingly safety-critical environments. Identical ECE, different BAS. Consider two models evaluated on four examples with correctness labels Z = [1, 1, 0, 0]. The models produce the following confidence values: Example 1 2 3 4 Z 1 1 0

authors

" write newline "" before

co-cited works

representative citing papers

Turning the TIDE: Cross-Architecture Distillation for Diffusion Large Language Models

cs.CL · 2026-04-29 · unverdicted · novelty 8.0

TIDE enables the first cross-architecture distillation of dLLMs, improving a 0.6B student by 1.53 average points over baselines when trained from 8B dense and 16B MoE teachers.

JumpLoRA: Sparse Adapters for Continual Learning in Large Language Models

cs.LG · 2026-04-17 · unverdicted · novelty 8.0

JumpLoRA uses JumpReLU gating to induce adaptive sparsity in LoRA blocks, achieving dynamic parameter isolation that prevents task interference and improves continual learning performance over IncLoRA and ELLA.

Context Over Content: Exposing Evaluation Faking in Automated Judges

cs.AI · 2026-04-16 · conditional · novelty 8.0

LLM judges exhibit up to 9.8 percentage point leniency bias from stakes signaling in prompts, acting implicitly without mentioning it in chain-of-thought.

InfiniteScienceGym: An Unbounded, Procedurally-Generated Benchmark for Scientific Analysis

cs.CL · 2026-04-14 · unverdicted · novelty 8.0

InfiniteScienceGym procedurally generates unbounded scientific repositories with exact ground-truth QA pairs to benchmark LLMs on data reasoning, abstention, and tool use without static datasets.

Exact Certification of Neural Networks and Partition Aggregation Ensembles against Label Poisoning

cs.LG · 2026-04-13 · unverdicted · novelty 8.0

EnsembleCert and ScaLabelCert enable tighter and exact certificates for neural network robustness against label-flipping attacks by leveraging white-box information and neural tangent kernel equivalence.

Steered LLM Activations are Non-Surjective

cs.AI · 2026-04-10 · unverdicted · novelty 8.0 · 2 refs

Steered LLM activations are non-surjective: under practical assumptions, they lie outside the set of states reachable from any discrete prompt.

AgentSocialBench: Evaluating Privacy Risks in Human-Centered Agentic Social Networks

cs.AI · 2026-04-01 · unverdicted · novelty 8.0

AgentSocialBench demonstrates that privacy preservation is fundamentally harder in human-centered agentic social networks than in single-agent cases due to cross-domain coordination pressures and an abstraction paradox where privacy instructions increase discussion of sensitive information.

Adaptive Stopping for Multi-Turn LLM Reasoning

cs.CL · 2026-04-01 · unverdicted · novelty 8.0

MiCP is the first conformal prediction method for multi-turn LLM pipelines that allocates per-turn error budgets to enable adaptive stopping with an overall coverage guarantee, shown to reduce turns and cost on RAG and ReAct benchmarks.

Parameterized Hardness of Zonotope Containment and Neural Network Verification

cs.CC · 2025-09-26 · unverdicted · novelty 8.0

The paper proves W[1]-hardness parameterized by dimension d for positivity, zonotope containment, max approximation, and L_p-Lipschitz constants in 2- and 3-layer ReLU networks, showing enumeration methods are optimal under ETH.

RLCracker: Evaluating the Worst-Case Vulnerability of LLM Watermarks with Adaptive RL Attacks

cs.CR · 2025-09-25 · conditional · novelty 8.0

RLCracker is a reinforcement learning attack that erases LLM watermarks at 98.5% success rate with minimal data and generalizes across ten schemes and multiple model sizes.

The Coding Limits of Robust Watermarking for Generative Models

cs.CR · 2025-09-11 · accept · novelty 8.0

Establishes an unconditional robustness threshold of 1-1/q for zero-bit tamper-detection codes in watermarking, with matching constructions and experimental confirmation on image models.

ErrorRadar: Benchmarking Complex Mathematical Reasoning of Multimodal Large Language Models Via Error Detection

cs.CL · 2024-10-06 · unverdicted · novelty 8.0

ErrorRadar is a new benchmark of 2,500 multimodal K-12 math problems for MLLM error step identification and categorization, where GPT-4o trails human experts by ~10%.

BEAVER: An Enterprise Benchmark for Text-to-SQL

cs.CL · 2024-09-03 · unverdicted · novelty 8.0

BEAVER is the first text-to-SQL benchmark from private enterprise data warehouses, revealing SOTA agentic frameworks achieve only 10.8% accuracy on complex real-world queries.

Score-Based Generative Modeling through Stochastic Differential Equations

cs.LG · 2020-11-26 · unverdicted · novelty 8.0

Introduces an SDE-based framework for score-based generative modeling that unifies prior methods, enables predictor-corrector sampling and neural ODE likelihoods, and achieves SOTA unconditional image generation on CIFAR-10.

Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer

cs.LG · 2017-01-23 · accept · novelty 8.0

A noisy top-k gated mixture-of-experts layer between LSTMs scales neural networks to 137B parameters with sub-linear compute, beating SOTA on language modeling and machine translation.

Adam: A Method for Stochastic Optimization

cs.LG · 2014-12-22 · accept · novelty 7.5

A first-order stochastic optimizer that maintains bias-corrected exponential moving averages of the gradient and its square, dividing the former by the square root of the latter to set per-parameter step sizes.

AutoSP: Unlocking Long-Context LLM Training Via Compiler-Based Sequence Parallelism

cs.LG · 2026-04-29 · unverdicted · novelty 7.0

AutoSP automates sequence parallelism and long-context activation checkpointing via compilation, enabling up to 2.7x longer training contexts on NVIDIA hardware with negligible throughput loss.

Cooperate to Compete: Strategic Coordination in Multi-Agent Conquest

cs.AI · 2026-04-28 · conditional · novelty 7.0

C2C is a new testbed where LM agents negotiate differently from humans and targeted prompting raises their win rate from 22.2% to 32.7% across 1,100+ games.

XGRAG: A Graph-Native Framework for Explaining KG-based Retrieval-Augmented Generation

cs.AI · 2026-04-27 · unverdicted · novelty 7.0

XGRAG uses graph perturbations to quantify component contributions in GraphRAG and achieves 14.81% better explanation quality than text-based baselines on QA datasets, with correlations to graph centrality.

GraphPlanner: Graph Memory-Augmented Agentic Routing for Multi-Agent LLMs

cs.CL · 2026-04-26 · unverdicted · novelty 7.0

GraphPlanner augments multi-agent LLM routing with a heterogeneous graph memory and RL-optimized MDP workflow generation, delivering up to 9.3% higher accuracy and over 99% lower GPU cost than prior routers while supporting zero-shot generalization.

MMEB-V3: Measuring the Performance Gaps of Omni-Modality Embedding Models

cs.IR · 2026-04-25 · unverdicted · novelty 7.0

MMEB-V3 benchmark shows omni-modality embedding models fail to enforce instruction-specified modality constraints and exhibit asymmetric, query-biased retrieval.

Preserving Long-Tailed Expert Information in Mixture-of-Experts Tuning

cs.LG · 2026-04-24 · unverdicted · novelty 7.0

A new SFT framework for MoE models combines bias-driven sparsification with gated condenser experts to retain long-tailed expert information, outperforming DenseMixer and ESFT by over 2.5% on math reasoning and commonsense QA benchmarks.

Pliable rejection sampling

stat.ML · 2026-04-24 · unverdicted · novelty 7.0

Pliable rejection sampling learns a kernel-based proposal to enable efficient i.i.d. sampling from target distributions f with high-probability correctness and a guarantee on accepted samples.

Modulating Cross-Modal Convergence with Single-Stimulus, Intra-Modal Dispersion

q-bio.NC · 2026-04-23 · unverdicted · novelty 7.0

Stimuli with low intra-modal dispersion among vision models elicit up to twice the cross-modal alignment with language models compared to high-dispersion stimuli.

citing papers explorer

Showing 50 of 118 citing papers after filters.

Parameterized Hardness of Zonotope Containment and Neural Network Verification cs.CC · 2025-09-26 · unverdicted · none · ref 50
The paper proves W[1]-hardness parameterized by dimension d for positivity, zonotope containment, max approximation, and L_p-Lipschitz constants in 2- and 3-layer ReLU networks, showing enumeration methods are optimal under ETH.
RLCracker: Evaluating the Worst-Case Vulnerability of LLM Watermarks with Adaptive RL Attacks cs.CR · 2025-09-25 · conditional · none · ref 46
RLCracker is a reinforcement learning attack that erases LLM watermarks at 98.5% success rate with minimal data and generalizes across ten schemes and multiple model sizes.
The Coding Limits of Robust Watermarking for Generative Models cs.CR · 2025-09-11 · accept · none · ref 1
Establishes an unconditional robustness threshold of 1-1/q for zero-bit tamper-detection codes in watermarking, with matching constructions and experimental confirmation on image models.
PerfCoder: Large Language Models for Interpretable Code Performance Optimization cs.SE · 2025-12-16 · unverdicted · none · ref 49
PerfCoder is a family of LLMs trained on optimization trajectories with human annotations and runtime-based preference alignment that achieves higher runtime speedups and optimization rates on the PIE benchmark than prior models while producing interpretable feedback.
SAQ: Stabilizer-Aware Quantum Error Correction Decoder quant-ph · 2025-12-09 · unverdicted · none · ref 1
A dual-stream transformer decoder with constraint-aware post-processing achieves error thresholds of 10.99% and 18.6% on toric codes, approaching ML bounds while scaling linearly.
OXtal: An All-Atom Diffusion Model for Organic Crystal Structure Prediction cs.LG · 2025-12-07 · unverdicted · none · ref 61
OXtal recovers experimental organic crystal structures with conformer RMSD below 0.5 Å and over 80% packing similarity using a lattice-free diffusion model trained on 600K structures.
Joint Distillation for Fast Likelihood Evaluation and Sampling in Flow-based Models cs.LG · 2025-12-02 · unverdicted · none · ref 1
F2D2 jointly distills sampling and likelihood computation in flow-based models by adding a divergence head to a few-step flow map, achieving accurate log-likelihoods at 2-10 NFEs while preserving sample quality.
SAM 3: Segment Anything with Concepts cs.CV · 2025-11-20 · unverdicted · none · ref 1
SAM 3 introduces promptable concept segmentation that doubles accuracy of prior systems on images and videos while improving standard SAM segmentation performance.
MTR-DuplexBench: Towards a Comprehensive Evaluation of Multi-Round Conversations for Full-Duplex Speech Language Models cs.CL · 2025-11-13 · conditional · none · ref 1
MTR-DuplexBench is a multi-round benchmark for full-duplex speech language models that evaluates turn consistency, dialogue quality, instruction following, and safety.
Think-at-Hard: Selective Latent Iterations to Improve Reasoning Language Models cs.CL · 2025-11-11 · unverdicted · none · ref 1
Think-at-Hard selectively triggers latent iterations only on hard tokens via a neural decider and depth-aware LoRA, yielding 3.8-6.8% gains over baselines on nine reasoning benchmarks while iterating on just 7% of tokens.
Score-based Membership Inference on Diffusion Models cs.LG · 2025-09-29 · unverdicted · none · ref 49
Presents SimA, a score-based single-query membership inference attack for diffusion models and LDMs that uses denoiser output norm to reveal training set proximity and outperforms multi-query baselines on eight datasets.
AudioMoG: Guiding Audio Generation with Mixture-of-Guidance cs.SD · 2025-09-28 · unverdicted · none · ref 76
AudioMoG is a mixture-of-guidance sampling technique that combines CFG and AG signals to outperform single-guidance baselines in text-to-audio generation at equivalent speed.
ZeroSiam: An Efficient Asymmetry for Test-Time Entropy Optimization without Collapse cs.LG · 2025-09-27 · unverdicted · none · ref 60
ZeroSiam is an asymmetric architecture using a learnable predictor and stop-gradient that prevents collapse in test-time entropy minimization while also regularizing biased signals for improved performance.
Deceive, Detect, and Disclose: Large Language Models Play Mini-Mafia cs.AI · 2025-09-27 · unverdicted · none · ref 1
Mini-Mafia supplies an analytical model logit(p) = v*(m-d) for mafia win probability in LLM role interactions and uses Bayesian inference to estimate per-model parameters that predict tournament results with 76.6% Brier-score improvement over random.
Transformers Can Learn Connectivity in Some Graphs but Not Others cs.CL · 2025-09-26 · unverdicted · none · ref 1
Transformers learn connectivity on low-dimensional grid graphs but fail on high-dimensional grids or graphs with many disconnected components, with larger models showing better generalization on grids.
Beyond Classification Accuracy: Neural-MedBench and the Need for Deeper Reasoning Benchmarks cs.CV · 2025-09-26 · unverdicted · none · ref 39
Neural-MedBench reveals sharp performance drops in state-of-the-art VLMs on reasoning-intensive neurology tasks compared to conventional classification benchmarks, with reasoning failures dominating errors.
Library Hallucinations in LLM-Generated Code: A Risk Analysis Grounded in Developer Queries cs.SE · 2025-09-26 · unverdicted · none · ref 1
A study of seven LLMs finds that realistic prompt variations such as one-character misspellings trigger library hallucinations in up to 26% of cases, fabricated names in up to 99%, and time-based prompts in up to 85%, and introduces LibHalluBench for evaluation.
LayerNorm Induces Recency Bias in Transformer Decoders cs.CL · 2025-09-25 · unverdicted · none · ref 24
Stacked causal self-attention combined with LayerNorm induces recency bias in Transformer decoders, reversing the earlier-token bias seen in attention alone.
LogitTrace: Detecting Benchmark Contamination via Layerwise Logit Trajectories cs.CL · 2025-09-25 · unverdicted · none · ref 36
LogitTrace detects benchmark contamination by showing that contaminated inputs produce earlier stabilization in layerwise logit trajectories while clean inputs show more gradual accumulation.
Concepts in Motion: Temporal Concept Bottleneck Model for Interpretable Video Classification cs.CV · 2025-09-25 · unverdicted · none · ref 1
MoTIF adds temporal self-attention and automatic VLM-based concept discovery to concept bottleneck models for interpretable video classification, showing gains over prior global CBMs on benchmarks.
Explicit and Effectively Symmetric Schemes for Neural SDEs on Lie Groups cs.LG · 2025-09-24 · unverdicted · none · ref 62
Introduces the first explicit near-reversible integrator for neural SDEs on Lie groups by extending EES schemes with Bazavov's commutator-free lift, achieving better stability and up to 10x memory reduction on manifold benchmarks.
Revisiting Image Manipulation Localization under Realistic Manipulation Scenarios cs.CV · 2025-09-24 · conditional · none · ref 27
RITA models image manipulation localization as ordered sequence prediction with a new benchmark HSIM and HSS metric to handle multi-step editing processes.
Diffusion and Flow-based Copulas: Forgetting and Remembering Dependencies stat.ML · 2025-09-24 · unverdicted · none · ref 1
Diffusion and flow processes forget dependencies to define valid copulas then learn to remember them for density estimation and sampling, outperforming prior copula methods on complex datasets.
On the Convergence of Muon and Beyond cs.LG · 2025-09-19 · unverdicted · none · ref 1
Muon-MVR2 attains the optimal anytime convergence rate of ~O(T^{-1/3}) in stochastic non-convex settings under horizon-free schedules.
Visual-TableQA: Open-Domain Benchmark for Reasoning over Table Images cs.CV · 2025-09-09 · conditional · none · ref 1
Visual-TableQA is a new open-domain benchmark of rendered table images and complex QA pairs created via multi-LLM collaborative generation, with fine-tuned models showing robust generalization to external tests.
Top-H Decoding: Adapting the Creativity and Coherence with Bounded Entropy in Text Generation cs.CL · 2025-09-02 · unverdicted · none · ref 19
Top-H decoding is a computationally efficient greedy algorithm for an entropy-constrained mass maximization problem that improves the creativity-coherence trade-off over min-p sampling in LLM text generation.
MetaLint: Easy-to-Hard Generalization for Code Linting cs.SE · 2025-07-15 · unverdicted · none · ref 56
MetaLint uses meta-learning to let models generalize from easy synthetic linting data to hard human-curated best practices, yielding large F-score gains on a new PEP-inspired benchmark.
Blending Supervised and Reinforcement Fine-Tuning with Prefix Sampling cs.LG · 2025-07-02 · unverdicted · none · ref 52
Prefix-RFT blends SFT and RFT via prefix sampling from demonstrations to outperform standalone SFT, RFT, and mixed-policy baselines on math reasoning problems.
Generalised Linear Models in Deep Bayesian RL with Learnable Basis Functions cs.LG · 2025-12-24 · unverdicted · none · ref 38
GLiBRL uses GLMs with learnable basis functions for exact Bayesian inference in deep BRL, derives a closed-form link between L2 task distances and kernel task similarity, and reports up to 1.8x gains over prior meta-RL on MuJoCo and MetaWorld.
FlowBind: Efficient Any-to-Any Generation with Bidirectional Flows cs.LG · 2025-12-17 · unverdicted · none · ref 1
FlowBind enables efficient any-to-any multimodal generation via a shared latent space bridged by modality-specific invertible flows, matching prior quality with up to 6x fewer parameters and 10x faster training.
RAST-MoE-RL: A Regime-Aware Spatio-Temporal MoE Framework for Deep Reinforcement Learning in Ride-Hailing cs.LG · 2025-12-13 · unverdicted · none · ref 1
RAST-MoE-RL equips RL agents with a regime-aware spatio-temporal MoE encoder that reduces matching delay by 10% and pickup delay by 15% on real Uber data from San Francisco while showing robustness to unseen regimes.
Don't Throw Away Your Beams: Improving Consistency-based Uncertainties in LLMs via Beam Search stat.ML · 2025-12-10 · conditional · none · ref 1
Beam search for candidate generation in consistency-based UQ for LLMs reduces variance and improves performance over multinomial sampling on six QA datasets, supported by a theoretical lower bound on beam-set probability mass.
Greedy Alignment Principle for Optimizer Selection cs.LG · 2025-12-06 · unverdicted · none · ref 1
The greedy alignment principle formulates optimizer selection as maximizing expected loss drop via inner product with gradient autocorrelation, yielding dynamic momentum rules for SGD and Adam that match or exceed best fixed hyperparameters.
Mitigating Catastrophic Forgetting in Target Language Adaptation of LLMs via Source-Shielded Updates cs.CL · 2025-12-04 · conditional · none · ref 1
SSU mitigates catastrophic forgetting in low-resource LLM target-language adaptation by scoring and column-wise freezing source-critical parameters, reducing source degradation to ~3% versus ~20% for full fine-tuning while matching target performance.
Two-Dimensional Quantization for Geometry-Aware Audio Coding cs.SD · 2025-12-01 · unverdicted · none · ref 1
Q2D2 uses 2D geometric grid projections to quantize feature pairs in neural audio codecs, yielding implicit codebooks that improve efficiency and utilization over RVQ, VQ, and FSQ while maintaining reconstruction quality.
Adaptive Residual-Update Steering for Low-Overhead Hallucination Mitigation in Large Vision Language Models cs.CV · 2025-11-13 · unverdicted · none · ref 23
RUDDER creates a persistent visual anchor by extracting CARD from prefill residuals and modulating its injection via an adaptive Beta Gate, cutting CHAIR_S by 24.4% and CHAIR_i by 23.6% on average across LLaVA, Idefics2, InstructBLIP and Qwen2.5-VL with >96% throughput.
Structured Uncertainty guided Clarification for LLM Agents cs.CL · 2025-11-11 · unverdicted · none · ref 1
Structured uncertainty with EVPI enables more efficient clarification and better training for tool-calling LLM agents on ambiguous tasks.
Turbo-DDCM: Fast and Flexible Zero-Shot Diffusion-Based Image Compression eess.IV · 2025-11-09 · conditional · none · ref 1
Turbo-DDCM accelerates DDCM-based zero-shot image compression by batching noise vectors per step while preserving performance and adding priority-aware and PSNR-targeted variants.
SPECTRA: Spectral Domain-Aware Graph Generation for Imbalanced Molecular Property Regression cs.LG · 2025-11-06 · unverdicted · none · ref 48
SPECTRA improves molecular property regression on underrepresented targets via spectral graph generation with rarity-aware budgeting and Laplacian interpolation, paired with edge-aware Chebyshev GNNs, yielding competitive benchmark performance at lower compute cost.
The Realignment Problem: When Right becomes Wrong in LLMs cs.CL · 2025-11-04 · unverdicted · none · ref 1
TRACE is a three-stage optimization framework that realigns LLMs to new policies by categorizing preference conflicts, scoring impact via bi-level optimization, and applying hybrid losses without new human annotations.
Graph-Based Alternatives to LLMs for Human Simulation cs.CL · 2025-11-03 · conditional · none · ref 102
GEMS formulates close-ended human-behavior simulation as link prediction on a heterogeneous graph and matches or exceeds LLM performance with three orders of magnitude fewer parameters across three datasets and three evaluation settings.
Diff4Splat: Controllable 4D Scene Generation with Latent Dynamic Reconstruction Models cs.CV · 2025-11-01 · unverdicted · none · ref 1
A feed-forward video latent transformer that predicts time-varying 3D Gaussian primitives from one image to produce controllable 4D scenes with appearance, geometry, and motion.
DeepThinkVLA: Enhancing Reasoning Capability of Vision-Language-Action Models cs.LG · 2025-10-31 · unverdicted · none · ref 53
DeepThinkVLA shows CoT improves VLA models only under decoding and causal alignment, delivering 97% success on LIBERO and 21.7-point gains via hybrid attention and SFT-RL training.
Efficient and Transferable Agentic Knowledge Graph RAG via Reinforcement Learning cs.CL · 2025-09-30 · unverdicted · none · ref 1
KG-R1 trains a single RL agent to retrieve from and reason over knowledge graphs in one loop, achieving higher accuracy with fewer tokens than multi-module baselines and transferring to unseen graphs.
Beyond Linear Probes: Dynamic Safety Monitoring for Language Models cs.LG · 2025-09-30 · unverdicted · none · ref 1
TPCs allow term-by-term progressive polynomial evaluation on LLM activations for flexible safety monitoring that supports both stronger guardrails and low-cost adaptive cascades.
SeMoBridge: Semantic Modality Bridge for Efficient Few-Shot Adaptation of CLIP cs.CV · 2025-09-30 · unverdicted · none · ref 25
SeMoBridge projects images into the text modality via a semantic bridge to reduce CLIP's intra-modal misalignment and improve few-shot performance.
Chain-in-Tree: Back to Sequential Reasoning in LLM Tree Search cs.AI · 2025-09-30 · conditional · none · ref 21
Chain-in-Tree cuts token use, model calls, and runtime by 75-85% in LLM tree search on GSM8K and Math500 by using simple branching-necessity checks, with little accuracy loss in most cases.
Thinking Sparks!: Emergent Attention Heads in Reasoning Models During Post Training cs.AI · 2025-09-30 · unverdicted · none · ref 57
Post-training on reasoning tasks sparks the emergence of specialized attention heads that enable structured computation, with SFT adding stable heads while GRPO uses dynamic activation and pruning tied to reward signals, and controllable think models relying on compensatory heads instead of specific
SynthPert: Enhancing LLM Biological Reasoning via Synthetic Reasoning Traces for Cellular Perturbation Prediction cs.AI · 2025-09-29 · unverdicted · none · ref 1
SynthPert fine-tunes LLMs using synthetic reasoning traces to reach state-of-the-art on the PerturbQA benchmark for cellular perturbation prediction, surpassing the generating frontier model while generalizing to unseen cell types with only 2% of filtered data.
Perceive, Verify and Understand Long Video: Multi-Granular Perception and Active Verification via Interactive Agents cs.CV · 2025-09-29 · unverdicted · none · ref 42
CogniGPT uses an interactive loop between a Multi-Granular Perception Agent and an Active Verification Agent to identify reliable clues in long videos with high accuracy and low frame usage.
