super hub Mixed citations

write newline

" write newline "" before

Mixed citation behavior. Most common role is unclear (62%).

301 Pith papers citing it

unclear 62% of classified citations

citation-role summary

background: 8 · other: 4 · method: 1

citation-polarity summary

unclear: 8 · background: 4 · use method: 1

claims ledger

background Table A1: Comparison of BAS for frontier models across tasks when varying the risk-prior w(t). Higher scores indicate better alignment with expressed uncertainty. The standard BAS (Uniform: w(t) = 1) serves as the baseline, while Linear and Quadratic weights simulate increasingly safety-critical environments. Identical ECE, different BAS. Consider two models evaluated on four examples with correctness labels Z = [1, 1, 0, 0]. The models produce the following confidence values: Example 1 2 3 4 Z 1 1 0

authors

" write newline "" before

co-cited works

representative citing papers

Turning the TIDE: Cross-Architecture Distillation for Diffusion Large Language Models

cs.CL · 2026-04-29 · unverdicted · novelty 8.0

TIDE enables the first cross-architecture distillation of dLLMs, improving a 0.6B student by 1.53 average points over baselines when trained from 8B dense and 16B MoE teachers.

JumpLoRA: Sparse Adapters for Continual Learning in Large Language Models

cs.LG · 2026-04-17 · unverdicted · novelty 8.0

JumpLoRA uses JumpReLU gating to induce adaptive sparsity in LoRA blocks, achieving dynamic parameter isolation that prevents task interference and improves continual learning performance over IncLoRA and ELLA.

Context Over Content: Exposing Evaluation Faking in Automated Judges

cs.AI · 2026-04-16 · conditional · novelty 8.0

LLM judges exhibit up to 9.8 percentage point leniency bias from stakes signaling in prompts, acting implicitly without mentioning it in chain-of-thought.

InfiniteScienceGym: An Unbounded, Procedurally-Generated Benchmark for Scientific Analysis

cs.CL · 2026-04-14 · unverdicted · novelty 8.0

InfiniteScienceGym procedurally generates unbounded scientific repositories with exact ground-truth QA pairs to benchmark LLMs on data reasoning, abstention, and tool use without static datasets.

Exact Certification of Neural Networks and Partition Aggregation Ensembles against Label Poisoning

cs.LG · 2026-04-13 · unverdicted · novelty 8.0

EnsembleCert and ScaLabelCert enable tighter and exact certificates for neural network robustness against label-flipping attacks by leveraging white-box information and neural tangent kernel equivalence.

Steered LLM Activations are Non-Surjective

cs.AI · 2026-04-10 · unverdicted · novelty 8.0 · 2 refs

Steered LLM activations are non-surjective: under practical assumptions, they lie outside the set of states reachable from any discrete prompt.

AgentSocialBench: Evaluating Privacy Risks in Human-Centered Agentic Social Networks

cs.AI · 2026-04-01 · unverdicted · novelty 8.0

AgentSocialBench demonstrates that privacy preservation is fundamentally harder in human-centered agentic social networks than in single-agent cases due to cross-domain coordination pressures and an abstraction paradox where privacy instructions increase discussion of sensitive information.

Adaptive Stopping for Multi-Turn LLM Reasoning

cs.CL · 2026-04-01 · unverdicted · novelty 8.0

MiCP is the first conformal prediction method for multi-turn LLM pipelines that allocates per-turn error budgets to enable adaptive stopping with an overall coverage guarantee, shown to reduce turns and cost on RAG and ReAct benchmarks.

Parameterized Hardness of Zonotope Containment and Neural Network Verification

cs.CC · 2025-09-26 · unverdicted · novelty 8.0

The paper proves W[1]-hardness parameterized by dimension d for positivity, zonotope containment, max approximation, and L_p-Lipschitz constants in 2- and 3-layer ReLU networks, showing enumeration methods are optimal under ETH.

RLCracker: Evaluating the Worst-Case Vulnerability of LLM Watermarks with Adaptive RL Attacks

cs.CR · 2025-09-25 · conditional · novelty 8.0

RLCracker is a reinforcement learning attack that erases LLM watermarks at 98.5% success rate with minimal data and generalizes across ten schemes and multiple model sizes.

The Coding Limits of Robust Watermarking for Generative Models

cs.CR · 2025-09-11 · accept · novelty 8.0

Establishes an unconditional robustness threshold of 1-1/q for zero-bit tamper-detection codes in watermarking, with matching constructions and experimental confirmation on image models.

ErrorRadar: Benchmarking Complex Mathematical Reasoning of Multimodal Large Language Models Via Error Detection

cs.CL · 2024-10-06 · unverdicted · novelty 8.0

ErrorRadar is a new benchmark of 2,500 multimodal K-12 math problems for MLLM error step identification and categorization, where GPT-4o trails human experts by ~10%.

BEAVER: An Enterprise Benchmark for Text-to-SQL

cs.CL · 2024-09-03 · unverdicted · novelty 8.0

BEAVER is the first text-to-SQL benchmark from private enterprise data warehouses, revealing SOTA agentic frameworks achieve only 10.8% accuracy on complex real-world queries.

Score-Based Generative Modeling through Stochastic Differential Equations

cs.LG · 2020-11-26 · unverdicted · novelty 8.0

Introduces an SDE-based framework for score-based generative modeling that unifies prior methods, enables predictor-corrector sampling and neural ODE likelihoods, and achieves SOTA unconditional image generation on CIFAR-10.

Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer

cs.LG · 2017-01-23 · accept · novelty 8.0

A noisy top-k gated mixture-of-experts layer between LSTMs scales neural networks to 137B parameters with sub-linear compute, beating SOTA on language modeling and machine translation.

Adam: A Method for Stochastic Optimization

cs.LG · 2014-12-22 · accept · novelty 7.5

A first-order stochastic optimizer that maintains bias-corrected exponential moving averages of the gradient and its square, dividing the former by the square root of the latter to set per-parameter step sizes.

AutoSP: Unlocking Long-Context LLM Training Via Compiler-Based Sequence Parallelism

cs.LG · 2026-04-29 · unverdicted · novelty 7.0

AutoSP automates sequence parallelism and long-context activation checkpointing via compilation, enabling up to 2.7x longer training contexts on NVIDIA hardware with negligible throughput loss.

Cooperate to Compete: Strategic Coordination in Multi-Agent Conquest

cs.AI · 2026-04-28 · conditional · novelty 7.0

C2C is a new testbed where LM agents negotiate differently from humans and targeted prompting raises their win rate from 22.2% to 32.7% across 1,100+ games.

XGRAG: A Graph-Native Framework for Explaining KG-based Retrieval-Augmented Generation

cs.AI · 2026-04-27 · unverdicted · novelty 7.0

XGRAG uses graph perturbations to quantify component contributions in GraphRAG and achieves 14.81% better explanation quality than text-based baselines on QA datasets, with correlations to graph centrality.

GraphPlanner: Graph Memory-Augmented Agentic Routing for Multi-Agent LLMs

cs.CL · 2026-04-26 · unverdicted · novelty 7.0

GraphPlanner augments multi-agent LLM routing with a heterogeneous graph memory and RL-optimized MDP workflow generation, delivering up to 9.3% higher accuracy and over 99% lower GPU cost than prior routers while supporting zero-shot generalization.

MMEB-V3: Measuring the Performance Gaps of Omni-Modality Embedding Models

cs.IR · 2026-04-25 · unverdicted · novelty 7.0

MMEB-V3 benchmark shows omni-modality embedding models fail to enforce instruction-specified modality constraints and exhibit asymmetric, query-biased retrieval.

Preserving Long-Tailed Expert Information in Mixture-of-Experts Tuning

cs.LG · 2026-04-24 · unverdicted · novelty 7.0

A new SFT framework for MoE models combines bias-driven sparsification with gated condenser experts to retain long-tailed expert information, outperforming DenseMixer and ESFT by over 2.5% on math reasoning and commonsense QA benchmarks.

Pliable rejection sampling

stat.ML · 2026-04-24 · unverdicted · novelty 7.0

Pliable rejection sampling learns a kernel-based proposal to enable efficient i.i.d. sampling from target distributions f with high-probability correctness and a guarantee on accepted samples.

Modulating Cross-Modal Convergence with Single-Stimulus, Intra-Modal Dispersion

q-bio.NC · 2026-04-23 · unverdicted · novelty 7.0

Stimuli with low intra-modal dispersion among vision models elicit up to twice the cross-modal alignment with language models compared to high-dispersion stimuli.

citing papers explorer

Showing 27 of 27 citing papers after filters.

ErrorRadar: Benchmarking Complex Mathematical Reasoning of Multimodal Large Language Models Via Error Detection
cs.CL · 2024-10-06 · unverdicted · none · ref 89
ErrorRadar is a new benchmark of 2,500 multimodal K-12 math problems for MLLM error step identification and categorization, where GPT-4o trails human experts by ~10%.

BEAVER: An Enterprise Benchmark for Text-to-SQL
cs.CL · 2024-09-03 · unverdicted · none · ref 17
BEAVER is the first text-to-SQL benchmark from private enterprise data warehouses, revealing SOTA agentic frameworks achieve only 10.8% accuracy on complex real-world queries.

Speak-to-Structure: Evaluating LLMs in Open-domain Natural Language-Driven Molecule Generation
cs.CL · 2024-12-19 · unverdicted · none · ref 42
S^2-Bench is a new one-to-many benchmark for natural language-driven molecule generation with three tasks, and OpenMolIns is an instruction dataset enabling Llama3.1-8B to outperform GPT-4o and Claude-3.5 on it.

Tighter Performance Theory of FedExProx
math.OC · 2024-10-20 · unverdicted · none · ref 42
New analysis framework yields tighter linear convergence for FedExProx on non-strongly convex quadratics and PL functions, proving outperformance over GD once communication costs are counted.

Power-Softmax: Towards Secure LLM Inference over Encrypted Data
cs.LG · 2024-10-12 · unverdicted · none · ref 42
Power-Softmax is a new HE-compatible attention variant that permits training and inference of billion-parameter polynomial LLMs with performance matching standard transformers.

MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering
cs.CL · 2024-10-09 · unverdicted · none · ref 33
MLE-bench evaluates frontier language models as ML engineering agents on 75 Kaggle competitions, with the top setup (o1-preview + AIDE) reaching bronze medal level in 16.9% of tasks.

ImProver: Agent-Based Automated Proof Optimization
cs.AI · 2024-10-07 · unverdicted · none · ref 1
ImProver is an LLM agent using Chain-of-States, error-correction, and retrieval to rewrite Lean proofs for arbitrary user-defined optimization criteria like shortness and readability.

What Causes Performance Degradation in Cross-Subject EEG Classification?
cs.CE · 2024-10-04 · unverdicted · none · ref 1
Controlled experiments attribute cross-subject EEG classification degradation to inter-subject variability in multi-class tasks and shortcut learning in single-class tasks.

Near-Optimal Policy Identification in Robust Constrained Markov Decision Processes via Epigraph Form
cs.LG · 2024-08-29 · unverdicted · none · ref 1
Presents the first algorithm to identify an ε-optimal policy in robust constrained MDPs via epigraph form and bisection search with Õ(ε^{-4}) robust policy evaluations.

SyMerge: From Non-Interference to Synergistic Merging via Single-Layer Adaptation
cs.LG · 2024-12-26 · unverdicted · none · ref 54
SyMerge merges models via single-layer adaptation and expert-guided self-labeling to achieve task synergy, reporting SOTA results on vision, dense prediction, and NLP tasks.

Score-matching-based Structure Learning for Temporal Data on Networks
stat.ML · 2024-12-10 · unverdicted · none · ref 69
PICK adds a parent-finding subroutine for leaf nodes to speed up pruning in score-matching causal discovery, extending it from i.i.d. data to static and temporal network data.

Improving Music Source Separation with Diffusion and Consistency Refinement
cs.SD · 2024-12-09 · unverdicted · none · ref 1
Diffusion-based refinement followed by consistency distillation improves music source separation quality and inference speed across U-Net and BS-RoFormer backbones on Slakh2100 and MUSDB18.

Preference Goal Tuning: Post-Training as Latent Control for Frozen Policies
cs.AI · 2024-12-03 · unverdicted · none · ref 1
PGT optimizes latent goal embeddings for frozen policies via trajectory-level preference objectives, reporting 72-81.6% relative gains on 17 Minecraft tasks and 13.4% better OOD performance than fine-tuning.

Improving Inverse Folding for Peptide Design with Diversity-regularized Direct Preference Optimization
cs.LG · 2024-10-25 · unverdicted · none · ref 46
Diversity-regularized DPO fine-tuning of ProteinMPNN improves structural similarity scores by at least 8% over base model and sequence diversity by up to 20% over standard DPO for peptide inverse folding on OpenFold structures.

EventFlow: Forecasting Temporal Point Processes with Flow Matching
cs.LG · 2024-10-09 · unverdicted · none · ref 1
EventFlow applies flow matching to learn joint distributions over event times for temporal point processes, reporting 20-53% lower forecast error than autoregressive baselines on standard TPP benchmarks with fewer sampling calls.

Inspection and Control of Self-Generated-Text Recognition Ability in Llama3-8b-Instruct
cs.LG · 2024-10-02 · unverdicted · none · ref 17
Llama3-8b-Instruct recognizes its own outputs via a residual-stream vector associated with self-authorship that can be steered to control authorship claims and perceptions.

Deep Learning Alternatives of the Kolmogorov Superposition Theorem
cs.LG · 2024-10-02 · unverdicted · none · ref 1
ActNet is a new KST-based neural network that outperforms KANs and competes with MLPs in PINN benchmarks for PDE simulation tasks.

Safe Bayesian Optimization for Complex Control Systems via Additive Gaussian Processes
cs.RO · 2024-08-29 · unverdicted · none · ref 38
SafeCtrlBO combines additive GP kernels with boundary-based safe-set expansion to achieve efficient safe optimization of multi-loop controllers on benchmarks and a PMSM hardware platform.

ConjNorm: Tractable Density Estimation for Out-of-Distribution Detection
cs.LG · 2024-02-27 · unverdicted · none · ref 84
ConjNorm reframes OOD detection score design as optimizing norm p in an exponential family density model via a Bregman divergence theorem, with a tractable Monte Carlo estimator, claiming SOTA gains on CIFAR-100 and ImageNet-1K.

Test-Time Alignment via Hypothesis Reweighting
cs.LG · 2024-12-11 · unverdicted · none · ref 1
HyRe personalizes reward models at test time by reweighting an ensemble of heads trained on aggregate preferences, using few target examples to outperform uniform averaging and prior methods on RewardBench and 32 tasks.

Enhancing Trust in Large Language Models via Uncertainty-Calibrated Fine-Tuning
cs.CL · 2024-12-03 · unverdicted · none · ref 67
Uncertainty-aware fine-tuning with a decision-theory-based loss produces better-calibrated uncertainty estimates than standard training on free-form QA tasks.

TOAST: Transformer Optimization using Adaptive and Simple Transformations
cs.LG · 2024-10-07 · unverdicted · none · ref 50
TOAST approximates full transformer blocks in pretrained models via lightweight closed-form mappings to cut parameters and FLOPs without retraining or finetuning.

WildFeedback: Aligning LLMs With In-situ User Interactions And Feedback
cs.CL · 2024-08-28 · unverdicted · none · ref 45
WildFeedback extracts preference pairs from in-situ user feedback in LLM conversations to fine-tune models for better alignment with real user preferences.

Leveraging Ensemble-Based Semi-Supervised Learning for Illicit Account Detection in Ethereum DeFi Transactions
cs.SI · 2024-12-03 · unverdicted · none · ref 49
SLEID combines Isolation Forest and iterative self-training to detect illicit accounts in large-scale Ethereum DeFi transactions, achieving better precision and F1 than baselines while using less labeled data.

Detecting Popular Social Events through Limited Observation with Deep Survival Analysis
cs.SI · 2024-10-02 · unverdicted · none · ref 13
A deep survival analysis method predicts popular social media cascades using limited early observation data tested on Twitter, Weibo, and Digg datasets.

ANGOFA: Leveraging OFA Embedding Initialization and Synthetic Data for Angolan Language Model
cs.CL · 2024-04-03 · unverdicted · none · ref 30
Four MAFT-based PLMs for Angolan languages report 12.3-point gains over AfroXLMR-base and 3.8-point gains over OFA baselines on downstream tasks.

Sinc Kolmogorov-Arnold network and its application for solving PDEs with singularities
cs.LG · 2024-10-05 · unreviewed · ref 57
