Title resolution pending

316 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

citation-role summary

background: 3 · method: 1

citation-polarity summary

background: 2 · unclear: 1 · use method: 1

representative citing papers

Turning the TIDE: Cross-Architecture Distillation for Diffusion Large Language Models

cs.CL · 2026-04-29 · unverdicted · novelty 8.0

TIDE enables the first cross-architecture distillation of dLLMs, improving a 0.6B student by 1.53 average points over baselines when trained from 8B dense and 16B MoE teachers.

JumpLoRA: Sparse Adapters for Continual Learning in Large Language Models

cs.LG · 2026-04-17 · unverdicted · novelty 8.0

JumpLoRA uses JumpReLU gating to induce adaptive sparsity in LoRA blocks, achieving dynamic parameter isolation that prevents task interference and improves continual learning performance over IncLoRA and ELLA.

Context Over Content: Exposing Evaluation Faking in Automated Judges

cs.AI · 2026-04-16 · conditional · novelty 8.0

LLM judges exhibit up to 9.8 percentage point leniency bias from stakes signaling in prompts, acting implicitly without mentioning it in chain-of-thought.

InfiniteScienceGym: An Unbounded, Procedurally-Generated Benchmark for Scientific Analysis

cs.CL · 2026-04-14 · unverdicted · novelty 8.0

InfiniteScienceGym procedurally generates unbounded scientific repositories with exact ground-truth QA pairs to benchmark LLMs on data reasoning, abstention, and tool use without static datasets.

Exact Certification of Neural Networks and Partition Aggregation Ensembles against Label Poisoning

cs.LG · 2026-04-13 · unverdicted · novelty 8.0

EnsembleCert and ScaLabelCert enable tighter and exact certificates for neural network robustness against label-flipping attacks by leveraging white-box information and neural tangent kernel equivalence.

Steered LLM Activations are Non-Surjective

cs.AI · 2026-04-10 · unverdicted · novelty 8.0 · 2 refs

Steered LLM activations are non-surjective: under practical assumptions, they lie outside the set of states reachable from any discrete prompt.

AgentSocialBench: Evaluating Privacy Risks in Human-Centered Agentic Social Networks

cs.AI · 2026-04-01 · unverdicted · novelty 8.0

AgentSocialBench demonstrates that privacy preservation is fundamentally harder in human-centered agentic social networks than in single-agent cases due to cross-domain coordination pressures and an abstraction paradox where privacy instructions increase discussion of sensitive information.

Adaptive Stopping for Multi-Turn LLM Reasoning

cs.CL · 2026-04-01 · unverdicted · novelty 8.0

MiCP is the first conformal prediction method for multi-turn LLM pipelines that allocates per-turn error budgets to enable adaptive stopping with an overall coverage guarantee, shown to reduce turns and cost on RAG and ReAct benchmarks.

Parameterized Hardness of Zonotope Containment and Neural Network Verification

cs.CC · 2025-09-26 · unverdicted · novelty 8.0

The paper proves W[1]-hardness parameterized by dimension d for positivity, zonotope containment, max approximation, and L_p-Lipschitz constants in 2- and 3-layer ReLU networks, showing enumeration methods are optimal under ETH.

RLCracker: Evaluating the Worst-Case Vulnerability of LLM Watermarks with Adaptive RL Attacks

cs.CR · 2025-09-25 · conditional · novelty 8.0

RLCracker is a reinforcement learning attack that erases LLM watermarks at 98.5% success rate with minimal data and generalizes across ten schemes and multiple model sizes.

ErrorRadar: Benchmarking Complex Mathematical Reasoning of Multimodal Large Language Models Via Error Detection

cs.CL · 2024-10-06 · unverdicted · novelty 8.0

ErrorRadar is a new benchmark of 2,500 multimodal K-12 math problems for MLLM error step identification and categorization, where GPT-4o trails human experts by ~10%.

Score-Based Generative Modeling through Stochastic Differential Equations

cs.LG · 2020-11-26 · unverdicted · novelty 8.0

Introduces an SDE-based framework for score-based generative modeling that unifies prior methods, enables predictor-corrector sampling and neural ODE likelihoods, and achieves SOTA unconditional image generation on CIFAR-10.

Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer

cs.LG · 2017-01-23 · accept · novelty 8.0

A noisy top-k gated mixture-of-experts layer between LSTMs scales neural networks to 137B parameters with sub-linear compute, beating SOTA on language modeling and machine translation.

Adam: A Method for Stochastic Optimization

cs.LG · 2014-12-22 · accept · novelty 7.5

A first-order stochastic optimizer that maintains bias-corrected exponential moving averages of the gradient and its square, dividing the former by the square root of the latter to set per-parameter step sizes.

AutoSP: Unlocking Long-Context LLM Training Via Compiler-Based Sequence Parallelism

cs.LG · 2026-04-29 · unverdicted · novelty 7.0

AutoSP automates sequence parallelism and long-context activation checkpointing via compilation, enabling up to 2.7x longer training contexts on NVIDIA hardware with negligible throughput loss.

VLM Judges Can Rank but Cannot Score: Task-Dependent Uncertainty in Multimodal Evaluation

cs.LG · 2026-04-28 · unverdicted · novelty 7.0

VLM judges exhibit task-dependent uncertainty in their scores, with conformal prediction revealing wide intervals for complex tasks and a decoupling between good ranking performance and poor absolute scoring reliability.

Cooperate to Compete: Strategic Coordination in Multi-Agent Conquest

cs.AI · 2026-04-28 · conditional · novelty 7.0

C2C is a new testbed where LM agents negotiate differently from humans and targeted prompting raises their win rate from 22.2% to 32.7% across 1,100+ games.

XGRAG: A Graph-Native Framework for Explaining KG-based Retrieval-Augmented Generation

cs.AI · 2026-04-27 · unverdicted · novelty 7.0

XGRAG uses graph perturbations to quantify component contributions in GraphRAG and achieves 14.81% better explanation quality than text-based baselines on QA datasets, with correlations to graph centrality.

GraphPlanner: Graph Memory-Augmented Agentic Routing for Multi-Agent LLMs

cs.CL · 2026-04-26 · unverdicted · novelty 7.0

GraphPlanner augments multi-agent LLM routing with a heterogeneous graph memory and RL-optimized MDP workflow generation, delivering up to 9.3% higher accuracy and over 99% lower GPU cost than prior routers while supporting zero-shot generalization.

MMEB-V3: Measuring the Performance Gaps of Omni-Modality Embedding Models

cs.IR · 2026-04-25 · unverdicted · novelty 7.0

MMEB-V3 benchmark shows omni-modality embedding models fail to enforce instruction-specified modality constraints and exhibit asymmetric, query-biased retrieval.

Preserving Long-Tailed Expert Information in Mixture-of-Experts Tuning

cs.LG · 2026-04-24 · unverdicted · novelty 7.0

A new SFT framework for MoE models combines bias-driven sparsification with gated condenser experts to retain long-tailed expert information, outperforming DenseMixer and ESFT by over 2.5% on math reasoning and commonsense QA benchmarks.

Thinking Without Words: Efficient Latent Reasoning with Abstract Chain-of-Thought

cs.CL · 2026-04-24 · unverdicted · novelty 7.0

Abstract-CoT lets models reason with short discrete latent token sequences from a reserved vocabulary, using warm-up training and RL to match verbal CoT performance with up to 11.6x fewer tokens.

Directional Confusions Reveal Divergent Inductive Biases Through Rate-Distortion Geometry in Human and Machine Vision

cs.CV · 2026-04-23 · unverdicted · novelty 7.0

Humans show broad weak directional confusions while DNNs show sparse strong collapses; these structures shift rate-distortion geometry differently and reveal divergent inductive biases.

Modulating Cross-Modal Convergence with Single-Stimulus, Intra-Modal Dispersion

q-bio.NC · 2026-04-23 · unverdicted · novelty 7.0

Stimuli with low intra-modal dispersion among vision models elicit up to twice the cross-modal alignment with language models compared to high-dispersion stimuli.

citing papers explorer

Showing 31 of 31 citing papers after filters.

Directional Confusions Reveal Divergent Inductive Biases Through Rate-Distortion Geometry in Human and Machine Vision
cs.CV · 2026-04-23 · unverdicted · none · ref 36
Humans show broad weak directional confusions while DNNs show sparse strong collapses; these structures shift rate-distortion geometry differently and reveal divergent inductive biases.

Gaslight, Gatekeep, V1-V3: Early Visual Cortex Alignment Shields Vision-Language Models from Sycophantic Manipulation
cs.CV · 2026-04-15 · conditional · none · ref 2
Alignment of vision-language models with human V1-V3 early visual cortex negatively predicts resistance to sycophantic gaslighting attacks.

Toward an Artificial General Teacher: Procedural Geometry Data Generation and Visual Grounding with Vision-Language Models
cs.CV · 2026-04-03 · unverdicted · none · ref 28
A procedural engine generates 200k+ synthetic geometry diagrams to fine-tune VLMs for referring image segmentation on abstract diagrams, yielding 49% IoU and 85% Buffered IoU with Florence-2 versus under 1% zero-shot.

Beyond Classification Accuracy: Neural-MedBench and the Need for Deeper Reasoning Benchmarks
cs.CV · 2025-09-26 · unverdicted · none · ref 41
Neural-MedBench reveals sharp performance drops in state-of-the-art VLMs on reasoning-intensive neurology tasks compared to conventional classification benchmarks, with reasoning failures dominating errors.

Concepts in Motion: Temporal Concept Bottleneck Model for Interpretable Video Classification
cs.CV · 2025-09-25 · unverdicted · none · ref 50
MoTIF adds temporal self-attention and automatic VLM-based concept discovery to concept bottleneck models for interpretable video classification, showing gains over prior global CBMs on benchmarks.

Revisiting Image Manipulation Localization under Realistic Manipulation Scenarios
cs.CV · 2025-09-24 · conditional · none · ref 29
RITA models image manipulation localization as ordered sequence prediction with a new benchmark HSIM and HSS metric to handle multi-step editing processes.

VIPaint: Image Inpainting with Pre-Trained Diffusion Models via Variational Inference
cs.CV · 2024-11-28 · unverdicted · none · ref 2
VIPaint uses hierarchical variational inference to optimize a non-Gaussian Markov approximation of the diffusion posterior, enabling better inpainting and inverse problems with pre-trained and latent diffusion models.

Beyond SFT-to-RL: Pre-alignment via Black-Box On-Policy Distillation for Multimodal RL
cs.CV · 2026-04-30 · unverdicted · none · ref 3
PRISM adds a black-box on-policy distillation stage with an MoE discriminator between SFT and RLVR for multimodal models, yielding +4.4 and +6.0 average accuracy gains on 4B and 8B Qwen3-VL models over the standard baseline.

Cross-Stage Coherence in Hierarchical Driving VQA: Explicit Baselines and Learned Gated Context Projectors
cs.CV · 2026-04-24 · unverdicted · none · ref 3
Explicit prompt baselines cut NLI contradictions by up to 42.6% with zero training, while learned gated context projectors deliver a 34% reduction in planning-stage contradictions and 50% higher cross-stage entailment on DriveLM-nuScenes.

Beyond Independent Frames: Latent Attention Masked Autoencoders for Multi-View Echocardiography
cs.CV · 2026-04-16 · unverdicted · none · ref 3
LAMAE adds latent-space attention to masked autoencoders so multi-view echocardiography videos can exchange information across frames and views, yielding representations that transfer from adult to pediatric hearts and enable ICD-10 code prediction on MIMIC-IV-ECHO.

UniMark: Unified Adaptive Multi-bit Watermarking for Autoregressive Image Generators
cs.CV · 2026-04-12 · unverdicted · none · ref 3
UniMark enables reliable multi-bit watermarking across different autoregressive image generators via adaptive semantic grouping, block-wise encoding with error correction, and a unified token interface.

Is There Knowledge Left to Extract? Evidence of Fragility in Medically Fine-Tuned Vision-Language Models
cs.CV · 2026-04-10 · unverdicted · none · ref 3
Medically fine-tuned VLMs exhibit fragile performance that degrades with task difficulty and shows no reliable advantage over general models, with high sensitivity to prompt changes.

MixFlow: Mixed Source Distributions Improve Rectified Flows
cs.CV · 2026-04-10 · unverdicted · none · ref 39
Mixing unconditional Gaussian noise with a κ-conditioned source during training of rectified flows reduces path curvature, yielding 12% better FID scores and faster sampling than standard rectified flows.

Multimodal Language Models Cannot Spot Spatial Inconsistencies
cs.CV · 2026-04-01 · unverdicted · none · ref 4
Multimodal LLMs significantly underperform humans at spotting objects that break 3D consistency in multi-view image pairs.

Beyond Static Vision: Scene Dynamic Field Unlocks Intuitive Physics Understanding in Multi-modal Large Language Models
cs.CV · 2026-03-30 · conditional · none · ref 3
Scene Dynamic Field integrates physics simulators into MLLM fine-tuning to boost intuitive physics understanding, delivering up to 20.7% gains on fluid tasks with generalization to unseen domains.

Adaptive Residual-Update Steering for Low-Overhead Hallucination Mitigation in Large Vision Language Models
cs.CV · 2025-11-13 · unverdicted · none · ref 25
RUDDER creates a persistent visual anchor by extracting CARD from prefill residuals and modulating its injection via an adaptive Beta Gate, cutting CHAIR_S by 24.4% and CHAIR_i by 23.6% on average across LLaVA, Idefics2, InstructBLIP and Qwen2.5-VL with >96% throughput.

SeMoBridge: Semantic Modality Bridge for Efficient Few-Shot Adaptation of CLIP
cs.CV · 2025-09-30 · unverdicted · none · ref 27
SeMoBridge projects images into the text modality via a semantic bridge to reduce CLIP's intra-modal misalignment and improve few-shot performance.

Perceive, Verify and Understand Long Video: Multi-Granular Perception and Active Verification via Interactive Agents
cs.CV · 2025-09-29 · unverdicted · none · ref 44
CogniGPT uses an interactive loop between a Multi-Granular Perception Agent and an Active Verification Agent to identify reliable clues in long videos with high accuracy and low frame usage.

Causal-Adapter: Taming Text-to-Image Diffusion for Faithful Counterfactual Generation
cs.CV · 2025-09-29 · unverdicted · none · ref 68 · 2 links
Causal-Adapter adapts frozen diffusion backbones via structural causal modeling, prompt-aligned injection, and conditioned token contrastive loss to achieve faithful counterfactual generation with strong attribute control and identity preservation.

Mitigating Visual Context Degradation in Large Multimodal Models: A Training-Free Decoupled Agentic Framework
cs.CV · 2025-09-27 · unverdicted · none · ref 53
DRP decouples reasoning from perception in LMMs by using an LLM reasoner to query an LMM observer for visual details as needed, reducing visual grounding loss.

Progressive Multimodal Search and Reasoning for Knowledge-Intensive Visual Question Answering
cs.CV · 2025-08-31 · unverdicted · none · ref 61
PMSR progressively constructs structured reasoning trajectories with dual-scope queries and compositional reasoning to improve knowledge acquisition and answer accuracy in knowledge-intensive VQA.

Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling
cs.CV · 2025-07-10 · unverdicted · none · ref 99
Geometry Forcing aligns video diffusion representations with geometric foundation model features via angular cosine and scale regression objectives to improve 3D consistency in generated videos.

Understanding Representation Gaps Across Scales in Tropical Tree Species Classification from Drone Imagery
cs.CV · 2026-04-24 · unverdicted · none · ref 3
Close-up UAV images yield higher tree species classification accuracy than top-view imagery, with the gap increasing for rare species, and self-supervised cross-scale alignment is proposed to bridge them for canopy-level monitoring.

Learning Adaptive Reasoning Paths for Efficient Visual Reasoning
cs.CV · 2026-04-16 · unverdicted · none · ref 3
AVR trains vision-language models to adaptively select among full reasoning, perception-only, or direct-answer formats using a modified policy optimization method, reducing token use by 50-90% with little accuracy loss.

Context Sensitivity Improves Human-Machine Visual Alignment
cs.CV · 2026-04-15 · unverdicted · none · ref 3
Context-sensitive similarity computation from embeddings improves odd-one-out accuracy by up to 15% over context-insensitive baselines for human visual alignment.

I Can't Believe TTA Is Not Better: When Test-Time Augmentation Hurts Medical Image Classification
cs.CV · 2026-04-06 · unverdicted · none · ref 3
Test-time augmentation consistently degrades accuracy in medical image classification on MedMNIST v2 benchmarks due to distribution shifts between augmented test inputs and training data.

Embedding-Only Uplink for Onboard Retrieval Under Shift in Remote Sensing
cs.CV · 2026-03-30 · conditional · none · ref 21
Embedding-only uplink enables flexible onboard retrieval for remote sensing under distribution shifts, with kNN superior for cloud classification and centroids for temporal change detection.

Supervise Less, See More: Training-free Nuclear Instance Segmentation with Prototype-Guided Prompting
cs.CV · 2025-11-25 · unverdicted · none · ref 2
SPROUT presents a fully training-free prompting framework that constructs histology-informed prototypes, aligns features via partial optimal transport, and generates positive/negative point prompts for SAM to achieve competitive nuclear instance segmentation on histopathology benchmarks.

PartCo: Part-Level Correspondence Priors Enhance Category Discovery
cs.CV · 2025-09-26 · unverdicted · none · ref 63
PartCo improves generalized category discovery by incorporating part-level correspondence priors that capture finer semantic structures and integrate with existing GCD methods.

Learning Illumination Control in Diffusion Models
cs.CV · 2026-04-27 · unverdicted · none · ref 3
An open-source data engine creates illumination control triplets to fine-tune diffusion models, yielding better perceptual, structural, and identity preservation than SD 1.5, SDXL, and FLUX.1-dev baselines.

Every Subtlety Counts: Fine-grained Person Independence Micro-Action Recognition via Distributionally Robust Optimization
cs.CV · 2025-09-25 · unverdicted · none · ref 45
A Person Independence Universal Micro-action Recognition Framework combines Distributionally Robust Optimization with temporal-frequency alignment at the feature level and group-invariant regularization at the loss level to improve generalization across persons on the MA-52 dataset.
