Canonical reference
v24i1.7740
84% of citing Pith papers cite this work as background.
citing papers explorer
-
Net-Ev$^2$: A Generative Simulator for Network Event Evolution
Net-Ev² proposes a two-stage generative simulator with structure-guided masked pre-training and topology-aware diffusion using graph U-Net down/upsampling to model network event evolution from text inputs, plus a new 6.5M multimodal benchmark and JL-MMD metric.
-
Fine-grained Claim-level RAG Benchmark for Law
ClaimRAG-LAW is a French-English legal RAG benchmark with claim-level granularity for experts and non-experts that reveals limitations in current retrieval and generation performance.
-
Evaluating Cognitive Age Alignment in Interactive AI Agents
The paper presents ChildAgentEval as the first psychometrically grounded benchmark comparing MLLM-based agents' reasoning performance to age-specific human cognitive stages.
-
CAST: Causal Anchored Simplex Transport for Distribution-Valued Time Series
CAST is a successor-local operator for causal forecasting of simplex-valued time series that retrieves empirical successors from causal context, stabilizes them with a persistence anchor, and applies bounded local stochastic transport while preserving the simplex by construction.
-
StAD: Stein Amortized Divergence for Fast Likelihoods with Diffusion and Flow
StAD distills divergence of PF-ODEs via the Langevin-Stein operator for faster, lower-variance likelihood estimation in generative models without Jacobian costs.
-
Diversified Residual Symbolic Regression
DRSR uses Quality-Diversity to produce diverse symbolic regression expressions differing in residual distributions, enabling post-search selection on synthetic and astronomical data.
-
NeuroTrain: Surveying Local Learning Rules for Spiking Neural Networks with an Open Benchmarking Framework
A taxonomy of SNN training algorithms is presented with the release of NeuroTrain, an open benchmarking framework for reproducible comparisons across datasets and architectures.
-
From Table to Cell: Attention for Better Reasoning with TABALIGN
TABALIGN pairs a diffusion language model planner emitting binary cell masks with a trained attention verifier, raising average accuracy 15.76 points over strong baselines on eight table benchmarks while speeding execution 44.64%.
-
Same Image, Different Meanings: Toward Retrieval of Context-Dependent Meanings
Image meanings grow more context-dependent with semantic abstraction, requiring narrative grounding for accurate retrieval at higher levels.
-
IGT-OMD: Implicit Gradient Transport for Decision-Focused Learning under Delayed Feedback
IGT-OMD reduces gradient transport error from quadratic to linear in delay length for delayed bilevel optimization and achieves sublinear regret with adaptive steps.
-
Large Language Models as Amortized Pareto-Front Generators for Constrained Bi-Objective Convex Optimization
DIPS fine-tunes LLMs to output ordered feasible decision vectors approximating Pareto fronts for constrained bi-objective convex problems, reaching 95-98% normalized hypervolume with 0.16s inference.
-
Likelihood scoring for continuations of mathematical text: a self-supervised benchmark with tests for shortcut vulnerabilities
Presents a likelihood-based benchmark for equation-suffix prediction in technical papers with controls to detect shortcut vulnerabilities in model forecasts.
-
Closing the Theory-Practice Gap in Spiking Transformers via Effective Dimension
Spiking attention is a universal approximator of permutation-equivariant functions with ε-approximation requiring Ω(L_f² nd / ε²) spikes, but low effective dimensions (47-89) allow T=4 timesteps in practice.
-
NL2SQLBench: A Modular Benchmarking Framework for LLM-Enabled NL2SQL Solutions
NL2SQLBench is a new modular benchmarking framework that evaluates LLM NL2SQL methods across three core modules on existing datasets, exposing large accuracy gaps and computational inefficiency.
-
Tencent Advertising Algorithm Challenge 2025: All-Modality Generative Recommendation
Releases TencentGR-1M and TencentGR-10M datasets with baselines for all-modality generative recommendation in advertising, including weighted evaluation for conversions.
-
Incident-Guided Spatiotemporal Traffic Forecasting
IGSTGNN adds incident-context spatial fusion and temporal impact decay modules to model how events alter traffic patterns, achieving state-of-the-art results on a new time-aligned incident-traffic dataset.
-
Variational Sequential Optimal Experimental Design using Reinforcement Learning
vsOED uses a variational one-point reward and RL policy optimization to provide a lower bound on expected information gain for sequential experimental design, supporting nuisance parameters, implicit likelihoods, and multiple design goals.
-
LocalNav: Distilling Frontier VLMs and Embodied RL for On-Device Object Goal Navigation
Distillation from frontier VLMs plus E-RLVR regularization produces a 4B local model that achieves 34.5% SR on OVON while cutting inference latency by 82.8%.
-
Differential Zonotopes for Verifying Global Robustness of DNNs
Differential halo zonotopes enable static verification of global robustness in DNNs by jointly propagating pairs of perturbed inputs while bounding divergence, with a relaxed confidence-based variant.
-
MoCo-AIS: A Contrastive Learning Framework for Similarity Computation of Vessel Trajectories
MoCo-AIS is a MoCo-based contrastive learning framework that learns vessel trajectory embeddings and improves similarity computation over baselines on large-scale real-world AIS datasets while offering a benchmarking platform.
-
CausalMoE: A Billion-Scale Multimodal Foundation Model for Granger Causal Discovery with Pattern-Routed Heterogeneous Experts
CausalMoE is a multimodal foundation model with pattern-routed heterogeneous experts and LLM/VLM integration that claims new SOTA performance on supervised and few-shot Granger causal discovery benchmarks.
-
Scene-Adaptive Nonlinear Tone Curves for Pseudo Ground-Truth Generation in Low-Light 3D Gaussian Splatting
Scene-adaptive nonlinear tone curves (ASE and AP3) with percentile normalisation and offset outperform linear gain for pseudo-GT generation in low-light 3DGS, delivering PSNR gains up to 4.34 dB on LOM and 3.25 dB on RealX3D across 21 scenes.
-
Stability in Competitive Search with Results Diversification
Game-theoretic analysis of diversification in competitive search reveals a diversity-stability tradeoff, with a new method to guarantee corpus equilibrium.
-
Temporal-Aware Reasoning Optimization for Video Temporal Grounding
TaRO improves video temporal grounding in MLLMs via constructive reasoning exploration from dense captions and a temporal-sensitivity reward that uses logit drops on disrupted event boundaries, followed by curriculum learning, and reports SOTA results.
-
SKILL.nb: Selective Formalization and Gated Execution for Durable Agent Workflows
SKILL.nb uses selective formalization and gate-conditioned execution in auditable notebooks to improve durability of agent workflows, achieving 53.7% success on WebArena-Verified with 91.7% retention across re-executions.
-
Shield-Loco: Shielding Locomotion Policies with Predictive Safety Filtering
A post-hoc predictive safety filter adjusts RL policy contact locations for quadruped robots via sampling-based optimization on a full-physics model, reducing safety violations in cluttered environments with minimal performance deviation.
-
POIROT: Interrogating Agents for Failure Detection in Multi-Agent Systems
POIROT protocol repurposes agents in LLM multi-agent systems as an internal diagnostic layer for failure detection, outperforming single-LLM evaluators with gains that increase with complexity, agent count, and fault types.
-
LLM-Driven Co-Evolutionary Automated Heuristic Design for Bi-Component Coupled Combinatorial Optimization
CoEvo-AHD is an LLM-driven dual-population co-evolutionary method for automated heuristic design in bi-component coupled combinatorial optimization that achieves competitive results on TTP and TPP.
-
Mechanistically Interpretable Neural Encoding Reveals Fine-Grained Functional Selectivity in Human Visual Cortex
MINE uses mechanistic interpretability on language-aligned image representations to generate per-voxel feature descriptions, validated via image generation and counterfactual edits that causally shift brain activation.
-
No One Knows the State of the Art in Geospatial Foundation Models
An audit of 152 papers reveals that geospatial foundation models lack standardized evaluations, training controls, and weight releases, so no one knows the state of the art.
-
CauSim: Scaling Causal Reasoning with Increasingly Complex Causal Simulators
CauSim turns scarce causal reasoning labels into scalable supervised data by having LLMs incrementally construct complex executable structural causal models.
-
NICE FACT: Diagnosing and Calibrating VLMs in Quantitative Reasoning for Kinematic Physics
VLMs fail to identify visual preconditions or apply physical laws in kinematic physics tasks, as shown by new FACT diagnostics and NICE calibration methods evaluated on six state-of-the-art models.
-
Hallucination as an Anomaly: Dynamic Intervention via Probabilistic Circuits
Probabilistic circuits detect LLM hallucinations as residual-stream anomalies with up to 99% AUROC and enable dynamic correction that raises truthfulness scores while cutting unnecessary output corruption.
-
Adversarial Graph Neural Network Benchmarks: Towards Practical and Fair Evaluation
A large-scale standardized benchmark of GNN attacks and defenses reveals that target node selection and attacked-model training process can completely distort measured attack effectiveness.
-
Identifier-Free Code Embedding Models for Scalable Search
A fine-tuned Qwen3-Embedding model with contrastive learning outperforms baselines on bidirectional source-to-decompiled code association and generalizes to constant-algorithm tasks.
-
SIFT-VTON: Geometric Correspondence Supervision on Cross-Attention for Virtual Try-On
SIFT-VTON adds explicit geometric supervision from SIFT keypoints to diffusion-based virtual try-on to improve spatial alignment and detail preservation.
-
Tail allocation for conformal prediction intervals
TA-CQR adaptively allocates miscoverage tails to produce shortest single-interval conformal prediction sets with exact marginal coverage and provides oracle inequalities for length.
-
Deep Image Clustering Based on Curriculum Learning and Density Information
IDCL adds density-based curriculum learning and density-core guidance to deep image clustering, claiming superior robustness, faster convergence, and flexibility on benchmark datasets.
-
Supervised Mixture-of-Experts for Surgical Grasping and Retraction
Supervised MoE on top of ACT achieves higher success in bowel grasping/retraction from <150 demos than standard ACT or generalist VLAs, with OOD robustness, unseen viewpoint generalization, and zero-shot ex vivo porcine transfer.
-
Semantic-aware Random Convolution and Source Matching for Domain Generalization in Medical Image Segmentation
Semantic-aware random convolution and intensity-based source matching enable effective single-source domain generalization for medical image segmentation, outperforming prior methods and sometimes matching in-domain performance.
-
TEMPO-Diffusion: Temporally Exposed Malicious Poisoning of Diffusion Models
TEMPO-Diffusion is a targeted backdoor attack framework for diffusion models that uses time-conditioned triggers to poison class-specific synthetic data, achieving high attack success in downstream classifiers.
-
Compositional Behavioral Semantics for State Abstraction in Reinforcement Learning
A category-theoretic compositional framework for behavioral semantics in RL that supports safe transfer of structures under state abstraction and sound quantitative metrics.
-
DiffUNet^2: Bidirectional Prediction, Probabilistic Generation and Collaborative Visual Discovery for Scientific Data
DiffUNet^2 is a bidirectional conditional diffusion model integrated with visual tools for probabilistic exploration of scientific time series across five evaluated datasets.
-
FlowGuard: Flow Matching for Identity-Independent Detection of Data-Free Model Stealing Attacks on Energy System Intrusion Detection Systems
FlowGuard applies continuous normalizing flows to flag out-of-distribution synthetic queries from model stealing attacks on IDS, achieving stable detection in single-client and 100-client Sybil settings unlike identity-dependent baselines.
-
Memorization Dynamics of Fill-in-the-Middle Pretraining
FIM pretraining yields linear growth of verbatim extraction with data repetitions and stronger prefix dependence for recall than left-to-right training in matched Llama models.
-
Signal Reshaping for GRPO in Weak-Feedback Agentic Code Repair
Reshaping outcome rewards, process signals, and rollout comparability in GRPO raises strict compile-and-semantic accuracy in agentic code repair from 0.385 to 0.535 under weak feedback.
-
Architecture-agnostic Lipschitz-constant Bayesian header and its application to resolve semantically proximal classification errors with vision transformers
LipB-ViT adds bi-Lipschitz Bayesian layers to vision transformers and uses uncertainty-aware fusion to identify corrupted labels with over 93% recall at 15% noise, beating kNN baselines.
-
Bias in the Loop: Auditing LLM-as-a-Judge for Software Engineering
LLM judges for code tasks show high sensitivity to prompt biases that systematically favor certain options, changing accuracy and model rankings even when code is unchanged.
-
Consistency Analysis of Sentiment Predictions using Syntactic & Semantic Context Assessment Summarization (SSAS)
SSAS improves LLM sentiment prediction consistency and data quality by up to 30% on three review datasets via syntactic and semantic context assessment summarization.
-
Weak-to-Strong Knowledge Distillation Accelerates Visual Learning
Weak-to-strong knowledge distillation applied early and then turned off accelerates convergence to target performance in visual learning tasks by factors of 1.7-4.8x.