28 Pith papers cite this work.
citing papers explorer
-
Systematic Discovery of Semantic Attacks in Online Map Construction through Conditional Diffusion
MIRAGE discovers semantic attacks on online HD map construction via conditional diffusion, enabling boundary removal and injection that degrade AV performance while passing as realistic environmental changes.
-
Towards Generalized Image Manipulation Localization via Score-based Model
DiffIML applies score-based generative modeling to image manipulation localization, recovering coherent masks iteratively from noise to improve generalization on unseen manipulation types.
-
SoK: Unlearnability and Unlearning for Model Dememorization
This SoK presents the first integrated taxonomy of unlearnability and unlearning, an empirical study of their interplay and of shallow dememorization, and a theoretical guarantee on dememorization depth for certified unlearning.
-
TENNOR: Trustworthy Execution for Neural Networks through Obliviousness and Retrievals
TENNOR enables efficient private training of wide neural networks in TEEs by recasting sparsification as doubly oblivious LSH retrievals and introducing MP-WTA to cut hash table memory by 50x while preserving accuracy.
-
PACO: Proxy-Task Alignment and Online Calibration for On-the-Fly Category Discovery
PACO provides a hierarchical online decision system with proxy-simulated initial thresholds and adaptive updates from mature prototypes to enable consistent category discovery in streaming sequences.
-
Towards Green Wearable Computing: A Physics-Aware Spiking Neural Network for Energy-Efficient IMU-based Human Activity Recognition
PAS-Net is a fully multiplier-free spiking neural network that enforces human joint constraints spatially and uses causal neuromodulation temporally to achieve state-of-the-art accuracy on IMU HAR with up to 98% lower dynamic energy via early-exit.
-
OVS-DINO: Open-Vocabulary Segmentation via Structure-Aligned SAM-DINO with Language Guidance
OVS-DINO structurally aligns DINO with SAM to revitalize attenuated boundary features, achieving SOTA gains of 2.1% average and 6.3% on Cityscapes in weakly-supervised open-vocabulary segmentation.
-
Lost in the Hype: Revealing and Dissecting the Performance Degradation of Medical Multimodal Large Language Models in Image Classification
Medical MLLMs degrade on image classification due to four failure modes in visual representation quality, connector projection fidelity, LLM comprehension, and semantic mapping alignment, quantified by feature probing on 14 models across 3 datasets.
-
DynLP: Parallel Dynamic Batch Update for Label Propagation in Semi-Supervised Learning
DynLP is a parallel dynamic batch update algorithm for label propagation that achieves significant speedups by updating only relevant parts of the graph on GPUs.
-
Satellite-Free Training for Drone-View Geo-Localization
A satellite-free training framework reconstructs 3D drone scenes via Gaussian splatting, generates geometry-normalized pseudo-orthophotos, and aggregates DINOv3 features with a Fisher vector model trained only on drone data to enable cross-view retrieval.
-
Mixture of Predefined Experts: Maximizing Data Usage on Vertical Federated Learning
Split-MoPE integrates split learning with predefined-expert routing to maximize usable data in vertical federated learning under sample misalignment, delivering state-of-the-art accuracy in one communication round plus built-in robustness and per-sample contribution scores.
-
OCCAM: Open-set Causal Concept explAnation and Ontology induction for black-box vision Models
OCCAM discovers open-set visual concepts, estimates causal contributions via object-level interventions on black-box vision models, and induces a global concept ontology from aggregated dataset evidence.
-
LBFTI: Layer-Based Facial Template Inversion for Identity-Preserving Fine-Grained Face Reconstruction
LBFTI decomposes faces into three layers with dedicated generators and a three-stage training process to invert templates into fine-grained, identity-preserving images, claiming 25.3% better TAR than prior methods.
-
AnchorRefine: Synergy-Manipulation Based on Trajectory Anchor and Residual Refinement for Vision-Language-Action Models
AnchorRefine factorizes VLA action generation into a trajectory anchor for coarse planning and residual refinement for local corrections, improving success rates by up to 7.8% in simulation and 18% on real robots across LIBERO, CALVIN, and physical tasks.
-
Neural Adjoint Method for Meta-optics: Accelerating Volumetric Inverse Design via Fourier Neural Operators
A stage-wise Fourier Neural Operator surrogate predicts per-voxel adjoint gradients to accelerate 3D meta-optics inverse design, replacing expensive FDTD solves with fast inference.
-
Cross-Modal Generation: From Commodity WiFi to High-Fidelity mmWave and RFID Sensing
RF-CMG synthesizes high-quality mmWave and RFID signals from WiFi using a diffusion model with Modality-Guided Embedding for high-frequency details and Low-Frequency Modality Consistency to preserve physical structure.
-
Rethinking Video Human-Object Interaction: Set Prediction over Time for Unified Detection and Anticipation
A pair-centric set-prediction model unifies present HOI detection and multi-horizon anticipation in video by modeling future interactions as residual transitions from current pair states, backed by a temporally corrected benchmark.
-
User-Aware Conditional Generative Total Correlation Learning for Multi-Modal Recommendation
GTC improves multi-modal recommendation by using user-conditional diffusion-based feature filtering and total correlation optimization, achieving up to 28.3% gains in NDCG@5 on benchmarks.
-
TIQA: Human-Aligned Perceptual Text Quality Assessment in Generated Images
TIQA introduces datasets and a model that predict human perceptual quality of rendered text in AI images, achieving PLCC 0.942 on crops and improving selected image text quality by 0.36 MOS.
-
AEG: A Baremetal Framework for AI Acceleration via Direct Hardware Access in Heterogeneous Accelerators
The AEG baremetal framework achieves 9.2x higher compute efficiency, 3-7x less data movement, and near-zero latency variance for ResNet-18 on 28 AIE tiles, compared with Linux Vitis AI on 304 tiles, while maintaining 68.78% ImageNet accuracy.
-
VeriOS: Query-Driven Proactive Human-Agent-GUI Interaction for Trustworthy OS Agents
VeriOS-Agent is an OS agent that proactively queries humans in untrustworthy scenarios via a query-driven framework and three-stage training, achieving 19.72% higher step-wise success rate over baselines while preserving normal performance.
-
GenHAR: Generalizing Cross-domain Human Activity Recognition for Last-mile Delivery
GenHAR improves cross-domain human activity recognition accuracy by 9.97% with 6.4x fewer FLOPs, using tokenized sensor data, frequency channel correlations, selective masking, and efficient attention; in deployment it has detected 2.15 billion activities.
-
SAIL: Structure-Aware Interpretable Learning for Anatomy-Aligned Post-hoc Explanations in OCT
SAIL integrates anatomical priors at the representation level with semantic features via fusion to produce more anatomically aligned attribution maps in OCT without altering existing explainability techniques.
-
RACANet: Reliability-Aware Crowd Anchor Network for RGB-T Crowd Counting
RACANet proposes a reliability-aware two-stage fusion network with cross-modal pretraining and local anchor modules that outperforms prior RGB-T crowd counting methods on standard benchmarks.
-
Observe Less, Understand More: Cost-aware Cross-scale Observation for Remote Sensing Understanding
A unified cost-aware formulation couples fine-grained high-resolution sampling decisions with cross-patch representation prediction to achieve superior performance-cost trade-offs on remote sensing recognition and retrieval tasks using a new 10M-image benchmark.
-
MATCHA: Efficient Deployment of Deep Neural Networks on Multi-Accelerator Heterogeneous Edge SoCs
MATCHA optimizes DNN deployment on heterogeneous multi-accelerator edge SoCs via constraint programming for memory and scheduling plus pattern matching for parallel execution, cutting latency up to 35% versus the MATCH compiler on MLPerf Tiny.
-
WRF4CIR: Weight-Regularized Fine-Tuning Network for Composed Image Retrieval
WRF4CIR uses weight-regularized fine-tuning with adversarial perturbations to mitigate overfitting in composed image retrieval and narrows the generalization gap on benchmarks.
-
SatReg: Regression-based Neural Architecture Search for Lightweight Satellite Image Segmentation
SatReg uses regression surrogates on two width variables from CM-UNet students to select near-optimal lightweight segmentation architectures for edge satellite deployment without exhaustive search.