Schmon, and Chris G
20 Pith papers cite this work.
citing papers explorer
- Learning Spectral and Polarimetric Clues for One-to-Multimodal Novel View Synthesis
  SPoILeR uses multimodal pre-training to enable accurate novel view synthesis of infrared, polarimetric, and multispectral data from RGB-supervised fine-tuning on new scenes.
- AbsoluteDegradation: A Physics-Inspired Synthetic Film-Degradation Pipeline and Archival Film Restoration Benchmark
  AbsoluteDegradation supplies a physics-inspired synthetic degradation pipeline and a large real-world archival benchmark to train and evaluate film restoration models.
- Sparsity-Inducing Divergence Losses for Biometric Verification
  Q-Margin encodes margin penalties into the reference measure of an alpha-divergence loss to produce sparse discriminative embeddings for face and speaker verification.
- Learning to Deny: Action Denial in Multimodal Large Language Models
  MLLMs drop from over 85% accuracy on action presence to under 50% on matched action-denial videos, exposing a causal verification gap that causal graph prompts partially close.
- Physics-Guided Deep Unfolding for Blind Cross-Sensor Spectral Super-Resolution via Learning the Spectral Transformation Function
  PGU-Net is a deep unfolding network for blind cross-sensor spectral super-resolution that jointly reconstructs the HSI and learns the spectral transformation function via alternating optimization stages.
- AVTrack: Audio-Visual Tracking in Human-centric Complex Scenes
  Introduces AVTrack dataset for audio-visual tracking in challenging human-centric scenes, demonstrating performance drops in existing methods.
- Reviving In-domain Fine-tuning Methods for Source-Free Cross-domain Few-shot Learning
  LoRA adapters fix collapsed visual CLS token attention in CLIP for superior cross-domain few-shot learning, and the new Semantic Probe framework revives prompt methods to reach state-of-the-art on four benchmarks.
- ExpertEdit: Learning Skill-Aware Motion Editing from Expert Videos
  ExpertEdit edits novice motions to expert skill levels by learning a motion prior from unpaired videos and infilling masked skill-critical spans.
- The Inattentional Gap: Task-Conditioned Language and Vision Models Omit the Safety-Critical Signals They Can Otherwise Report
  Task conditioning suppresses safety-critical signal reporting in language and vision models that unconstrained versions report at higher rates, creating an inattentional gap that decouples benchmark safety from real-world safety.
- On the QUEST for Uncertainty Quantification via Highest Density Regions
  QUEST measures uncertainty via the Lebesgue volume of highest-density regions of a distribution's support, evaluated at robustness parameter alpha, and claims to satisfy UQ axioms while outperforming variance and differential entropy on selective prediction tasks.
- MSIQ: Moment-based Scale-Invariant Quality Measure for Single Image Super-Resolution
  MSIQ is a scale-invariant, model-free quality metric for single image super-resolution using normalized central geometric moments for direct comparison of different-resolution images.
- Preventing Latent Rehearsal Decay in Online Continual SSL with SOLAR
  SOLAR prevents latent rehearsal decay in online continual SSL by adaptively managing replay buffers with deviation proxies and an explicit overlap loss, delivering both fast convergence and state-of-the-art final accuracy on vision benchmarks.
- Holi-DETR: Holistic Fashion Item Detection Leveraging Contextual Information
  Holi-DETR improves fashion item detection by integrating co-occurrence probabilities, inter-item spatial arrangements, and body keypoint relationships into the DETR architecture.
- On the notion of missingness for path attribution explainability methods in medical settings: Guiding the selection of medically meaningful baselines
  Counterfactual baselines for Integrated Gradients yield more faithful and medically relevant attributions than standard baselines across three medical datasets.
- On Diffusion Modeling for Anomaly Detection
  Diffusion models via DDPM work for anomaly detection but are slow; the proposed DTE method estimates diffusion time distribution analytically and with a neural net to deliver faster inference while outperforming DDPM on ADBench for unsupervised and semi-supervised settings.
- Automatic Uncertainty-Aware Synthetic Data Bootstrapping for Historical Map Segmentation
  Synthetic historical maps are generated from modern vector data via style transfer and uncertainty emulation to train segmentation models for historical map corpora.
- When Token Compression Breaks: Structural Pruning vs. Token Reduction for Robust ViT Segmentation under High Compression
  Token compression in ViT segmentation degrades sharply at high ratios due to information loss while structural pruning degrades smoothly; a moderate prune-then-merge pipeline improves the trade-off on ADE20K and Cityscapes under corruption.
- Visual Prompting Meets Feature Reconstruction-Based Anomaly Detection with Dual-Teacher Supervision
  Combines visual prompting, dual-teacher supervision, and diffusion augmentation on an MMR backbone to gain 3.5 percentage points on the AeBAD anomaly detection dataset.
- Binary Road Surface Classification Using Machine Learning on Production Vehicle Signals During Cruising
  Machine learning on vehicle signals enables binary road surface classification into grip or slip conditions during cruising.
- AI Driven Soccer Analysis Using Computer Vision
  A system combining object detection, segmentation, keypoint prediction, and homography transforms soccer video into real-world player positions and tactical statistics.