Xiaohua Zhai, Basil Mustafa, Alexander Kolesnikov, and Lucas Beyer
27 Pith papers cite this work.
Citing papers by year: 2026 (27)
citing papers explorer
-
GuardMarkGS: Unified Ownership Tracing and Edit Deterrence for 3D Gaussian Splatting
GuardMarkGS unifies watermarking and adversarial edit deterrence into a single optimization framework for protecting 3D Gaussian Splatting assets.
-
SyMTRS: Benchmark Multi-Task Synthetic Dataset for Depth, Domain Adaptation and Super-Resolution in Aerial Imagery
A new large-scale synthetic multi-task benchmark dataset supplying pixel-perfect depth, domain-shifted night imagery, and multi-scale low-resolution pairs for aerial remote sensing.
-
MESA: A Training-Free Multi-Exemplar Deep Framework for Restoring Ancient Inscription Textures
MESA restores ancient inscription textures via multi-exemplar style transfer from VGG19 features with per-layer exemplar selection and OCR-derived weights, without any model training.
-
LumaFlux: Lifting 8-Bit Worlds to HDR Reality with Physically-Guided Diffusion Transformers
LumaFlux is a physically and perceptually guided diffusion transformer for SDR-to-HDR conversion that introduces PGA, PCM, and HDR Residual Coupler modules plus a new training corpus and benchmark, outperforming prior ITM methods.
-
LiBrA-Net: Lie-Algebraic Bilateral Affine Fields for Real-Time 4K Video Dehazing
LiBrA-Net achieves real-time native 4K video dehazing via Lie-algebraic bilateral affine fields and releases the first 4K paired dehazing video benchmark with per-frame annotations.
-
FeatMap: Understanding image manipulation in the feature space and its implications for feature space geometry
Linear mappings in feature space can reconstruct a wide range of image manipulations including semantic edits, suggesting that feature representations are approximately linearly organized.
-
AsyncEvGS: Asynchronous Event-Assisted Gaussian Splatting for Handheld Motion-Blurred Scenes
AsyncEvGS reconstructs high-fidelity 3D scenes from motion-blurred images by first deblurring via event data then using VGGT-based pose estimation and structure-driven losses inside Gaussian Splatting.
-
SIFT-VTON: Geometric Correspondence Supervision on Cross-Attention for Virtual Try-On
SIFT-VTON adds explicit geometric supervision from SIFT keypoints to diffusion-based virtual try-on to improve spatial alignment and detail preservation.
-
Scale-Aware Adversarial Analysis: A Diagnostic for Generative AI in Multiscale Complex Systems
A new scale-aware diagnostic framework shows that unconstrained diffusion generative models exhibit structural freezing and instability instead of smooth physical responses under multiscale perturbations.
-
Defining Robust Ultrasound Quality Metrics via an Ultrasound Foundation Model
TinyUSFM-uLPIPS and TinyUSFM-NRQ provide task-linked, cross-organ, and clinically predictive quality assessment for ultrasound images that outperforms conventional metrics in calibration with segmentation performance and sonographer preference.
-
CAHAL: Clinically Applicable resolution enHAncement for Low-resolution MRI scans
CAHAL introduces a physics-informed mixture-of-experts super-resolution network for clinical MRI that conditions on resolution and anisotropy and uses edge-penalised, Fourier, and segmentation-guided losses to reduce hallucinations compared with prior generative methods.
-
CDSA-Net: Collaborative Decoupling of Vascular Structure and Background for High-Fidelity Coronary Digital Subtraction Angiography
CDSA-Net decouples vascular structure extraction and background restoration in coronary DSA via hierarchical geometric priors and adaptive noise modeling to eliminate artifacts while preserving tissue fidelity.
-
CoCoDiff: Optimizing Collective Communications for Distributed Diffusion Transformer Inference Under Ulysses Sequence Parallelism
CoCoDiff achieves 3.6x average and 8.4x peak speedup for distributed DiT inference on up to 96 GPU tiles via tile-aware all-to-all, V-first scheduling, and selective V communication.
-
PostureObjectstitch: Anomaly Image Generation Considering Assembly Relationships in Industrial Scenarios
PostureObjectStitch generates assembly-aware anomaly images by decoupling multi-view features into high-frequency, texture and RGB components, modulating them temporally in a diffusion model, and applying conditional loss plus geometric priors to preserve correct component relationships.
-
ReplicateAnyScene: Zero-Shot Video-to-3D Composition via Textual-Visual-Spatial Alignment
ReplicateAnyScene performs fully automated zero-shot video-to-compositional-3D reconstruction by cascading alignments of generic priors from vision foundation models across textual, visual, and spatial dimensions.
-
GIF: A Conditional Multimodal Generative Framework for IR Drop Imaging in Chip Layouts
GIF fuses geometrical image features and logical graph topology in a conditional diffusion model to generate high-quality IR drop images for chip layouts, outperforming prior ML methods on CircuitNet-N28 with SSIM 0.78, Pearson 0.95, PSNR 21.77, and NMAE 0.026.
-
PSIRNet: Deep Learning-based Free-breathing Rapid Acquisition Late Enhancement Imaging
PSIRNet produces diagnostic-quality free-breathing PSIR LGE cardiac MRI from a single interleaved IR/PD acquisition over two heartbeats using a physics-guided deep learning network trained on over 800,000 slices.
-
SyncBreaker: Stage-Aware Multimodal Adversarial Attacks on Audio-Driven Talking Head Generation
SyncBreaker jointly attacks image and audio streams with Multi-Interval Sampling and Cross-Attention Fooling to degrade speech-driven talking head generation more than single-modality baselines.
-
Seeing enough: non-reference perceptual resolution selection for power-efficient client-side rendering
A neural network trained on full-reference perceptual quality labels predicts minimal sufficient resolution for rendered video to enable power-efficient client-side rendering.
-
Single-Stage Signal Attenuation Diffusion Model for Low-Light Image Enhancement and Denoising
SADM adds a signal attenuation coefficient to the diffusion forward process so that reverse denoising simultaneously recovers brightness and suppresses noise without extra stages or correction modules.
-
UMEDA: Unified Multi-modal Efficient Data Fusion for Privacy-Preserving Graph Federated Learning via Spectral-Gated Attention and Diffusion-Based Operator Alignment
UMEDA is a new graph federated learning method that uses low-rank spectral filtering and diffusion over a shared integral operator to fuse multi-modal data privately, outperforming baselines on MM-Fi and RELI11D under high heterogeneity and tight privacy budgets.
-
Physics-Guided Deep Learning For High Resolution X-ray Imaging
Physics-guided U-Net removes non-stationary artifacts from X-ray images, raising mean SSIM from 0.345 to 0.906 and 0.0679 to 0.945 in synthetic tests while preserving filament profiles better than Fourier filtering or DFFN.
-
Flow matching for Sentinel-2 super-resolution: implementation, application, and implications
Flow matching achieves single-step pixel accuracy and 20-step perceptual quality for Sentinel-2 super-resolution, outperforming diffusion and Real-ESRGAN while enabling large-scale 2.5 m land-cover products.
-
Training-inference input alignment outweighs framework choice in longitudinal retinal image prediction
Training-inference input alignment outweighs framework choice for longitudinal retinal image prediction, with deterministic regression matching complex models when acquisition variability dominates disease progression.
-
MSDS: Deep Structural Similarity with Multiscale Representation
MSDS computes DeepSSIM at multiple pyramid scales and fuses the scores with learned weights, producing consistent improvements over single-scale DeepSSIM on IQA benchmarks with negligible extra cost.
-
A GPU-enhanced workflow for non-Fourier SENSE reconstruction
A public GPU workflow for non-Fourier SENSE MRI reconstruction with sensitivity and off-resonance mapping enables fast, accurate imaging from challenging spiral trajectories.
-
GeRM: A Generative Rendering Model From Physically Realistic to Photorealistic