AutoMedBench evaluates AI agents on long-horizon medical workflows across five stages and finds validation and submission as dominant failure points based on thousands of runs.
hub Canonical reference
Bovik, Hamid R
Canonical reference. 75% of citing Pith papers cite this work as background.
hub tools
citation-role summary
citation-polarity summary
representative citing papers
On the public ReMIND dataset, a systematic benchmark of six synthesis models across 48 experiments finds LPIPS correlates with downstream segmentation utility while SSIM does not, with SynDiff-2.5D performing best.
PanoPlane achieves up to 17.8% PSNR gains in sparse-view indoor novel view synthesis by using training-free plane-aware panoramic completion to supervise 3D Gaussian Splatting.
GuardMarkGS unifies watermarking and adversarial edit deterrence into a single optimization framework for protecting 3D Gaussian Splatting assets.
A new large-scale synthetic multi-task benchmark dataset supplying pixel-perfect depth, domain-shifted night imagery, and multi-scale low-resolution pairs for aerial remote sensing.
MESA restores ancient inscription textures via multi-exemplar style transfer from VGG19 features with per-layer exemplar selection and OCR-derived weights, without any model training.
GeRM learns a distribution transfer vector field via a multi-condition ControlNet to convert physically-based renders into photorealistic images using text prompts and a 50K expert-curated dataset.
LumaFlux is a physically and perceptually guided diffusion transformer for SDR-to-HDR conversion that introduces PGA, PCM, and HDR Residual Coupler modules plus a new training corpus and benchmark, outperforming prior ITM methods.
A sensor-specific calibration pipeline using dark frames produces synthesized noisy RAW images that close 54-64% of the PSNR gap to real noise versus manufacturer profiles, accompanied by the open SNIC dataset of over 6600 paired images.
DRFS is a new inversion-free editing technique for rectified flow models that models source-target velocity discrepancies and applies a time-dependent shift to improve fidelity and unify prior methods like DDS and FlowEdit.
Harder classification tasks produce neural representations whose accuracy collapses under binarization and shuffling while easier tasks remain robust, defining task complexity via the performance gap between full-precision and perturbed networks.
PhotIQA is a new public dataset of 1134 expert-rated photoacoustic images for benchmarking image quality assessment in medical imaging.
Presents SLAM&Render, a robot-recorded benchmark dataset with 40 multi-modal sequences for testing SLAM, novel view synthesis, and Gaussian Splatting under controlled variations in lighting, arrangements, and occlusions.
Proposes a cyclic 2.5D perceptual loss with manufacturer SUVR standardization for T1w MRI to tau PET synthesis, reporting improved regional agreement on ADNI and SCAN cohorts across U-Net, UNETR, SwinUNETR, CycleGAN, and Pix2Pix.
Q-Align trains LMMs on discrete text-defined levels for visual scoring, achieving SOTA on IQA, IAA, and VQA while unifying the tasks in OneAlign.
A PINN framework with separate networks for conductivity and potentials, multiscale wavelet excitations, and FFE recovers dominant conductivity structures from finite DtN data with 3-12% relative error on synthetic tests, with FFE aiding sharp features.
Differential Unfolding replaces uniform stacking in deep unfolding networks with a heterogeneous structure of anchoring and differential evolution stages to achieve better accuracy-efficiency trade-offs in video SCI reconstruction.
Introduces Visibility-Aware Densification with Temporally-Adaptive Thresholding and Temporal Offset Warping to improve dynamic region quality in 3D Gaussian Splatting on three benchmarks.
Scene-adaptive nonlinear tone curves (ASE and AP3) with percentile normalisation and offset outperform linear gain for pseudo-GT generation in low-light 3DGS, delivering PSNR gains up to 4.34 dB on LOM and 3.25 dB on RealX3D across 21 scenes.
A plug-and-play perceptual wrapper using common random noise and Wasserstein Distortion supervision improves texture quality and reduces model size in 3D Gaussian Splatting.
High magnetic fields directly enhance the amplitude and correlation length of stripe order in a cuprate superconductor far above the vortex melting transition, indicating a coupling mechanism independent of superconductivity suppression.
LiFT factorizes 3D medical volume synthesis into per-slice 2D generation and inter-slice trajectory learning, using a tri-planar drifting loss for unconditional coherence and a z-context mixer for paired translation tasks.
MSIQ is a scale-invariant, model-free quality metric for single image super-resolution using normalized central geometric moments for direct comparison of different-resolution images.
MIRAGE achieves state-of-the-art mental image reconstruction from fMRI on the NSD-Imagery benchmark by using a linear backbone with multi-modal text and image features fed to a diffusion model.
citing papers explorer
-
AutoMedBench: Towards Medical AutoResearch with Agentic AI Models
AutoMedBench evaluates AI agents on long-horizon medical workflows across five stages and finds validation and submission as dominant failure points based on thousands of runs.
-
A Systematic Benchmark of Intraoperative Ultrasound-to-MR Synthesis for Brain Tumour Surgery
On the public ReMIND dataset, a systematic benchmark of six synthesis models across 48 experiments finds LPIPS correlates with downstream segmentation utility while SSIM does not, with SynDiff-2.5D performing best.
-
PanoPlane: Plane-Aware Panoramic Completion for Sparse-View Indoor 3D Gaussian Splatting
PanoPlane achieves up to 17.8% PSNR gains in sparse-view indoor novel view synthesis by using training-free plane-aware panoramic completion to supervise 3D Gaussian Splatting.
-
GuardMarkGS: Unified Ownership Tracing and Edit Deterrence for 3D Gaussian Splatting
GuardMarkGS unifies watermarking and adversarial edit deterrence into a single optimization framework for protecting 3D Gaussian Splatting assets.
-
SyMTRS: Benchmark Multi-Task Synthetic Dataset for Depth, Domain Adaptation and Super-Resolution in Aerial Imagery
A new large-scale synthetic multi-task benchmark dataset supplying pixel-perfect depth, domain-shifted night imagery, and multi-scale low-resolution pairs for aerial remote sensing.
-
MESA: A Training-Free Multi-Exemplar Deep Framework for Restoring Ancient Inscription Textures
MESA restores ancient inscription textures via multi-exemplar style transfer from VGG19 features with per-layer exemplar selection and OCR-derived weights, without any model training.
-
GeRM: A Generative Rendering Model From Physically Realistic to Photorealistic
GeRM learns a distribution transfer vector field via a multi-condition ControlNet to convert physically-based renders into photorealistic images using text prompts and a 50K expert-curated dataset.
-
LumaFlux: Lifting 8-Bit Worlds to HDR Reality with Physically-Guided Diffusion Transformers
LumaFlux is a physically and perceptually guided diffusion transformer for SDR-to-HDR conversion that introduces PGA, PCM, and HDR Residual Coupler modules plus a new training corpus and benchmark, outperforming prior ITM methods.
-
SNIC: Synthesized Noisy Images using Calibration
A sensor-specific calibration pipeline using dark frames produces synthesized noisy RAW images that close 54-64% of the PSNR gap to real noise versus manufacturer profiles, accompanied by the open SNIC dataset of over 6600 paired images.
-
Delta Rectified Flow Sampling for Text-to-Image Editing
DRFS is a new inversion-free editing technique for rectified flow models that models source-target velocity discrepancies and applies a time-dependent shift to improve fidelity and unify prior methods like DDS and FlowEdit.
-
Task complexity shapes internal representations and robustness in neural networks
Harder classification tasks produce neural representations whose accuracy collapses under binarization and shuffling while easier tasks remain robust, defining task complexity via the performance gap between full-precision and perturbed networks.
-
PhotIQA: A photoacoustic image data set with image quality ratings
PhotIQA is a new public dataset of 1134 expert-rated photoacoustic images for benchmarking image quality assessment in medical imaging.
-
SLAM&Render: A Benchmark for the Intersection Between Neural Rendering, Gaussian Splatting and SLAM
Presents SLAM&Render, a robot-recorded benchmark dataset with 40 multi-modal sequences for testing SLAM, novel view synthesis, and Gaussian Splatting under controlled variations in lighting, arrangements, and occlusions.
-
Cyclic 2.5D Perceptual Loss for Cross-Modal 3D Medical Image Synthesis: T1w MRI to Tau PET
Proposes a cyclic 2.5D perceptual loss with manufacturer SUVR standardization for T1w MRI to tau PET synthesis, reporting improved regional agreement on ADNI and SCAN cohorts across U-Net, UNETR, SwinUNETR, CycleGAN, and Pix2Pix.
-
Q-Align: Teaching LMMs for Visual Scoring via Discrete Text-Defined Levels
Q-Align trains LMMs on discrete text-defined levels for visual scoring, achieving SOTA on IQA, IAA, and VQA while unifying the tasks in OneAlign.
-
Recovering Sharp Conductivity Features in the Finite-Data Calder\'on Problem with Physics-Informed Neural Networks
A PINN framework with separate networks for conductivity and potentials, multiscale wavelet excitations, and FFE recovers dominant conductivity structures from finite DtN data with 3-12% relative error on synthetic tests, with FFE aiding sharp features.
-
Differential Unfolding: Efficient Unfolding Reconstruction for Video Snapshot Compressive Imaging
Differential Unfolding replaces uniform stacking in deep unfolding networks with a heterogeneous structure of anchoring and differential evolution stages to achieve better accuracy-efficiency trade-offs in video SCI reconstruction.
-
Temporally Aware Densification for Dynamic 3D Gaussian Splatting
Introduces Visibility-Aware Densification with Temporally-Adaptive Thresholding and Temporal Offset Warping to improve dynamic region quality in 3D Gaussian Splatting on three benchmarks.
-
Scene-Adaptive Nonlinear Tone Curves for Pseudo Ground-Truth Generation in Low-Light 3D Gaussian Splatting
Scene-adaptive nonlinear tone curves (ASE and AP3) with percentile normalisation and offset outperform linear gain for pseudo-GT generation in low-light 3DGS, delivering PSNR gains up to 4.34 dB on LOM and 3.25 dB on RealX3D across 21 scenes.
-
Seeing What Matters: Perceptual Wrapper with Common Randomness for 3D Gaussian Splatting
A plug-and-play perceptual wrapper using common random noise and Wasserstein Distortion supervision improves texture quality and reduces model size in 3D Gaussian Splatting.
-
Direct High-Magnetic-Field Coupling to Stripe Order in a Cuprate Superconductor
High magnetic fields directly enhance the amplitude and correlation length of stripe order in a cuprate superconductor far above the vortex melting transition, indicating a coupling mechanism independent of superconductivity suppression.
-
LiFT: Lifted Inter-slice Feature Trajectories for 3D Image Generation from 2D Generators
LiFT factorizes 3D medical volume synthesis into per-slice 2D generation and inter-slice trajectory learning, using a tri-planar drifting loss for unconditional coherence and a z-context mixer for paired translation tasks.
-
MSIQ: Moment-based Scale-Invariant Quality Measure for Single Image Super-Resolution
MSIQ is a scale-invariant, model-free quality metric for single image super-resolution using normalized central geometric moments for direct comparison of different-resolution images.
-
MIRAGE: Robust multi-modal architectures translate fMRI-to-image models from vision to mental imagery
MIRAGE achieves state-of-the-art mental image reconstruction from fMRI on the NSD-Imagery benchmark by using a linear backbone with multi-modal text and image features fed to a diffusion model.
-
Learning Developmental Scaffoldings to Guide Self-Organisation
Joint training of NCA rules and SIREN pre-patterns improves robustness, encoding capacity, and symmetry breaking compared to purely self-organizing models by offloading information to initial conditions.
-
LiBrA-Net: Lie-Algebraic Bilateral Affine Fields for Real-Time 4K Video Dehazing
LiBrA-Net achieves real-time native 4K video dehazing via Lie-algebraic bilateral affine fields and releases the first 4K paired dehazing video benchmark with per-frame annotations.
-
FeatMap: Understanding image manipulation in the feature space and its implications for feature space geometry
Linear mappings in feature space can reconstruct a wide range of image manipulations including semantic edits, suggesting that feature representations are approximately linearly organized.
-
AsyncEvGS: Asynchronous Event-Assisted Gaussian Splatting for Handheld Motion-Blurred Scenes
AsyncEvGS reconstructs high-fidelity 3D scenes from motion-blurred images by first deblurring via event data then using VGGT-based pose estimation and structure-driven losses inside Gaussian Splatting.
-
SIFT-VTON: Geometric Correspondence Supervision on Cross-Attention for Virtual Try-On
SIFT-VTON adds explicit geometric supervision from SIFT keypoints to diffusion-based virtual try-on to improve spatial alignment and detail preservation.
-
Scale-Aware Adversarial Analysis: A Diagnostic for Generative AI in Multiscale Complex Systems
A new scale-aware diagnostic framework shows that unconstrained diffusion generative models exhibit structural freezing and instability instead of smooth physical responses under multiscale perturbations.
-
Defining Robust Ultrasound Quality Metrics via an Ultrasound Foundation Model
Proposes TinyUSFM-uLPIPS and TinyUSFM-NRQ metrics that show better alignment with segmentation task performance and expert preference than PSNR or VGG-LPIPS in ultrasound imaging.
-
CAHAL: Clinically Applicable resolution enHAncement for Low-resolution MRI scans
CAHAL introduces a physics-informed mixture-of-experts super-resolution network for clinical MRI that conditions on resolution and anisotropy and uses edge-penalised, Fourier, and segmentation-guided losses to reduce hallucinations compared with prior generative methods.
-
CDSA-Net:Collaborative Decoupling of Vascular Structure and Background for High-Fidelity Coronary Digital Subtraction Angiography
CDSA-Net decouples vascular structure extraction and background restoration in coronary DSA via hierarchical geometric priors and adaptive noise modeling to eliminate artifacts while preserving tissue fidelity.
-
CoCoDiff: Optimizing Collective Communications for Distributed Diffusion Transformer Inference Under Ulysses Sequence Parallelism
CoCoDiff achieves 3.6x average and 8.4x peak speedup for distributed DiT inference on up to 96 GPU tiles via tile-aware all-to-all, V-first scheduling, and selective V communication.
-
PostureObjectstitch: Anomaly Image Generation Considering Assembly Relationships in Industrial Scenarios
PostureObjectStitch generates assembly-aware anomaly images by decoupling multi-view features into high-frequency, texture and RGB components, modulating them temporally in a diffusion model, and applying conditional loss plus geometric priors to preserve correct component relationships.
-
ReplicateAnyScene: Zero-Shot Video-to-3D Composition via Textual-Visual-Spatial Alignment
ReplicateAnyScene performs fully automated zero-shot video-to-compositional-3D reconstruction by cascading alignments of generic priors from vision foundation models across textual, visual, and spatial dimensions.
-
GIF: A Conditional Multimodal Generative Framework for IR Drop Imaging in Chip Layouts
GIF fuses geometrical image features and logical graph topology in a conditional diffusion model to generate high-quality IR drop images for chip layouts, outperforming prior ML methods on CircuitNet-N28 with SSIM 0.78, Pearson 0.95, PSNR 21.77, and NMAE 0.026.
-
PSIRNet: Deep Learning-based Free-breathing Rapid Acquisition Late Enhancement Imaging
PSIRNet produces diagnostic-quality free-breathing PSIR LGE cardiac MRI from a single interleaved IR/PD acquisition over two heartbeats using a physics-guided deep learning network trained on over 800,000 slices.
-
SyncBreaker:Stage-Aware Multimodal Adversarial Attacks on Audio-Driven Talking Head Generation
SyncBreaker jointly attacks image and audio streams with Multi-Interval Sampling and Cross-Attention Fooling to degrade speech-driven talking head generation more than single-modality baselines.
-
Seeing enough: non-reference perceptual resolution selection for power-efficient client-side rendering
A neural network trained on full-reference perceptual quality labels predicts minimal sufficient resolution for rendered video to enable power-efficient client-side rendering.
-
Single-Stage Signal Attenuation Diffusion Model for Low-Light Image Enhancement and Denoising
SADM adds a signal attenuation coefficient to the diffusion forward process so that reverse denoising simultaneously recovers brightness and suppresses noise without extra stages or correction modules.
-
Benchmarking quantum simulation with neutron-scattering experiments
A 50-qubit quantum processor produces dynamical structure factors for KCuF3 that quantitatively match neutron-scattering measurements of its spinon spectrum.
-
Deeper detection limits in astronomical imaging using self-supervised spatiotemporal denoising
ASTERIS, a self-supervised spatiotemporal denoising algorithm, improves astronomical detection limits by 1 magnitude at 90% completeness while identifying three times more redshift >9 galaxy candidates in JWST images.
-
VisPhyWorld: Probing Physical Reasoning via Code-Driven Video Reconstruction
VisPhyWorld evaluates MLLMs' physical reasoning via executable code generation for video reconstruction, with VisPhyBench showing strong semantics but weak parameter inference and dynamics simulation.
-
Slot-MLLM: Object-Centric Visual Tokenization for Multimodal LLM
Slot-MLLM introduces a slot-attention-based object-centric visual tokenizer with Q-Former encoder, diffusion decoder, and residual vector quantization for improved local visual comprehension and generation in multimodal LLMs.
-
DINO-WM: World Models on Pre-trained Visual Features enable Zero-shot Planning
DINO-WM builds world models on pre-trained DINOv2 features to enable zero-shot planning from offline data without rewards or demonstrations.
-
SignNet-1M: Large-Scale Multilingual Sign Language Video Dataset with Downstream Benchmarks
The paper releases SignNet-1M, a 1M-scale augmented dataset for ASL, CSL and DGS with 3DGS and diffusion-based variations, plus benchmarks showing improved cross-shift generalization.
-
UI-LIC: A Unified Framework for Evaluating Learned Image Compression Models
UI-LIC is an open-source software framework that unifies training, inference, and evaluation of learned image compression models with traditional encoders via shared configuration and a GUI for metrics and subjective analysis.
-
Deep Slice Interpolation for Reducing Through-Plane Anisotropy and Noise in Head CT
Deep learning system synthesizes intermediate head CT slices to halve through-plane anisotropy while providing implicit denoising, outperforming baselines on structural metrics.
-
StereoGenBench: A Synthetic Multi-Camera Benchmark for Stereo Generation under Controlled Baseline Regimes
StereoGenBench is a new synthetic benchmark dataset featuring calibrated multi-baseline stereo pairs with dense metric depth, intrinsics, and poses from Unreal Engine renders for controlled evaluation of stereo generation.