Image quality assessment: from error visibility to structural similarity
7 Pith papers cite this work.
citation-role summary
roles: dataset 1
citation-polarity summary
polarities: background 1
citing papers explorer
- Do generative video models understand physical principles?
  Physics-IQ benchmark reveals that generative video models exhibit limited physical understanding unrelated to their visual quality.
- Beyond Visual Fidelity: Benchmarking Super-Resolution Models for Large-Scale Remote Sensing Imagery via Downstream Task Integration
  GeoSR-Bench shows that gains in traditional super-resolution metrics like PSNR and SSIM frequently do not correlate with, and can negatively correlate with, performance on downstream remote sensing tasks.
- Vision Foundation Models as Generalist Tokenizers for Image Generation
  VFMTok builds a generalist image tokenizer on frozen VFMs using adaptive quantization and semantic alignment, delivering gFID 1.36 for autoregressive and 1.25 for continuous generation on ImageNet with 3x faster convergence.
- Self-Supervised Super-Resolution for Sentinel-5P Hyperspectral Images
  A self-supervised framework using SURE and equivariant constraints produces super-resolved Sentinel-5P images comparable to supervised baselines without HR references and with physically plausible structures validated against EMIT data.
- Learning-based Statistical Refinement for Denoising
  A Bayesian auxiliary signal derived from noisy data enables learning-based refinement that improves statistical consistency and quality of any denoiser without clean images or precise noise distribution knowledge.
- Training-Free Inference for High-Resolution Sinogram Completion
  HRSino adaptively allocates diffusion inference effort across spatial regions and scales for efficient high-resolution sinogram completion without training.
- World Action Models: The Next Frontier in Embodied AI
  The paper introduces World Action Models as a new paradigm unifying predictive world modeling with action generation in embodied foundation models and provides a taxonomy of existing approaches.