Clipscore: A reference-free evaluation metric for image captioning

Jack Hessel, Ari Holtzman, Maxwell Forbes, Ronan Le Bras, Yejin Choi · 2021

7 Pith papers cite this work. Polarity classification is still indexing.

7 Pith papers citing it

browse 7 citing papers

citation-role summary

background 2 dataset 1 method 1

citation-polarity summary

background 2 use dataset 1 use method 1

representative citing papers

Continuous-Time Distribution Matching for Few-Step Diffusion Distillation

cs.CV · 2026-05-07 · unverdicted · novelty 8.0

CDM migrates distribution matching distillation to continuous time via dynamic random-length schedules and active off-trajectory latent alignment, yielding competitive few-step image fidelity on SD3 and Longcat-Image.

Disentangled Sparse Representations for Concept-Separated Diffusion Unlearning

cs.LG · 2026-05-12 · unverdicted · novelty 7.0

SAEParate disentangles sparse representations in diffusion models via contrastive clustering and nonlinear encoding to enable more precise concept unlearning with reduced side effects.

STRIDE: Training-Free Diversity Guidance via PCA-Directed Feature Perturbation in Single-Step Diffusion Models

cs.CV · 2026-05-12 · unverdicted · novelty 7.0

STRIDE boosts diversity in one-step diffusion models by injecting PCA-aligned pink noise into transformer features while preserving text alignment and quality.

LENS: Low-Frequency Eigen Noise Shaping for Efficient Diffusion Sampling

cs.CV · 2026-05-08 · unverdicted · novelty 7.0

LENS shapes low-frequency eigen noise with a lightweight network to enable efficient, high-quality sampling in distilled diffusion models.

Do Joint Audio-Video Generation Models Understand Physics?

cs.SD · 2026-05-08 · unverdicted · novelty 7.0

AV-Phys Bench shows that current joint audio-video models lack robust physical commonsense, with major drops on transitions and deliberate anti-physics prompts.

SEGA: Spectral-Energy Guided Attention for Resolution Extrapolation in Diffusion Transformers

cs.CV · 2026-05-21 · unverdicted · novelty 6.0

SEGA adaptively scales RoPE attention components using spectral-energy guidance from the latent to improve structural coherence and fine details in high-resolution DiT synthesis.

ShellfishNet: A Domain-Specific Benchmark for Visual Recognition of Marine Molluscs

cs.CV · 2026-05-08 · unverdicted · novelty 5.0

ShellfishNet is a new benchmark of 8,691 images across 32 mollusc taxa for evaluating vision models on real-world underwater ecological monitoring tasks including robustness to degradation.

citing papers explorer

Showing 7 of 7 citing papers.

Continuous-Time Distribution Matching for Few-Step Diffusion Distillation cs.CV · 2026-05-07 · unverdicted · none · ref 11
CDM migrates distribution matching distillation to continuous time via dynamic random-length schedules and active off-trajectory latent alignment, yielding competitive few-step image fidelity on SD3 and Longcat-Image.
Disentangled Sparse Representations for Concept-Separated Diffusion Unlearning cs.LG · 2026-05-12 · unverdicted · none · ref 19
SAEParate disentangles sparse representations in diffusion models via contrastive clustering and nonlinear encoding to enable more precise concept unlearning with reduced side effects.
STRIDE: Training-Free Diversity Guidance via PCA-Directed Feature Perturbation in Single-Step Diffusion Models cs.CV · 2026-05-12 · unverdicted · none · ref 17
STRIDE boosts diversity in one-step diffusion models by injecting PCA-aligned pink noise into transformer features while preserving text alignment and quality.
LENS: Low-Frequency Eigen Noise Shaping for Efficient Diffusion Sampling cs.CV · 2026-05-08 · unverdicted · none · ref 10
LENS shapes low-frequency eigen noise with a lightweight network to enable efficient, high-quality sampling in distilled diffusion models.
Do Joint Audio-Video Generation Models Understand Physics? cs.SD · 2026-05-08 · unverdicted · none · ref 17
AV-Phys Bench shows that current joint audio-video models lack robust physical commonsense, with major drops on transitions and deliberate anti-physics prompts.
SEGA: Spectral-Energy Guided Attention for Resolution Extrapolation in Diffusion Transformers cs.CV · 2026-05-21 · unverdicted · none · ref 43
SEGA adaptively scales RoPE attention components using spectral-energy guidance from the latent to improve structural coherence and fine details in high-resolution DiT synthesis.
ShellfishNet: A Domain-Specific Benchmark for Visual Recognition of Marine Molluscs cs.CV · 2026-05-08 · unverdicted · none · ref 42
ShellfishNet is a new benchmark of 8,691 images across 32 mollusc taxa for evaluating vision models on real-world underwater ecological monitoring tasks including robustness to degradation.

Clipscore: A reference-free evaluation metric for image captioning

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer