AniMatrix generates anime videos by structuring artistic production rules into a controllable taxonomy and training the model to prioritize those rules over physical realism, achieving top scores from professional animators on prompt understanding and artistic motion.
hub Mixed citations
VBench: Comprehensive Benchmark Suite for Video Generative Models
Mixed citation behavior. Most common role is background (62%).
years: 2026 (13)
representative citing papers
The CURVAS-PDACVI benchmark supplies a multi-annotated PDAC dataset and shows that uncertainty-aware models yield better-calibrated maps and more robust performance than binary segmentation methods at clinically ambiguous tumor-vessel interfaces.
SOB benchmark shows LLMs achieve near-perfect schema compliance but value accuracy of only 83% on text, 67% on images, and 24% on audio.
Oracle Noise optimizes diffusion model noise on a Riemannian hypersphere guided by key prompt words to preserve the Gaussian prior, eliminate norm inflation, and achieve faster semantic alignment than Euclidean methods.
SDFRaster optimizes a continuous SDF on a Delaunay tetrahedral grid, renders it by rasterizing tetrahedra, and integrates differentiable Marching Tetrahedra for end-to-end mesh reconstruction without post-processing.
VISER is a new visually realistic simulation benchmark for robot manipulation tasks that uses PBR materials and MLLM-assisted asset generation, achieving 0.92 Pearson correlation with real-world policy performance.
Auditing five frontier VLMs reveals severe grounding failures (max 0.23 IoU, 19.1% Acc@0.5) and format collapse (up to 99% parse failure) in medical VQA; fine-tuning yields 85.5% SLAKE recall but perception remains the primary trustworthiness issue.
T3S is a new semantic similarity score for processed images that decomposes semantics into foreground entities, background entities, and relations, outperforming fidelity metrics on COCO and SPA-Data.
Gaussian probing infers harmful model specialization from parameter perturbations and internal representation responses to Gaussian latent ensembles rather than from generated outputs.
ZID-Net decouples diffusion-based priors into a training-only head to create an efficient feed-forward network for single-image dehazing, reporting 40.75 dB PSNR on RESIDE and 19 ms inference.
DeepSignature embeds digitally signed content-encoding watermarks via neural networks for robust image authentication, source attribution, and latent-space tamper localization.
SkillFormer, PATS, and ProfVLM deliver state-of-the-art multi-view proficiency estimation on Ego-Exo4D with up to 20x fewer parameters by combining selective fusion, dense sampling, and generative feedback.
citing papers explorer
- Distance Field Rasterization for End-to-End Mesh Reconstruction
SDFRaster optimizes a continuous SDF on a Delaunay tetrahedral grid, renders it by rasterizing tetrahedra, and integrates differentiable Marching Tetrahedra for end-to-end mesh reconstruction without post-processing.