VBench: Comprehensive Benchmark Suite for Video Generative Models
13 Pith papers cite this work.
Citing papers by year: 2026 (13)
citing papers explorer
- AniMatrix: An Anime Video Generation Model that Thinks in Art, Not Physics
AniMatrix generates anime videos by structuring artistic production rules into a controllable taxonomy and training the model to prioritize those rules over physical realism, achieving top scores from professional animators on prompt understanding and artistic motion.
- Assessing Pancreatic Ductal Adenocarcinoma Vascular Invasion: the PDACVI Benchmark
The CURVAS-PDACVI benchmark supplies a multi-annotated PDAC dataset and shows that uncertainty-aware models yield better-calibrated maps and more robust performance than binary segmentation methods at clinically ambiguous tumor-vessel interfaces.
- The Structured Output Benchmark: A Multi-Source Benchmark for Evaluating Structured Output Quality in Large Language Models
SOB benchmark shows LLMs achieve near-perfect schema compliance but value accuracy of only 83% on text, 67% on images, and 24% on audio.
- Oracle Noise: Faster Semantic Spherical Alignment for Interpretable Latent Optimization
Oracle Noise optimizes diffusion model noise on a Riemannian hypersphere guided by key prompt words to preserve the Gaussian prior, eliminate norm inflation, and achieve faster semantic alignment than Euclidean methods.
- Distance Field Rasterization for End-to-End Mesh Reconstruction
SDFRaster optimizes a continuous SDF on a Delaunay tetrahedral grid, renders it by rasterizing tetrahedra, and integrates differentiable Marching Tetrahedra for end-to-end mesh reconstruction without post-processing.
- Toward Visually Realistic Simulation: A Benchmark for Evaluating Robot Manipulation in Simulation
VISER is a new visually realistic simulation benchmark for robot manipulation tasks that uses PBR materials and MLLM-assisted asset generation, achieving 0.92 Pearson correlation with real-world policy performance.
- Auditing Frontier Vision-Language Models for Trustworthy Medical VQA: Grounding Failures, Format Collapse, and Domain Adaptation
Auditing five frontier VLMs reveals severe grounding failures (max 0.23 IoU, 19.1% Acc@0.5) and format collapse (up to 99% parse failure) in medical VQA; fine-tuning yields 85.5% SLAKE recall but perception remains the primary trustworthiness issue.
- Beyond Fidelity: Semantic Similarity Assessment in Low-Level Image Processing
T3S is a new semantic similarity score for processed images that decomposes semantics into foreground entities, background entities, and relations, outperforming fidelity metrics on COCO and SPA-Data.
- Evaluation without Generation: Non-Generative Assessment of Harmful Model Specialization with Applications to CSAM
Gaussian probing infers harmful model specialization from parameter perturbations and internal representation responses to Gaussian latent ensembles rather than from generated outputs.
- ZID-Net: Zero-Inference Diffusion Prior Decoupling Network for Single Image Dehazing
ZID-Net decouples diffusion-based priors into a training-only head to create an efficient feed-forward network for single-image dehazing, reporting 40.75 dB PSNR on RESIDE and 19 ms inference.
- DeepSignature: Digitally Signed, Content-Encoding Watermarks for Robust and Transparent Image Authentication
DeepSignature embeds digitally signed content-encoding watermarks via neural networks for robust image authentication, source attribution, and latent-space tamper localization.
- Parameter-Efficient Multi-View Proficiency Estimation: From Discriminative Classification to Generative Feedback
SkillFormer, PATS, and ProfVLM deliver state-of-the-art multi-view proficiency estimation on Ego-Exo4D with up to 20x fewer parameters by combining selective fusion, dense sampling, and generative feedback.
- FreeTimeGS++: Secrets of Dynamic Gaussian Splatting and Their Principles
FreeTimeGS++ improves 4D Gaussian Splatting by using gated marginalization and neural velocity fields to achieve more stable dynamic scene representations with lower run-to-run variance.