Emogen: Emotional image content generation with text-to-image diffusion models

2024 · arXiv 2733.2024

Canonical reference. 91% of citing Pith papers cite this work as background.

239 Pith papers citing it

Background: 91% of classified citations

citation-role summary

background 83 · dataset 6 · baseline 2 · method 2

citation-polarity summary

background 85 · use dataset 4 · baseline 2 · use method 2

representative citing papers

WildBox: A Dataset and Benchmark for Aerial Monocular 3D Detection of African Savanna Wildlife

cs.CV · 2026-06-19 · unverdicted · novelty 8.0

WildBox provides over 237k 3D wildlife annotations from drone video and benchmarks reveal zero-shot 3D detection at 0 AP but fine-tuned performance of 8.68 AP-BEV and 13.17 AP3D, with depth estimation causing most errors.

ScaLe-INR: Scale and Learn Implicit Neural Representations

cs.CV · 2026-06-26 · unverdicted · novelty 7.0

ScaLe-INR is a multi-branch INR architecture that applies directional scaling per the Fourier inverse theorem and a directional edge guidance loss to disentangle scales and improve reconstruction fidelity.

MATCH: Flow Matching for Multi-View Anomaly Detection

cs.CV · 2026-06-23 · unverdicted · novelty 7.0 · 2 refs

MATCH is the first flow matching method for multi-view anomaly detection, reporting SOTA results on Real-IAD and the first comprehensive evaluation on MANTA-Tiny while enabling real-time use by omitting the divergence term.

GeoFidelity-Bench: Evaluating Segment-Level Geographic Fidelity in Text-to-Image Street-View Generation

cs.CV · 2026-06-22 · unverdicted · novelty 7.0 · 4 refs

GeoFidelity-Bench shows text-to-image models gain city-level plausibility from local names but achieve near-zero improvement in exact segment identity, with GPS coordinates adding no benefit.

Arbor: Explicit Geometric Conditioning for Controllable 3D Asset Generation

cs.CV · 2026-06-22 · unverdicted · novelty 7.0

Arbor attaches constraint mesh tokens to a frozen text-to-3D denoiser to enable controllable generation obeying hull, avoidance, and touch constraints.

Leveraging target dynamics for imaging in complex media

physics.optics · 2026-06-21 · unverdicted · novelty 7.0

Target dynamics provide an intrinsic source of variation equivalent to controlled illumination changes, enabling scattering-compensated reconstruction of dynamic scenes with one acquisition per frame in holographic and fluorescence imaging.

4DVLT: Dynamic Scene Understanding with Worldline-Centered Vision-Language Tracking

cs.CV · 2026-06-21 · conditional · novelty 7.0 · 2 refs

The paper defines the 4DVLT task for worldline-centered 4D scene understanding, releases Instruct-4D with 129.4K QA pairs, and presents 4DTrack achieving 62.68 TGA_Top1, outperforming adapted baselines by 19.62 points.

FLM-Occ: Feed-forward Likelihood Maximization for Efficient Indoor Occupancy Prediction

cs.CV · 2026-06-19 · unverdicted · novelty 7.0

FLM-Occ reformulates indoor occupancy prediction as feed-forward likelihood maximization over a mixture model with volume-normalized weights, achieving superior accuracy on Occ-ScanNet using only 32 superquadrics.

HERO: Hypothesis-Driven Evidence Retrieval from Omics for Multi-Task Breast Cancer Analysis

cs.CV · 2026-06-19 · unverdicted · novelty 7.0

HERO maps DNA methylation and miRNA to a 16-dimensional intent vector for TF-IDF caption retrieval and cosine-gated repair in VLM-based multi-task breast cancer prediction, claiming SOTA on TCGA-BRCA.

StylisticBias: A Few Human Visual Cues Drive Most Social Biases in MLLMs

cs.CL · 2026-06-18 · unverdicted · novelty 7.0

StylisticBias benchmark shows 15 visual attributes explain nearly 80% of bias variation in six MLLMs by isolating single cues like age and fashion in generated images.

Heterogeneous SAR-optical fusion for near-real-time land use and land cover mapping under cloud contamination: A novel framework and global benchmark dataset

cs.CV · 2026-06-16 · conditional · novelty 7.0

CloudLULC-Net is an end-to-end heterogeneous SAR-optical fusion network for LULC mapping under cloud contamination that achieves 86.60% OA, 83.29% F1, and 73.51% mIoU on a new global benchmark of 40,223 samples.

TopoCap: Learning Topology-Agnostic Motion Priors for Monocular Video-to-Animation

cs.CV · 2026-06-10 · unverdicted · novelty 7.0 · 2 refs

A two-stage generative model (Graph CVAE + flow matching) learns topology-agnostic motion codes from a new 5k-topology dataset and retargets video motion to arbitrary unseen skeletons.

Fisher-Guided Progressive Parameter Selection for Adaptive Fine-Tuning

cs.CV · 2026-06-08 · unverdicted · novelty 7.0

FisherAdapTune uses temporal drift in Fisher geometry, measured by scale-invariant Jensen-Shannon distance, to progressively freeze stabilized parameter groups during fine-tuning, reporting gains on segmentation and zero-shot transfer.

Mind the Gap: Disentangling Performance Bottlenecks in Video Instance Segmentation

cs.CV · 2026-06-05 · unverdicted · novelty 7.0 · 2 refs

An ILP-based oracle applied to seven VIS methods on YouTube-VIS and OVIS shows tracking instability as the dominant bottleneck, producing gaps exceeding 20 AP under occlusion while classification impact is secondary.

Bridging CAD and Data-Driven Design: Attributed Feature Graphs for Engineering Design

cs.CE · 2026-06-04 · unverdicted · novelty 7.0 · 3 refs

Attributed Feature Graphs (AFGs) represent CAD features as attributed nodes and relations as directed edges to enable GNN surrogate models that predict design performance with feature-level interpretability on the CarHoods10K dataset.

Cosine Misleads: Auxiliary Losses Reshape Vision Language Models, Not Their Latents

cs.CV · 2026-06-04 · conditional · novelty 7.0

Empirical study of five LVR variants finds cosine alignment negatively correlates with accuracy (r=-0.94), supervised latents are bypassed under corruption (max 4-point shift), and answers are decodable downstream but not at the latent.

Multimarginal flow matching with optimal transport potentials

cs.LG · 2026-06-03 · unverdicted · novelty 7.0

OTP-FM extends conditional flow matching by incorporating dynamic optimal transport potentials to enable efficient multimarginal transport learning with intermediate observed marginals.

TIDES: Time-Derivative Event Simulation via Deformable Reconstruction

cs.CV · 2026-06-01 · unverdicted · novelty 7.0

TIDES simulates realistic event camera streams in continuous time via dynamic Gaussian splatting with adaptive occlusion handling and sensor artifact modeling, claiming SOTA fidelity and better downstream transfer than prior methods.

Decentralized Instruction Tuning: Conflict-Aware Splitting and Weight Merging

cs.LG · 2026-06-01 · unverdicted · novelty 7.0

MERIT enables decentralized instruction tuning via conflict-aware PCA splitting and parameter-space merging, raising average benchmark scores above joint training on multimodal and text mixtures.

SuperMemory-VQA: An Egocentric Visual Question-Answering Benchmark for Long-Horizon Memory

cs.CV · 2026-05-30 · unverdicted · novelty 7.0

SuperMemory-VQA provides 4,853 human-verified QA pairs from 52.9 hours of egocentric AI glasses recordings to benchmark AI systems on realistic long-horizon memory tasks including an unanswerable option.

From Noise to Control: Parameterized Diffusion Policies

cs.AI · 2026-05-29 · unverdicted · novelty 7.0

Parameterized Diffusion Policy learns a behavior manifold to condition diffusion policies on low-dimensional continuous parameters, enabling interpolation between strategies and adaptation to novel constraints without policy weight updates.

RS2AD-LiDAR: End-to-End Autonomous Driving LiDAR Data Generation from Roadside Sensor Observations

cs.CV · 2026-05-22 · unverdicted · novelty 7.0 · 2 refs

RS2AD-LiDAR reconstructs vehicle LiDAR data from roadside observations via coordinate transformation, virtual LiDAR modeling and resampling, claimed as the first such method, with experiments showing improved object detection when mixed with real data.

AgroVG: A Large-Scale Multi-Source Benchmark for Agricultural Visual Grounding

cs.CV · 2026-05-21 · accept · novelty 7.0 · 5 refs

AgroVG is a new multi-source benchmark for agricultural visual grounding formulated as generalized set prediction, with protocols for box and mask grounding across single-target, multi-target, and target-absent queries from six object families.

Patch Hierarchical Attention Transformer for Efficient Particle Jet Tagging

hep-ex · 2026-05-20 · unverdicted · novelty 7.0

PHAT-JeT combines geometric message-passing with hierarchical patch attention to reach state-of-the-art accuracy and background rejection among resource-constrained jet tagging models on four benchmarks.

citing papers explorer

Showing 50 of 239 citing papers.

WildBox: A Dataset and Benchmark for Aerial Monocular 3D Detection of African Savanna Wildlife cs.CV · 2026-06-19 · unverdicted · none · ref 37
WildBox provides over 237k 3D wildlife annotations from drone video and benchmarks reveal zero-shot 3D detection at 0 AP but fine-tuned performance of 8.68 AP-BEV and 13.17 AP3D, with depth estimation causing most errors.
ScaLe-INR: Scale and Learn Implicit Neural Representations cs.CV · 2026-06-26 · unverdicted · none · ref 20
ScaLe-INR is a multi-branch INR architecture that applies directional scaling per the Fourier inverse theorem and a directional edge guidance loss to disentangle scales and improve reconstruction fidelity.
MATCH: Flow Matching for Multi-View Anomaly Detection cs.CV · 2026-06-23 · unverdicted · none · ref 9 · 2 links
MATCH is the first flow matching method for multi-view anomaly detection, reporting SOTA results on Real-IAD and the first comprehensive evaluation on MANTA-Tiny while enabling real-time use by omitting the divergence term.
GeoFidelity-Bench: Evaluating Segment-Level Geographic Fidelity in Text-to-Image Street-View Generation cs.CV · 2026-06-22 · unverdicted · none · ref 6 · 4 links
GeoFidelity-Bench shows text-to-image models gain city-level plausibility from local names but achieve near-zero improvement in exact segment identity, with GPS coordinates adding no benefit.
Arbor: Explicit Geometric Conditioning for Controllable 3D Asset Generation cs.CV · 2026-06-22 · unverdicted · none · ref 6
Arbor attaches constraint mesh tokens to a frozen text-to-3D denoiser to enable controllable generation obeying hull, avoidance, and touch constraints.
Leveraging target dynamics for imaging in complex media physics.optics · 2026-06-21 · unverdicted · none · ref 22
Target dynamics provide an intrinsic source of variation equivalent to controlled illumination changes, enabling scattering-compensated reconstruction of dynamic scenes with one acquisition per frame in holographic and fluorescence imaging.
4DVLT: Dynamic Scene Understanding with Worldline-Centered Vision-Language Tracking cs.CV · 2026-06-21 · conditional · none · ref 14 · 2 links
The paper defines the 4DVLT task for worldline-centered 4D scene understanding, releases Instruct-4D with 129.4K QA pairs, and presents 4DTrack achieving 62.68 TGA_Top1, outperforming adapted baselines by 19.62 points.
FLM-Occ: Feed-forward Likelihood Maximization for Efficient Indoor Occupancy Prediction cs.CV · 2026-06-19 · unverdicted · none · ref 9
FLM-Occ reformulates indoor occupancy prediction as feed-forward likelihood maximization over a mixture model with volume-normalized weights, achieving superior accuracy on Occ-ScanNet using only 32 superquadrics.
HERO: Hypothesis-Driven Evidence Retrieval from Omics for Multi-Task Breast Cancer Analysis cs.CV · 2026-06-19 · unverdicted · none · ref 12
HERO maps DNA methylation and miRNA to a 16-dimensional intent vector for TF-IDF caption retrieval and cosine-gated repair in VLM-based multi-task breast cancer prediction, claiming SOTA on TCGA-BRCA.
StylisticBias: A Few Human Visual Cues Drive Most Social Biases in MLLMs cs.CL · 2026-06-18 · unverdicted · none · ref 8
StylisticBias benchmark shows 15 visual attributes explain nearly 80% of bias variation in six MLLMs by isolating single cues like age and fashion in generated images.
Heterogeneous SAR-optical fusion for near-real-time land use and land cover mapping under cloud contamination: A novel framework and global benchmark dataset cs.CV · 2026-06-16 · conditional · none · ref 54
CloudLULC-Net is an end-to-end heterogeneous SAR-optical fusion network for LULC mapping under cloud contamination that achieves 86.60% OA, 83.29% F1, and 73.51% mIoU on a new global benchmark of 40,223 samples.
TopoCap: Learning Topology-Agnostic Motion Priors for Monocular Video-to-Animation cs.CV · 2026-06-10 · unverdicted · none · ref 23 · 2 links
A two-stage generative model (Graph CVAE + flow matching) learns topology-agnostic motion codes from a new 5k-topology dataset and retargets video motion to arbitrary unseen skeletons.
Fisher-Guided Progressive Parameter Selection for Adaptive Fine-Tuning cs.CV · 2026-06-08 · unverdicted · none · ref 12
FisherAdapTune uses temporal drift in Fisher geometry, measured by scale-invariant Jensen-Shannon distance, to progressively freeze stabilized parameter groups during fine-tuning, reporting gains on segmentation and zero-shot transfer.
Mind the Gap: Disentangling Performance Bottlenecks in Video Instance Segmentation cs.CV · 2026-06-05 · unverdicted · none · ref 30 · 2 links
An ILP-based oracle applied to seven VIS methods on YouTube-VIS and OVIS shows tracking instability as the dominant bottleneck, producing gaps exceeding 20 AP under occlusion while classification impact is secondary.
Bridging CAD and Data-Driven Design: Attributed Feature Graphs for Engineering Design cs.CE · 2026-06-04 · unverdicted · none · ref 26 · 3 links
Attributed Feature Graphs (AFGs) represent CAD features as attributed nodes and relations as directed edges to enable GNN surrogate models that predict design performance with feature-level interpretability on the CarHoods10K dataset.
Cosine Misleads: Auxiliary Losses Reshape Vision Language Models, Not Their Latents cs.CV · 2026-06-04 · conditional · none · ref 9
Empirical study of five LVR variants finds cosine alignment negatively correlates with accuracy (r=-0.94), supervised latents are bypassed under corruption (max 4-point shift), and answers are decodable downstream but not at the latent.
Multimarginal flow matching with optimal transport potentials cs.LG · 2026-06-03 · unverdicted · none · ref 40
OTP-FM extends conditional flow matching by incorporating dynamic optimal transport potentials to enable efficient multimarginal transport learning with intermediate observed marginals.
TIDES: Time-Derivative Event Simulation via Deformable Reconstruction cs.CV · 2026-06-01 · unverdicted · none · ref 37
TIDES simulates realistic event camera streams in continuous time via dynamic Gaussian splatting with adaptive occlusion handling and sensor artifact modeling, claiming SOTA fidelity and better downstream transfer than prior methods.
Decentralized Instruction Tuning: Conflict-Aware Splitting and Weight Merging cs.LG · 2026-06-01 · unverdicted · none · ref 10
MERIT enables decentralized instruction tuning via conflict-aware PCA splitting and parameter-space merging, raising average benchmark scores above joint training on multimodal and text mixtures.
SuperMemory-VQA: An Egocentric Visual Question-Answering Benchmark for Long-Horizon Memory cs.CV · 2026-05-30 · unverdicted · none · ref 48
SuperMemory-VQA provides 4,853 human-verified QA pairs from 52.9 hours of egocentric AI glasses recordings to benchmark AI systems on realistic long-horizon memory tasks including an unanswerable option.
From Noise to Control: Parameterized Diffusion Policies cs.AI · 2026-05-29 · unverdicted · none · ref 40
Parameterized Diffusion Policy learns a behavior manifold to condition diffusion policies on low-dimensional continuous parameters, enabling interpolation between strategies and adaptation to novel constraints without policy weight updates.
RS2AD-LiDAR: End-to-End Autonomous Driving LiDAR Data Generation from Roadside Sensor Observations cs.CV · 2026-05-22 · unverdicted · none · ref 13 · 2 links
RS2AD-LiDAR reconstructs vehicle LiDAR data from roadside observations via coordinate transformation, virtual LiDAR modeling and resampling, claimed as the first such method, with experiments showing improved object detection when mixed with real data.
AgroVG: A Large-Scale Multi-Source Benchmark for Agricultural Visual Grounding cs.CV · 2026-05-21 · accept · none · ref 22 · 5 links
AgroVG is a new multi-source benchmark for agricultural visual grounding formulated as generalized set prediction, with protocols for box and mask grounding across single-target, multi-target, and target-absent queries from six object families.
Patch Hierarchical Attention Transformer for Efficient Particle Jet Tagging hep-ex · 2026-05-20 · unverdicted · none · ref 20
PHAT-JeT combines geometric message-passing with hierarchical patch attention to reach state-of-the-art accuracy and background rejection among resource-constrained jet tagging models on four benchmarks.
Interaction Locality in Hierarchical Recursive Reasoning cs.AI · 2026-05-20 · unverdicted · none · ref 1 · 2 links
Interaction locality is introduced as a task-geometry-aware measurement framework showing that high-level states in recursive models write locally while recursive updates build broader structures on maze, Sudoku, ARC-AGI, and 3D grounding tasks.
SDM: A Powerful Tool for Evaluating Model Robustness cs.CV · 2026-05-19 · unverdicted · none · ref 16
SDM is a new staged gradient attack that reconstructs the adversarial objective around probability differences and reports stronger performance than prior methods like APGD.
LMM-Track4D: Eliciting 4D Dynamic Reasoning in LMMs via Trajectory-Grounded Dialogue cs.CV · 2026-05-19 · unverdicted · none · ref 6
LMM-Track4D formulates a trajectory-grounded dialogue task, releases Track4D-Bench with 526 samples, and proposes RTGE encoding, TRK state token, and OSK-RA decoder to elicit better 4D spatiotemporal reasoning in LMMs.
CelloCut: Constructive Watertight Remeshing via Tetrahedral Cell Cuts cs.GR · 2026-05-18 · unverdicted · none · ref 54
CelloCut formulates watertight remeshing as binary labeling on a Delaunay tetrahedral partition solved by graph-cut minimization with one-sided constraints to guarantee volumetrically consistent solids.
PluRule: A Benchmark for Moderating Pluralistic Communities on Social Media cs.CL · 2026-05-16 · unverdicted · none · ref 127
PluRule is a new multimodal multilingual benchmark showing that state-of-the-art vision-language models perform only marginally better than a trivial baseline at detecting specific rule violations in pluralistic online communities.
StAD: Stein Amortized Divergence for Fast Likelihoods with Diffusion and Flow stat.ML · 2026-05-15 · unverdicted · none · ref 11
StAD distills divergence of PF-ODEs via the Langevin-Stein operator for faster, lower-variance likelihood estimation in generative models without Jacobian costs.
Navigating Potholes with Geometry-Aware Sharpness Minimization cs.LG · 2026-05-15 · unverdicted · none · ref 10
LLQR+SAM pairs a slow learned geometry preconditioner with fast SAM perturbations to amplify escape from locally sharp 'potholes' while stabilizing flat basins, producing consistent gains over SAM and LLQR alone.
MuteBench: Modality Unavailability Tolerance Evaluation for Incomplete Multimodal Fusion cs.LG · 2026-05-13 · unverdicted · none · ref 16
MuteBench evaluates multimodal fusion robustness to modality missing and within-modality missing on 125000 samples from 9 clinical datasets, finding architecture family predicts tolerance better than parameter count.
Pareto-Guided Optimal Transport for Multi-Reward Alignment cs.CV · 2026-05-13 · unverdicted · none · ref 5 · 2 links
PG-OT builds prompt-specific Pareto frontiers and applies distribution-aware optimal transport to improve multi-reward alignment while introducing JDR and JCR metrics to measure synergy and hacking.
Vector Scaffolding: Inter-Scale Orchestration for Differentiable Image Vectorization cs.CV · 2026-05-12 · unverdicted · none · ref 3 · 5 links
Vector Scaffolding uses Interior Gradient Aggregation, Progressive Stratification, and Rapid Inflation Scheduling to achieve 2.5x faster optimization and up to 1.4 dB higher PSNR in differentiable vectorization.
Grounded or Guessing? LVLM Confidence Estimation via Blind-Image Contrastive Ranking cs.CL · 2026-05-11 · unverdicted · none · ref 24 · 2 links
BICR trains a lightweight probe on contrastive hidden states from real versus blind images to detect visual grounding in LVLM predictions, outperforming baselines on calibration and discrimination with fewer parameters.
MulTaBench: Benchmarking Multimodal Tabular Learning with Text and Image cs.LG · 2026-05-11 · unverdicted · none · ref 28
MulTaBench is a new collection of 40 image-tabular and text-tabular datasets designed to test target-aware representation tuning in multimodal tabular models.
Geometrically Approximated Modeling for Emitter-Centric Ray-Triangle Filtering in Arbitrarily Dynamic LiDAR Simulation cs.GR · 2026-05-11 · unverdicted · none · ref 46
GRCA uses emitter-centric geometric culling of rays per triangle to accelerate LiDAR simulation in arbitrarily dynamic scenes, reporting up to 14.55x speedup over Embree and 7.97x over OptiX.
Perception Without Engagement: Dissecting the Causal Discovery Deficit in LMMs cs.CL · 2026-05-10 · unverdicted · none · ref 15 · 2 links
LMMs perceive videos but underexploit visual content for causal reasoning due to textual shortcuts; ProCauEval diagnoses this and ADPO training reduces reliance on priors.
Field-Localized Forgery Detection for Digital Identity Documents cs.CV · 2026-05-09 · unverdicted · none · ref 15 · 2 links
FLiD is a field-localized forgery detection method for identity documents that outperforms full-document baselines and general detectors with significantly fewer parameters.
The Moltbook Files: A Harmless Slopocalypse or Humanity's Last Experiment cs.CL · 2026-05-08 · unverdicted · none · ref 40
An AI-agent social platform generated mostly neutral content whose use in fine-tuning reduced model truthfulness comparably to human Reddit data, suggesting limited unique harm but flagging tail risks like secret leaks.
AniMatrix: An Anime Video Generation Model that Thinks in Art, Not Physics cs.CV · 2026-05-05 · unverdicted · none · ref 35
AniMatrix generates anime videos by structuring artistic production rules into a controllable taxonomy and training the model to prioritize those rules over physical realism, achieving top scores from professional animators on prompt understanding and artistic motion.
Does it Really Count? Assessing Semantic Grounding in Text-Guided Class-Agnostic Counting cs.CV · 2026-05-04 · unverdicted · none · ref 19 · 2 links
Text-guided class-agnostic counting models exhibit significant weaknesses in grounding textual prompts to visual objects, as demonstrated by new negative-label and distractor tests on a multi-category dataset.
MIRL: Mutual Information-Guided Reinforcement Learning for Vision-Language Models cs.CV · 2026-05-02 · unverdicted · none · ref 10 · 4 links
MIRL uses mutual information to guide trajectory selection and provide separate rewards for visual perception in RLVR for VLMs, achieving 70.22% average accuracy with 25% fewer full trajectories.
CSGuard: Toward Forgery-Resistant Watermarking in Diffusion Models via Compressed Sensing Constraint cs.CV · 2026-05-02 · unverdicted · none · ref 3 · 4 links
CSGuard binds diffusion-model watermarks to a secret matrix via compressed sensing, cutting forgery attack success from 100% to 28.12% while preserving 100% detection on legitimate images.
ESARBench: A Benchmark for Agentic UAV Embodied Search and Rescue cs.RO · 2026-05-02 · unverdicted · none · ref 9
ESARBench is the first unified benchmark for MLLM-driven UAV agents that must explore, locate clues, and decide on victim positions in photorealistic simulated SAR environments.
FieryGS: In-the-Wild Fire Synthesis with Physics-Integrated Gaussian Splatting cs.GR · 2026-04-30 · unverdicted · none · ref 100
FieryGS integrates LLM-based material reasoning, volumetric combustion simulation, and a unified renderer with 3D Gaussian Splatting to generate physically plausible and user-controllable fire in in-the-wild scenes.
Towards Temporal Compositional Reasoning in Long-Form Sports Videos cs.CV · 2026-04-24 · unverdicted · none · ref 42
SportsTime benchmark and CoTR method improve multimodal AI's temporal compositional reasoning and evidence grounding in long-form sports videos.
Beyond Single Plots: A Benchmark for Question Answering on Multi-Charts cs.CL · 2026-04-23 · unverdicted · none · ref 23
PolyChartQA is a new mid-scale dataset for multi-chart question answering that reveals a 27.4% accuracy drop for multimodal models on human-authored questions compared to AI-generated ones, plus a modest gain from a proposed prompting method.
HumanScore: Benchmarking Human Motions in Generated Videos cs.CV · 2026-04-22 · unverdicted · none · ref 23
HumanScore defines six metrics for kinematic plausibility, temporal stability, and biomechanical consistency to benchmark human motions in videos from thirteen state-of-the-art generation models, revealing gaps between visual appeal and physical fidelity.
Divide-and-Conquer Approach to Holistic Cognition in High-Similarity Contexts with Limited Data cs.CV · 2026-04-21 · unverdicted · none · ref 27 · 3 links
DHCNet improves ultra-fine-grained visual categorization by progressively building holistic cognition from local discrepancies using self-shuffling and refinement on limited data.
