Emogen: Emotional image content generation with text-to-image diffusion models

Edstedt, J · 2024 · arXiv 2733.2024

Canonical reference. 91% of citing Pith papers cite this work as background.

285 Pith papers citing it

Background 91% of classified citations

citation-role summary

background 84 · dataset 6 · baseline 2 · method 2

citation-polarity summary

background 86 · use dataset 4 · baseline 2 · use method 2

representative citing papers

Rolling Shutter Relative Pose Estimation Made Practical

cs.CV · 2026-06-25 · conditional · novelty 8.0

A linearized solver estimates rolling-shutter relative pose and motion from 7 affine correspondences in 1.2 ms and reports best-in-benchmark accuracy plus usable translational velocity.

WildBox: A Dataset and Benchmark for Aerial Monocular 3D Detection of African Savanna Wildlife

cs.CV · 2026-06-19 · unverdicted · novelty 8.0

WildBox provides over 237k 3D wildlife annotations from drone video. Its benchmarks show zero-shot 3D detection at 0 AP, while fine-tuned models reach 8.68 AP-BEV and 13.17 AP3D, with depth estimation causing most errors.

Learning Spectral and Polarimetric Clues for One-to-Multimodal Novel View Synthesis

cs.CV · 2026-07-02 · unverdicted · novelty 7.0 · 5 refs

SPoILeR uses multimodal pre-training to enable accurate novel view synthesis of infrared, polarimetric, and multispectral data from RGB-supervised fine-tuning on new scenes.

MoHallBench: A Benchmark for Motion Hallucination in Video Large Language Models

cs.CV · 2026-07-01 · unverdicted · novelty 7.0

MoHallBench is a new benchmark that evaluates motion hallucination in VideoLLMs arising from co-occurrence priors, sequential inference, and similarity confusion, and shows that motion hallucination is decoupled from action recognition performance.

SpheRoPE: Zero-Shot Optimization-Free 360 Panorama Generation with Spherical RoPE

cs.CV · 2026-06-30 · unverdicted · novelty 7.0 · 3 refs

SpheRoPE modifies rotary position embeddings in diffusion transformers to enforce spherical topology for zero-shot 360 panorama generation across multiple backbones.

RESOLVE: A Multi-Resolution and Multi-Modal Dataset for Roadside Cooperative Perception

cs.CV · 2026-06-30 · accept · novelty 7.0 · 2 refs

RESOLVE provides a controlled multi-resolution LiDAR and camera benchmark for evaluating 3D detection and tracking under point sparsity variations in roadside cooperative perception.

Intrinsic decomposition and editing of 3D Gaussian splats

cs.GR · 2026-06-30 · unverdicted · novelty 7.0

A method to decompose 3D Gaussian splats into independent albedo and shading components for consistent texture editing in radiance fields.

Think While You Map: Asynchronous Vision-Language Agents for Incremental 3D Scene Graphs

cs.CV · 2026-06-30 · unverdicted · novelty 7.0

An asynchronous architecture decouples incremental voxel-based mapping from VLM-based semantic enrichment to produce queryable open-vocabulary 3D scene graphs that match or exceed prior methods on segmentation and grounding benchmarks.

Learning to Deny: Action Denial in Multimodal Large Language Models

cs.CV · 2026-06-30 · unverdicted · novelty 7.0

MLLMs drop from over 85% accuracy on action presence to under 50% on matched action-denial videos, exposing a causal verification gap that causal graph prompts partially close.

Diffusion-Based Material Regularization for Physics-Based Inverse Rendering

cs.CV · 2026-06-30 · unverdicted · novelty 7.0

A regularization technique that treats diffusion model outputs as a similarity kernel during material optimization in inverse rendering, enabling joint reconstruction of geometry, materials, and illumination that satisfies the rendering equation and generalizes to new lighting.

Bridging VideoQA and Video-Guided Agentic Tasks via Generalized Keyframe Extraction

cs.CV · 2026-06-28 · unverdicted · novelty 7.0

Introduces the VG-GUIBench benchmark and the TASKER keyframe extraction algorithm, which improves performance on VideoQA and video-guided agentic tasks.

ScaLe-INR: Scale and Learn Implicit Neural Representations

cs.CV · 2026-06-26 · unverdicted · novelty 7.0

ScaLe-INR is a multi-branch INR architecture that applies directional scaling per the Fourier inverse theorem and a directional edge guidance loss to disentangle scales and improve reconstruction fidelity.

MATCH: Flow Matching for Multi-View Anomaly Detection

cs.CV · 2026-06-23 · unverdicted · novelty 7.0 · 2 refs

MATCH is the first flow matching method for multi-view anomaly detection, reporting SOTA results on Real-IAD and the first comprehensive evaluation on MANTA-Tiny while enabling real-time use by omitting the divergence term.

GeoFidelity-Bench: Evaluating Segment-Level Geographic Fidelity in Text-to-Image Street-View Generation

cs.CV · 2026-06-22 · unverdicted · novelty 7.0 · 4 refs

GeoFidelity-Bench shows text-to-image models gain city-level plausibility from local names but achieve near-zero improvement in exact segment identity, with GPS coordinates adding no benefit.

Arbor: Explicit Geometric Conditioning for Controllable 3D Asset Generation

cs.CV · 2026-06-22 · unverdicted · novelty 7.0

Arbor attaches constraint mesh tokens to a frozen text-to-3D denoiser to enable controllable generation obeying hull, avoidance, and touch constraints.

Leveraging target dynamics for imaging in complex media

physics.optics · 2026-06-21 · unverdicted · novelty 7.0

Target dynamics provide an intrinsic source of variation equivalent to controlled illumination changes, enabling scattering-compensated reconstruction of dynamic scenes with one acquisition per frame in holographic and fluorescence imaging.

4DVLT: Dynamic Scene Understanding with Worldline-Centered Vision-Language Tracking

cs.CV · 2026-06-21 · conditional · novelty 7.0 · 2 refs

The paper defines the 4DVLT task for worldline-centered 4D scene understanding, releases Instruct-4D with 129.4K QA pairs, and presents 4DTrack achieving 62.68 TGA_Top1, outperforming adapted baselines by 19.62 points.

FLM-Occ: Feed-forward Likelihood Maximization for Efficient Indoor Occupancy Prediction

cs.CV · 2026-06-19 · unverdicted · novelty 7.0

FLM-Occ reformulates indoor occupancy prediction as feed-forward likelihood maximization over a mixture model with volume-normalized weights, achieving superior accuracy on Occ-ScanNet using only 32 superquadrics.

HERO: Hypothesis-Driven Evidence Retrieval from Omics for Multi-Task Breast Cancer Analysis

cs.CV · 2026-06-19 · unverdicted · novelty 7.0

HERO maps DNA methylation and miRNA to a 16-dimensional intent vector for TF-IDF caption retrieval and cosine-gated repair in VLM-based multi-task breast cancer prediction, claiming SOTA on TCGA-BRCA.

StylisticBias: A Few Human Visual Cues Drive Most Social Biases in MLLMs

cs.CL · 2026-06-18 · unverdicted · novelty 7.0

StylisticBias benchmark shows 15 visual attributes explain nearly 80% of bias variation in six MLLMs by isolating single cues like age and fashion in generated images.

Heterogeneous SAR-optical fusion for near-real-time land use and land cover mapping under cloud contamination: A novel framework and global benchmark dataset

cs.CV · 2026-06-16 · conditional · novelty 7.0

CloudLULC-Net is an end-to-end heterogeneous SAR-optical fusion network for LULC mapping under cloud contamination that achieves 86.60% OA, 83.29% F1, and 73.51% mIoU on a new global benchmark of 40,223 samples.

TopoCap: Learning Topology-Agnostic Motion Priors for Monocular Video-to-Animation

cs.CV · 2026-06-10 · unverdicted · novelty 7.0 · 2 refs

A two-stage generative model (Graph CVAE + flow matching) learns topology-agnostic motion codes from a new 5k-topology dataset and retargets video motion to arbitrary unseen skeletons.

Fisher-Guided Progressive Parameter Selection for Adaptive Fine-Tuning

cs.CV · 2026-06-08 · unverdicted · novelty 7.0

FisherAdapTune uses temporal drift in Fisher geometry, measured by scale-invariant Jensen-Shannon distance, to progressively freeze stabilized parameter groups during fine-tuning, reporting gains on segmentation and zero-shot transfer.

Mind the Gap: Disentangling Performance Bottlenecks in Video Instance Segmentation

cs.CV · 2026-06-05 · unverdicted · novelty 7.0 · 2 refs

An ILP-based oracle applied to seven VIS methods on YouTube-VIS and OVIS shows tracking instability as the dominant bottleneck, producing gaps exceeding 20 AP under occlusion while classification impact is secondary.

citing papers explorer

DiffPC: Diffusion-Based Projector Photometric Compensation

cs.MM · 2026-06-16 · unverdicted · none · ref 8

DiffPC reformulates projector photometric compensation as a diffusion-based denoising task guided by photometry and image content to achieve better results in unseen environments.

Unveiling the Visual Counting Bottleneck in Vision-Language Models

cs.MM · 2026-05-28 · unverdicted · none · ref 15

VLMs fail at visual counting extrapolation because they cannot project visual magnitudes onto symbolic tokens, despite intact perceptual representations, supporting a fractured magnitude hypothesis.

CellPrior-Net: Prior-Guided Nuclei Detection and Classification for H&E Whole-Slide Images

cs.MM · 2026-07-01 · unverdicted · none · ref 38

CellPrior-Net integrates hematoxylin channel prior into a lightweight CNN for nuclei detection and classification in H&E WSIs, claiming comparable accuracy to SOTA with significantly reduced inference time across 10.4M nuclei from diverse datasets.
