Derf: Decomposed radiance fields

Guang Feng, Lihe Zhang, Zhiwei Hu · 2021 · arXiv 6437.2021

Mixed citation behavior across 158 citing Pith papers. The most common role is background (68% of classified citations).

citation-role summary

background: 24 · dataset: 5 · method: 5 · baseline: 3

citation-polarity summary

background: 25 · use dataset: 5 · baseline: 3 · use method: 3 · unclear: 1

authors

Guang Feng, Lihe Zhang, Zhiwei Hu, and Huchuan Lu

representative citing papers

WildBox: A Dataset and Benchmark for Aerial Monocular 3D Detection of African Savanna Wildlife

cs.CV · 2026-06-19 · unverdicted · novelty 8.0

WildBox provides over 237k 3D wildlife annotations from drone video. Its benchmarks show zero-shot 3D detection at 0 AP and fine-tuned performance of 8.68 AP-BEV and 13.17 AP3D, with depth estimation causing most errors.

MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark

cs.CL · 2024-09-04 · accept · novelty 8.0

MMMU-Pro is a stricter multimodal benchmark that removes text-only solvable questions, augments options, and requires reading text from images, yielding substantially lower model scores of 16.8-26.9%.

TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second

cs.LG · 2022-07-05 · conditional · novelty 8.0

TabPFN is a Prior-Data Fitted Network that approximates Bayesian inference for small tabular classification. A Transformer is trained once on synthetic data drawn from a causal prior and then solves new tasks in a single forward pass without further updates.

Efficient Compression of Structured and Unstructured Volumes via Learned 3D Gaussian Representation

cs.LG · 2026-07-01 · unverdicted · novelty 7.0

An explicit model using learned 3D Gaussians for volume compression encodes geometry explicitly and outperforms implicit neural representations on unstructured volumes with faster training.

SpheRoPE: Zero-Shot Optimization-Free 360 Panorama Generation with Spherical RoPE

cs.CV · 2026-06-30 · unverdicted · novelty 7.0

SpheRoPE modifies rotary position embeddings in diffusion transformers to enforce spherical topology for zero-shot 360 panorama generation across multiple backbones.

RESOLVE: A Multi-Resolution and Multi-Modal Dataset for Roadside Cooperative Perception

cs.CV · 2026-06-30 · accept · novelty 7.0

RESOLVE provides a controlled multi-resolution LiDAR and camera benchmark for evaluating 3D detection and tracking under point sparsity variations in roadside cooperative perception.

Think While You Map: Asynchronous Vision-Language Agents for Incremental 3D Scene Graphs

cs.CV · 2026-06-30 · unverdicted · novelty 7.0

An asynchronous architecture decouples incremental voxel-based mapping from VLM-based semantic enrichment to produce queryable open-vocabulary 3D scene graphs that match or exceed prior methods on segmentation and grounding benchmarks.

Bridging VideoQA and Video-Guided Agentic Tasks via Generalized Keyframe Extraction

cs.CV · 2026-06-28 · unverdicted · novelty 7.0

Introduces the VG-GUIBench benchmark and the TASKER keyframe extraction algorithm, which improves performance on VideoQA and video-guided agentic tasks.

4DVLT: Dynamic Scene Understanding with Worldline-Centered Vision-Language Tracking

cs.CV · 2026-06-21 · conditional · novelty 7.0 · 2 refs

The paper defines the 4DVLT task for worldline-centered 4D scene understanding, releases Instruct-4D with 129.4K QA pairs, and presents 4DTrack achieving 62.68 TGA_Top1, outperforming adapted baselines by 19.62 points.

TopoCap: Learning Topology-Agnostic Motion Priors for Monocular Video-to-Animation

cs.CV · 2026-06-10 · unverdicted · novelty 7.0

A two-stage generative model (Graph CVAE + flow matching) learns topology-agnostic motion codes from a new 5k-topology dataset and retargets video motion to arbitrary unseen skeletons.

SpikeTAD: Spiking Neural Networks for End-to-End Temporal Action Detection

cs.CV · 2026-06-10 · unverdicted · novelty 7.0

SpikeTAD proposes the first SNN-based end-to-end TAD model, reporting 67.2% mAP on THUMOS14 and 37.42% on ActivityNet-1.3 with extremely low power consumption.

Mind the Gap: Disentangling Performance Bottlenecks in Video Instance Segmentation

cs.CV · 2026-06-05 · unverdicted · novelty 7.0

An ILP-based oracle applied to seven VIS methods on YouTube-VIS and OVIS shows tracking instability as the dominant bottleneck, producing gaps exceeding 20 AP under occlusion while classification impact is secondary.

Bridging CAD and Data-Driven Design: Attributed Feature Graphs for Engineering Design

cs.CE · 2026-06-04 · unverdicted · novelty 7.0

Attributed Feature Graphs (AFGs) represent CAD features as attributed nodes and relations as directed edges to enable GNN surrogate models that predict design performance with feature-level interpretability on the CarHoods10K dataset.

Attribution via Distributional Paths for Information Revelation

cs.LG · 2026-06-02 · unverdicted · novelty 7.0

Reveal-IG performs path attribution by integrating model output changes along trajectories in a space of probe distributions rather than input-space paths, retaining completeness and handling multiscale or uncertain features.

Quality-Guided Semi-Supervised Learning for Medical Image Segmentation

cs.CV · 2026-06-01 · unverdicted · novelty 7.0

A new quality-guided approach for semi-supervised medical image segmentation that trains a predictor on synthetic errors to enhance pseudolabel handling.

Category-Level 3D Correspondence in Camera Space via Morphable Object Priors

cs.CV · 2026-05-27 · unverdicted · novelty 7.0

Morpheus learns morphable category-level shape priors to produce implicit 3D correspondences in camera space without explicit supervision and releases the HouseCorr3D benchmark with amodal and symmetry annotations.

ClothTransformer: Unified Latent-Space Transformers for Scalable Cloth Simulation

cs.GR · 2026-05-27 · unverdicted · novelty 7.0

ClothTransformer is a unified latent-space Transformer for cloth simulation that handles body-driven garments, robotic manipulation, and free-fall collisions in one model with 4-9x lower error than prior methods and mesh-resolution independence.

Rethinking Continual Anomaly Detection on the Edge: Benchmarking Under Realistic Industrial Conditions

cs.LG · 2026-05-22 · unverdicted · novelty 7.0 · 2 refs

Introduces a unified benchmark for continual anomaly detection with discrete and continuous protocols plus a training-free DINOSaur method that outperforms prior CAD approaches with zero forgetting and sub-100ms edge inference.

iTryOn: Mastering Interactive Video Virtual Try-On with Spatial-Semantic Guidance

cs.CV · 2026-05-20 · unverdicted · novelty 7.0 · 2 refs

iTryOn is a diffusion-based framework that adds spatial 3D hand guidance and semantic action-aware embeddings to handle complex garment deformations during human-clothing interactions in videos.

DARE-EEG: A Foundation Model for Mining Dual-Aligned Representation of EEG

cs.AI · 2026-05-18 · unverdicted · novelty 7.0

DARE-EEG is a self-supervised EEG foundation model that enforces mask-invariance via contrastive mask alignment and momentum anchor alignment, plus conv-linear-probing for heterogeneous setups, achieving SOTA accuracy and cross-dataset portability.

Strikingness-Aware Evaluation for Temporal Knowledge Graph Reasoning

cs.AI · 2026-05-13 · unverdicted · novelty 7.0

A rule-based strikingness measure is added to TKGR metrics to weight rare events higher, revealing that models weaken on striking events and ensemble gains come mostly from trivial fits.

Perception Without Engagement: Dissecting the Causal Discovery Deficit in LMMs

cs.CL · 2026-05-10 · unverdicted · novelty 7.0

LMMs perceive videos but underexploit visual content for causal reasoning due to textual shortcuts; ProCauEval diagnoses this and ADPO training reduces reliance on priors.

TripVVT: A Large-Scale Triplet Dataset and a Coarse-Mask Baseline for In-the-Wild Video Virtual Try-On

cs.CV · 2026-04-30 · unverdicted · novelty 7.0

A new large-scale triplet dataset and diffusion transformer model using coarse human masks deliver improved video virtual try-on quality and generalization in challenging real-world conditions.

Linguistically Informed Multimodal Fusion for Vietnamese Scene-Text Image Captioning: Dataset, Graph Framework, and Phonological Attention

cs.CV · 2026-04-30 · unverdicted · novelty 7.0

Introduces the ViTextCaps dataset and the PhonoSTFG phonological graph fusion framework for Vietnamese scene-text image captioning, and shows that cross-modal graph edges harm performance.
