Derf: Decomposed radiance fields

Guang Feng, Lihe Zhang, Zhiwei Hu · 2021 · arXiv 6437.2021

Mixed citation behavior. Most common role is background (67%).

132 Pith papers citing it

citation-role summary

background: 23 · dataset: 5 · method: 5 · baseline: 3

citation-polarity summary

background: 24 · use dataset: 5 · baseline: 3 · use method: 3 · unclear: 1

authors

Guang Feng, Lihe Zhang, Zhiwei Hu, and Huchuan Lu

representative citing papers

WildBox: A Dataset and Benchmark for Aerial Monocular 3D Detection of African Savanna Wildlife

cs.CV · 2026-06-19 · unverdicted · novelty 8.0

WildBox provides over 237k 3D wildlife annotations from drone video and benchmarks reveal zero-shot 3D detection at 0 AP but fine-tuned performance of 8.68 AP-BEV and 13.17 AP3D, with depth estimation causing most errors.

MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark

cs.CL · 2024-09-04 · accept · novelty 8.0

MMMU-Pro is a stricter multimodal benchmark that removes text-only solvable questions, augments options, and requires reading text from images, yielding substantially lower model scores of 16.8-26.9%.

TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second

cs.LG · 2022-07-05 · conditional · novelty 8.0

TabPFN is a Prior-Data Fitted Network that approximates Bayesian inference for small tabular classification by training a Transformer once on synthetic data drawn from a causal prior, then solves new tasks in a single forward pass without further updates.

4DVLT: Dynamic Scene Understanding with Worldline-Centered Vision-Language Tracking

cs.CV · 2026-06-21 · conditional · novelty 7.0 · 2 refs

The paper defines the 4DVLT task for worldline-centered 4D scene understanding, releases Instruct-4D with 129.4K QA pairs, and presents 4DTrack achieving 62.68 TGA_Top1, outperforming adapted baselines by 19.62 points.

TopoCap: Learning Topology-Agnostic Motion Priors for Monocular Video-to-Animation

cs.CV · 2026-06-10 · unverdicted · novelty 7.0

A two-stage generative model (Graph CVAE + flow matching) learns topology-agnostic motion codes from a new 5k-topology dataset and retargets video motion to arbitrary unseen skeletons.

SpikeTAD: Spiking Neural Networks for End-to-End Temporal Action Detection

cs.CV · 2026-06-10 · unverdicted · novelty 7.0

SpikeTAD proposes the first SNN-based end-to-end TAD model, reporting 67.2% mAP on THUMOS14 and 37.42% on ActivityNet-1.3 with extremely low power consumption.

Mind the Gap: Disentangling Performance Bottlenecks in Video Instance Segmentation

cs.CV · 2026-06-05 · unverdicted · novelty 7.0

An ILP-based oracle applied to seven VIS methods on YouTube-VIS and OVIS shows tracking instability as the dominant bottleneck, producing gaps exceeding 20 AP under occlusion while classification impact is secondary.

Attribution via Distributional Paths for Information Revelation

cs.LG · 2026-06-02 · unverdicted · novelty 7.0

Reveal-IG performs path attribution by integrating model output changes along trajectories in a space of probe distributions rather than input-space paths, retaining completeness and handling multiscale or uncertain features.

Quality-Guided Semi-Supervised Learning for Medical Image Segmentation

cs.CV · 2026-06-01 · unverdicted · novelty 7.0

A new quality-guided approach for semi-supervised medical image segmentation that trains a predictor on synthetic errors to enhance pseudolabel handling.

ClothTransformer: Unified Latent-Space Transformers for Scalable Cloth Simulation

cs.GR · 2026-05-27 · unverdicted · novelty 7.0

ClothTransformer is a unified latent-space Transformer for cloth simulation that handles body-driven garments, robotic manipulation, and free-fall collisions in one model with 4-9x lower error than prior methods and mesh-resolution independence.

DARE-EEG: A Foundation Model for Mining Dual-Aligned Representation of EEG

cs.AI · 2026-05-18 · unverdicted · novelty 7.0

DARE-EEG is a self-supervised EEG foundation model that enforces mask-invariance via contrastive mask alignment and momentum anchor alignment, plus conv-linear-probing for heterogeneous setups, achieving SOTA accuracy and cross-dataset portability.

Strikingness-Aware Evaluation for Temporal Knowledge Graph Reasoning

cs.AI · 2026-05-13 · unverdicted · novelty 7.0

A rule-based strikingness measure is added to TKGR metrics to weight rare events higher, revealing that models weaken on striking events and ensemble gains come mostly from trivial fits.

Vector Scaffolding: Inter-Scale Orchestration for Differentiable Image Vectorization

cs.CV · 2026-05-12 · unverdicted · novelty 7.0

Vector Scaffolding uses Interior Gradient Aggregation, Progressive Stratification, and Rapid Inflation Scheduling to achieve 2.5x faster optimization and up to 1.4 dB higher PSNR in differentiable vectorization.

Perception Without Engagement: Dissecting the Causal Discovery Deficit in LMMs

cs.CL · 2026-05-10 · unverdicted · novelty 7.0

LMMs perceive videos but underexploit visual content for causal reasoning due to textual shortcuts; ProCauEval diagnoses this and ADPO training reduces reliance on priors.

TripVVT: A Large-Scale Triplet Dataset and a Coarse-Mask Baseline for In-the-Wild Video Virtual Try-On

cs.CV · 2026-04-30 · unverdicted · novelty 7.0

A new large-scale triplet dataset and diffusion transformer model using coarse human masks deliver improved video virtual try-on quality and generalization in challenging real-world conditions.

Linguistically Informed Multimodal Fusion for Vietnamese Scene-Text Image Captioning: Dataset, Graph Framework, and Phonological Attention

cs.CV · 2026-04-30 · unverdicted · novelty 7.0

Introduces ViTextCaps dataset and PhonoSTFG phonological graph fusion framework for Vietnamese scene-text image captioning, showing cross-modal graph edges harm performance.

Too Sharp, Too Sure: When Calibration Follows Curvature

cs.LG · 2026-04-22 · unverdicted · novelty 7.0

Calibration error tracks curvature via shared margin-dependent exponential tails; a margin-aware objective improves out-of-sample calibration across optimizers.

Divide-and-Conquer Approach to Holistic Cognition in High-Similarity Contexts with Limited Data

cs.CV · 2026-04-21 · unverdicted · novelty 7.0

DHCNet improves ultra-fine-grained visual categorization by progressively building holistic cognition from local discrepancies using self-shuffling and refinement on limited data.

Towards Symmetry-sensitive Pose Estimation: A Rotation Representation for Symmetric Object Classes

cs.CV · 2026-04-20 · unverdicted · novelty 7.0 · 3 refs

SARR modifies trigonometric rotation encodings with object symmetry orders to produce unique continuous poses, enabling standard CNNs to outperform existing methods on symmetry-aware 6D pose estimation without custom losses or 3D models.

Orthogonal Transformations for Efficient Data-Driven Reachability Analysis

eess.SY · 2026-04-15 · unverdicted · novelty 7.0

Orthogonal transformations before order reduction in matrix zonotopes produce order-of-magnitude smaller reachable set volumes while keeping generator counts comparable.

FRTSearch: Unified Detection and Parameter Inference of Fast Radio Transients using Instance Segmentation

astro-ph.IM · 2026-04-14 · unverdicted · novelty 7.0

FRTSearch reframes fast radio transient detection as instance segmentation on dynamic spectra and uses the segmented shapes to infer dispersion measure and time of arrival, achieving 98% recall with over 99.9% fewer false positives than traditional methods.

Learning to Build Shapes by Extrusion

cs.GR · 2026-01-30 · unverdicted · novelty 7.0

Text Encoded Extrusions (TEE) lets LLMs generate and edit manifold 3D meshes by learning sequences of face extrusions from decomposed quadrilateral meshes.

Backdoor Attacks on Prompt-Driven Video Segmentation Foundation Models

cs.CV · 2025-12-26 · conditional · novelty 7.0

BadVSFM is the first effective backdoor attack on prompt-driven video segmentation foundation models, using a two-stage encoder-decoder strategy to achieve high attack success rates with limited clean performance loss.

Unmasking Puppeteers: Leveraging Biometric Leakage to Expose Impersonation in AI-Based Videoconferencing

cs.CV · 2025-10-03 · unverdicted · novelty 7.0

A pose-conditioned large-margin contrastive encoder isolates persistent biometric identity cues from transmitted latents in talking-head videoconferencing to flag impersonation attacks via cosine similarity without inspecting the output video.

citing papers explorer

Showing 50 of 132 citing papers.

WildBox: A Dataset and Benchmark for Aerial Monocular 3D Detection of African Savanna Wildlife
cs.CV · 2026-06-19 · unverdicted · none · ref 3
WildBox provides over 237k 3D wildlife annotations from drone video and benchmarks reveal zero-shot 3D detection at 0 AP but fine-tuned performance of 8.68 AP-BEV and 13.17 AP3D, with depth estimation causing most errors.
MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark
cs.CL · 2024-09-04 · accept · none · ref 66
MMMU-Pro is a stricter multimodal benchmark that removes text-only solvable questions, augments options, and requires reading text from images, yielding substantially lower model scores of 16.8-26.9%.
TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second
cs.LG · 2022-07-05 · conditional · none · ref 18
TabPFN is a Prior-Data Fitted Network that approximates Bayesian inference for small tabular classification by training a Transformer once on synthetic data drawn from a causal prior, then solves new tasks in a single forward pass without further updates.
4DVLT: Dynamic Scene Understanding with Worldline-Centered Vision-Language Tracking
cs.CV · 2026-06-21 · conditional · none · ref 8 · 2 links
The paper defines the 4DVLT task for worldline-centered 4D scene understanding, releases Instruct-4D with 129.4K QA pairs, and presents 4DTrack achieving 62.68 TGA_Top1, outperforming adapted baselines by 19.62 points.
TopoCap: Learning Topology-Agnostic Motion Priors for Monocular Video-to-Animation
cs.CV · 2026-06-10 · unverdicted · none · ref 54
A two-stage generative model (Graph CVAE + flow matching) learns topology-agnostic motion codes from a new 5k-topology dataset and retargets video motion to arbitrary unseen skeletons.
SpikeTAD: Spiking Neural Networks for End-to-End Temporal Action Detection
cs.CV · 2026-06-10 · unverdicted · none · ref 25
SpikeTAD proposes the first SNN-based end-to-end TAD model, reporting 67.2% mAP on THUMOS14 and 37.42% on ActivityNet-1.3 with extremely low power consumption.
Mind the Gap: Disentangling Performance Bottlenecks in Video Instance Segmentation
cs.CV · 2026-06-05 · unverdicted · none · ref 3
An ILP-based oracle applied to seven VIS methods on YouTube-VIS and OVIS shows tracking instability as the dominant bottleneck, producing gaps exceeding 20 AP under occlusion while classification impact is secondary.
Attribution via Distributional Paths for Information Revelation
cs.LG · 2026-06-02 · unverdicted · none · ref 7
Reveal-IG performs path attribution by integrating model output changes along trajectories in a space of probe distributions rather than input-space paths, retaining completeness and handling multiscale or uncertain features.
Quality-Guided Semi-Supervised Learning for Medical Image Segmentation
cs.CV · 2026-06-01 · unverdicted · none · ref 9
A new quality-guided approach for semi-supervised medical image segmentation that trains a predictor on synthetic errors to enhance pseudolabel handling.
ClothTransformer: Unified Latent-Space Transformers for Scalable Cloth Simulation
cs.GR · 2026-05-27 · unverdicted · none · ref 35
ClothTransformer is a unified latent-space Transformer for cloth simulation that handles body-driven garments, robotic manipulation, and free-fall collisions in one model with 4-9x lower error than prior methods and mesh-resolution independence.
DARE-EEG: A Foundation Model for Mining Dual-Aligned Representation of EEG
cs.AI · 2026-05-18 · unverdicted · none · ref 5
DARE-EEG is a self-supervised EEG foundation model that enforces mask-invariance via contrastive mask alignment and momentum anchor alignment, plus conv-linear-probing for heterogeneous setups, achieving SOTA accuracy and cross-dataset portability.
Strikingness-Aware Evaluation for Temporal Knowledge Graph Reasoning
cs.AI · 2026-05-13 · unverdicted · none · ref 60
A rule-based strikingness measure is added to TKGR metrics to weight rare events higher, revealing that models weaken on striking events and ensemble gains come mostly from trivial fits.
Vector Scaffolding: Inter-Scale Orchestration for Differentiable Image Vectorization
cs.CV · 2026-05-12 · unverdicted · none · ref 20
Vector Scaffolding uses Interior Gradient Aggregation, Progressive Stratification, and Rapid Inflation Scheduling to achieve 2.5x faster optimization and up to 1.4 dB higher PSNR in differentiable vectorization.
Perception Without Engagement: Dissecting the Causal Discovery Deficit in LMMs
cs.CL · 2026-05-10 · unverdicted · none · ref 5
LMMs perceive videos but underexploit visual content for causal reasoning due to textual shortcuts; ProCauEval diagnoses this and ADPO training reduces reliance on priors.
TripVVT: A Large-Scale Triplet Dataset and a Coarse-Mask Baseline for In-the-Wild Video Virtual Try-On
cs.CV · 2026-04-30 · unverdicted · none · ref 7
A new large-scale triplet dataset and diffusion transformer model using coarse human masks deliver improved video virtual try-on quality and generalization in challenging real-world conditions.
Linguistically Informed Multimodal Fusion for Vietnamese Scene-Text Image Captioning: Dataset, Graph Framework, and Phonological Attention
cs.CV · 2026-04-30 · unverdicted · none · ref 39
Introduces ViTextCaps dataset and PhonoSTFG phonological graph fusion framework for Vietnamese scene-text image captioning, showing cross-modal graph edges harm performance.
Too Sharp, Too Sure: When Calibration Follows Curvature
cs.LG · 2026-04-22 · unverdicted · none · ref 14
Calibration error tracks curvature via shared margin-dependent exponential tails; a margin-aware objective improves out-of-sample calibration across optimizers.
Divide-and-Conquer Approach to Holistic Cognition in High-Similarity Contexts with Limited Data
cs.CV · 2026-04-21 · unverdicted · none · ref 12
DHCNet improves ultra-fine-grained visual categorization by progressively building holistic cognition from local discrepancies using self-shuffling and refinement on limited data.
Towards Symmetry-sensitive Pose Estimation: A Rotation Representation for Symmetric Object Classes
cs.CV · 2026-04-20 · unverdicted · none · ref 10 · 3 links
SARR modifies trigonometric rotation encodings with object symmetry orders to produce unique continuous poses, enabling standard CNNs to outperform existing methods on symmetry-aware 6D pose estimation without custom losses or 3D models.
Orthogonal Transformations for Efficient Data-Driven Reachability Analysis
eess.SY · 2026-04-15 · unverdicted · none · ref 26
Orthogonal transformations before order reduction in matrix zonotopes produce order-of-magnitude smaller reachable set volumes while keeping generator counts comparable.
FRTSearch: Unified Detection and Parameter Inference of Fast Radio Transients using Instance Segmentation
astro-ph.IM · 2026-04-14 · unverdicted · none · ref 20
FRTSearch reframes fast radio transient detection as instance segmentation on dynamic spectra and uses the segmented shapes to infer dispersion measure and time of arrival, achieving 98% recall with over 99.9% fewer false positives than traditional methods.
Learning to Build Shapes by Extrusion
cs.GR · 2026-01-30 · unverdicted · none · ref 43
Text Encoded Extrusions (TEE) lets LLMs generate and edit manifold 3D meshes by learning sequences of face extrusions from decomposed quadrilateral meshes.
Backdoor Attacks on Prompt-Driven Video Segmentation Foundation Models
cs.CV · 2025-12-26 · conditional · none · ref 56
BadVSFM is the first effective backdoor attack on prompt-driven video segmentation foundation models, using a two-stage encoder-decoder strategy to achieve high attack success rates with limited clean performance loss.
Unmasking Puppeteers: Leveraging Biometric Leakage to Expose Impersonation in AI-Based Videoconferencing
cs.CV · 2025-10-03 · unverdicted · none · ref 9
A pose-conditioned large-margin contrastive encoder isolates persistent biometric identity cues from transmitted latents in talking-head videoconferencing to flag impersonation attacks via cosine similarity without inspecting the output video.
From Pixels to Places: A Systematic Benchmark for Evaluating Image Geolocalization Ability in Large Language Models
cs.CV · 2025-08-03 · unverdicted · none · ref 51
IMAGEO-Bench evaluates 10 LLMs on image geolocalization across global street scenes, US POIs, and private images, revealing closed-source model advantages and biases favoring high-resource regions.
Cyclic 2.5D Perceptual Loss for Cross-Modal 3D Medical Image Synthesis: T1w MRI to Tau PET
eess.IV · 2024-06-18 · unverdicted · none · ref 46
Proposes a cyclic 2.5D perceptual loss with manufacturer SUVR standardization for T1w MRI to tau PET synthesis, reporting improved regional agreement on ADNI and SCAN cohorts across U-Net, UNETR, SwinUNETR, CycleGAN, and Pix2Pix.
Evaluating Object Hallucination in Large Vision-Language Models
cs.CV · 2023-05-17 · accept · none · ref 37
Large vision-language models exhibit severe object hallucination that varies with training instructions, and the proposed POPE polling method evaluates it more stably and flexibly than prior approaches.
Estimation-Prediction Tradeoff in Causal Probabilistic Temporal Graphs
cs.LG · 2026-06-26 · unverdicted · none · ref 176
Characterizes an estimation-prediction tradeoff in binary logistic models for causal probabilistic temporal graphs and proposes a framework to jointly evaluate temporal link prediction with causal parameter recovery via Cramér-Rao bounds.
HandMade: Spatial Prompting for Generative 3D Creation with Part-Labeled VR Sketches
cs.HC · 2026-06-26 · unverdicted · none · ref 71
HandMade converts segmented VR strokes into multi-view part guidance and structured prompts so generative 3D models better preserve user-specified spatial scaffolds than text-only or sketch baselines.
Minkowski-Type Wasserstein Metrics and Barycenters for Location-Scale Mixtures with Application to Domain Adaptation
math.OC · 2026-06-25 · unverdicted · none · ref 22
A Minkowski-type Wasserstein framework for location-scale mixtures reduces multimarginal OT to discrete component transport with linear complexity and shows competitive domain adaptation performance.
Venice-H1: Failure-Aware Query Re-Ranking with Multi-Scale Grid Signatures for Referring Image Segmentation
cs.CV · 2026-06-21 · unverdicted · none · ref 9
Venice-H1 improves failure-case mIoU by 0.89-1.40 points in referring image segmentation via multi-scale grid signatures and a failure-aware re-ranker, with positive CIs on all tested pairs and low harmful-switch rates.
Lighting-Consistent Object Transfer Across Radiance Fields
cs.GR · 2026-06-21 · unverdicted · none · ref 231
Diffusion-based per-view harmonization for lighting-consistent object transfer between 3DGS scenes, using heterogeneous training data and final 3D consolidation.
Radial Basis Function Networks as Projection Heads in Self-Supervised Learning
cs.CV · 2026-06-19 · unverdicted · none · ref 13 · 2 links
RBFN projection heads serve as competitive replacements for MLP heads in SSL and enable SNS, a label-free metric from RBF parameters that correlates strongly with logistic regression evaluation.
QG-MIL: A Gated Transformer Aggregator for Domain-Agnostic Multiple Instance Learning in Medical Imaging
cs.CV · 2026-06-18 · unverdicted · none · ref 3
QG-MIL introduces four gated transformer components that yield +6.1 average macro F1 improvement over baselines on six whole-slide and cell-level medical imaging benchmarks while producing more uniform attention.
Cross-Modal Knowledge Distillation without Paired Data: Theoretical Foundation and Algorithm
cs.AI · 2026-06-09 · unverdicted · none · ref 1
A distribution-alignment framework for unpaired cross-modal knowledge distillation with theoretical guarantees on feature and label alignment.
Next-Token Prediction Learns Generalisable Representations of Sleep Physiology
cs.AI · 2026-06-08 · unverdicted · none · ref 12
Next-token prediction on multi-modal tokenized sleep signals yields embeddings that match supervised performance with far fewer labels and generalize to daytime heart data.
Multi-Task Crack Foundation Model for Engineering-Reliable Crack Representation and Topology Preservation in Civil Infrastructure
cs.CV · 2026-06-04 · unverdicted · none · ref 54
CrackGeoFM is a multi-task framework that adapts a frozen visual foundation model with FCEM, CFAM, and SMTD modules for crack mask prediction, skeleton reconstruction, and uncertainty estimation, reporting SOTA results across 20 datasets including few-shot settings.
Do Transformers Need Three Projections? Systematic Study of QKV Variants
cs.LG · 2026-06-01 · conditional · none · ref 9
Q-K=V projection sharing in transformers matches standard QKV performance with 50% KV cache reduction and combines with GQA/MQA for up to 96.9% reduction across vision and language tasks.
AdaCodec: A Predictive Visual Code for Video MLLMs
cs.CV · 2026-06-01 · unverdicted · none · ref 58
AdaCodec introduces a predictive visual code that cuts visual token use in video MLLMs by sending full frames only on high predictive cost and otherwise encoding inter-frame changes as P-tokens, yielding better benchmark scores at lower budgets.
HERO'S JOURNEY: Testing Complex Rule Induction with Text Games
cs.CL · 2026-06-01 · unverdicted · none · ref 65
HERO'S JOURNEY benchmark evaluates LLMs on attribute and procedural rule induction across four structural forms, finding limited, uneven performance with execution as the main bottleneck and steering helping only attribute tasks.
Infinite-Dimensional Spherical Kernel ridge Regression
stat.ME · 2026-05-29 · unverdicted · none · ref 294
An intrinsic spherical kernel ridge regression framework is introduced for non-linear responses on spheres, reducing infinite-dimensional estimation to finite via the representer theorem with convergence rates shown.
Envisioning Beyond the Few: Disentangled Semantics and Primitives for Few-Shot Atypical Layout-to-Image Generation
cs.CV · 2026-05-29 · unverdicted · none · ref 6
Proposes DSP, a disentangled semantics and primitives framework using Semantic Anchoring, Primitive Imbuing, and Conceptual Steering that improves 5-shot atypical L2I generation over prior methods.
TextTeacher: What Can Language Teach About Images?
cs.CV · 2026-05-21 · unverdicted · none · ref 13
TextTeacher uses frozen text embeddings from captions as semantic anchors to guide vision model training, improving ImageNet accuracy by up to 2.7 p.p. and transfer performance by 1.0 p.p. on average.
Automatic Discovery of Disease Subgroups by Contrasting with Healthy Controls
cs.LG · 2026-05-20 · conditional · none · ref 30
Deep UCSL uses a contrastive EM loss on patient-control labels to isolate disease-driven subgroups in medical imaging by suppressing shared healthy variability.
SpectralEarth-FM: Bringing Hyperspectral Imagery into Multimodal Earth Observation Pretraining
cs.CV · 2026-05-20 · unverdicted · none · ref 13
SpectralEarth-FM is a multisensor hierarchical transformer pretrained on a 40TB co-located HSI-MSI-SAR dataset using a JEPA-style objective and reports state-of-the-art results on hyperspectral and standard EO benchmarks.
Text-Guided Visual Representation Learning for Robust Multimodal E-Commerce Recommendation
cs.IR · 2026-05-17 · unverdicted · none · ref 10
TGQ-Former uses metadata-guided hybrid queries and dual-gated modulation to improve visual token selection in multimodal e-commerce retrieval, raising average Hit Rate@100 by 6.04% over baselines.
Decomposed Vision-Language Alignment for Fine-Grained Open-Vocabulary Segmentation
cs.CV · 2026-05-15 · unverdicted · none · ref 21 · 2 links
Decomposed Vision-Language Alignment framework factorizes prompts into concept and attribute tokens with Feature-Gated Cross-Attention for better compositional generalization in fine-grained open-vocabulary segmentation.
R2R2: Robust Representation for Intensive Experience Reuse via Redundancy Reduction in Self-Predictive Learning
cs.LG · 2026-05-13 · unverdicted · none · ref 12
R2R2 introduces a non-centered regularization objective for SPL that addresses conflicts with spectral properties, leading to better performance on continuous control tasks at high UTD ratios.
PhysEditBench: A Protocol-Conditioned Benchmark for Dense Physical-Map Prediction with Image Editors
cs.CV · 2026-05-13 · unverdicted · none · ref 13
PhysEditBench is a protocol-conditioned benchmark evaluating image editors on dense prediction of depth, normal, albedo, roughness, and metallic maps from RGB images using curated data and fixed scoring rules.
On What We Can Learn from Low-Resolution Data
cs.LG · 2026-05-12 · unverdicted · none · ref 76 · 2 links
Low-resolution data improves high-resolution model performance when high-resolution samples are limited, via KL-divergence bounds and experiments on vision transformers and CNNs.
