super hub Canonical reference

In: Proceedings of the IEEE/CVF Conference on Computer 25 Vision and Pattern Recognition, pp

Abdal, Peter, Rameen and Qin, year=, Yipeng and Wonka · 2020 · arXiv 2600.2020

Canonical reference. 71% of citing Pith papers cite this work as background.

148 Pith papers citing it

Background 71% of classified citations

read on arXiv browse 148 citing papers more from Abdal

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 31 dataset 9 method 5 baseline 3

citation-polarity summary

background 34 use dataset 6 use method 5 baseline 3

authors

Abdal Peter Rameen and Qin year= Yipeng and Wonka

co-cited works

representative citing papers

WildBox: A Dataset and Benchmark for Aerial Monocular 3D Detection of African Savanna Wildlife

cs.CV · 2026-06-19 · unverdicted · novelty 8.0

WildBox provides over 237k 3D wildlife annotations from drone video and benchmarks reveal zero-shot 3D detection at 0 AP but fine-tuned performance of 8.68 AP-BEV and 13.17 AP3D, with depth estimation causing most errors.

Mind2Web: Towards a Generalist Agent for the Web

cs.CL · 2023-06-09 · accept · novelty 8.0

Mind2Web is the first large-scale dataset of real-world web tasks for developing generalist language-guided agents that complete complex actions on diverse websites.

ScaLe-INR: Scale and Learn Implicit Neural Representations

cs.CV · 2026-06-26 · unverdicted · novelty 7.0

ScaLe-INR is a multi-branch INR architecture that applies directional scaling per the Fourier inverse theorem and a directional edge guidance loss to disentangle scales and improve reconstruction fidelity.

Semantic Browsing: Controllable Diversity for Image Generation

cs.CV · 2026-06-22 · unverdicted · novelty 7.0

A technique for controllable diversity in text-to-image generation by inducing structured semantic variations at the prompt level via VLM and agentic workflow.

4DVLT: Dynamic Scene Understanding with Worldline-Centered Vision-Language Tracking

cs.CV · 2026-06-21 · conditional · novelty 7.0 · 2 refs

The paper defines the 4DVLT task for worldline-centered 4D scene understanding, releases Instruct-4D with 129.4K QA pairs, and presents 4DTrack achieving 62.68 TGA_Top1, outperforming adapted baselines by 19.62 points.

Deep Unrolled Networks in Representation Space Applied to MRI Reconstruction

eess.IV · 2026-06-19 · unverdicted · novelty 7.0

DUNE enables exact data-consistency gradients via VJP when deep unrolled networks operate in representation space, yielding better MRI reconstructions than prior heuristic-DC variants.

Does Text Actually Help? Uncovering and Resolving Text Collapse in Multimodal Time Series Forecasting

cs.LG · 2026-06-17 · unverdicted · novelty 7.0

REST-TS resolves text collapse in multimodal time series forecasting by exclusively supervising the text branch on numerical residuals to compel genuine content extraction from text descriptions.

Human Universal Grasping

cs.RO · 2026-06-15 · unverdicted · novelty 7.0

HUG trains a flow-matching model on a new 1M-frame egocentric human grasp dataset to generate retargetable grasps from single RGB-D images, beating baselines by 23-34% on a new 90-object benchmark.

iSAGE: A Human-in-the-Loop Framework for Remote Sensing Semantic Segmentation via Sparse Point Supervision

cs.CV · 2026-06-08 · unverdicted · novelty 7.0

iSAGE achieves near-dense mIoU performance in remote sensing semantic segmentation using iterative expert clicks on confident model errors with an error-weighted loss, using only 0.011-0.04% of pixels.

Targeting World Models to Compromise Robot Learning Pipelines

cs.RO · 2026-06-08 · unverdicted · novelty 7.0

World models introduce a stealthy poisoning vector into robot learning pipelines where malicious prompts or dynamics in teleoperated data activate only during synthetic trajectory generation, enabling backdoors in downstream policies.

Bridging CAD and Data-Driven Design: Attributed Feature Graphs for Engineering Design

cs.CE · 2026-06-04 · unverdicted · novelty 7.0

Attributed Feature Graphs (AFGs) represent CAD features as attributed nodes and relations as directed edges to enable GNN surrogate models that predict design performance with feature-level interpretability on the CarHoods10K dataset.

LL-Bench: Rethinking Low-Level Vision Evaluation in the Era of Large-Scale Generative Models

cs.CV · 2026-06-01 · unverdicted · novelty 7.0

LL-Bench supplies a human-annotated dataset exposing generative model weaknesses in low-level restoration and introduces LL-Score as an MLLM evaluator that outperforms existing quality metrics and can serve as a training reward.

DPA4: Pushing the Accuracy-Cost Frontier of Interatomic Potentials with EMFA SO(2) Convolution

physics.chem-ph · 2026-06-01 · unverdicted · novelty 7.0

DPA4 is a new SE(3)-equivariant interatomic potential with EMFA SO(2) convolution that sets new accuracy-cost records on Matbench Discovery and SPICE benchmarks using fewer parameters than prior models.

DELOS: Detecting Shallow Transits in Kepler Photometry Using a Contrastive-Learning Framework

astro-ph.EP · 2026-05-28 · conditional · novelty 7.0 · 2 refs

DELOS applies contrastive learning to phase-folded light curves to detect shallow intermediate-to-long period transits, reporting 15.5% and 11.25% gains in combined precision-recall over BLS and TLS in low-SNR tests plus 3-80x speedups.

ClothTransformer: Unified Latent-Space Transformers for Scalable Cloth Simulation

cs.GR · 2026-05-27 · unverdicted · novelty 7.0

ClothTransformer is a unified latent-space Transformer for cloth simulation that handles body-driven garments, robotic manipulation, and free-fall collisions in one model with 4-9x lower error than prior methods and mesh-resolution independence.

RS2AD-LiDAR: End-to-End Autonomous Driving LiDAR Data Generation from Roadside Sensor Observations

cs.CV · 2026-05-22 · unverdicted · novelty 7.0 · 4 refs

RS2AD-LiDAR reconstructs vehicle LiDAR data from roadside observations via coordinate transformation, virtual LiDAR modeling and resampling, claimed as the first such method, with experiments showing improved object detection when mixed with real data.

3D LULC classification using multispectral LiDAR and deep learning: current and prospective schemes

cs.CV · 2026-05-21 · conditional · novelty 7.0

Introduces NMCA-aligned L1/L2 LULC schemes and the Loosdorf-MSL benchmark dataset, with Point Transformer V3 reaching 79.4% mIoU on 8 classes and 58.9% on 20 classes, plus gains from multispectral inputs.

CelloCut: Constructive Watertight Remeshing via Tetrahedral Cell Cuts

cs.GR · 2026-05-18 · unverdicted · novelty 7.0

CelloCut formulates watertight remeshing as binary labeling on a Delaunay tetrahedral partition solved by graph-cut minimization with one-sided constraints to guarantee volumetrically consistent solids.

PluRule: A Benchmark for Moderating Pluralistic Communities on Social Media

cs.CL · 2026-05-16 · unverdicted · novelty 7.0 · 3 refs

PluRule is a new multimodal multilingual benchmark showing that state-of-the-art vision-language models perform only marginally better than a trivial baseline at detecting specific rule violations in pluralistic online communities.

MuteBench: Modality Unavailability Tolerance Evaluation for Incomplete Multimodal Fusion

cs.LG · 2026-05-13 · unverdicted · novelty 7.0

MuteBench evaluates multimodal fusion robustness to modality missing and within-modality missing on 125000 samples from 9 clinical datasets, finding architecture family predicts tolerance better than parameter count.

Geometrically Approximated Modeling for Emitter-Centric Ray-Triangle Filtering in Arbitrarily Dynamic LiDAR Simulation

cs.GR · 2026-05-11 · unverdicted · novelty 7.0

GRCA uses emitter-centric geometric culling of rays per triangle to accelerate LiDAR simulation in arbitrarily dynamic scenes, reporting up to 14.55x speedup over Embree and 7.97x over OptiX.

AnomalyClaw: A Universal Visual Anomaly Detection Agent via Tool-Grounded Refutation

cs.CV · 2026-05-11 · conditional · novelty 7.0

AnomalyClaw turns single-step VLM anomaly judgments into a multi-round tool-grounded refutation process, delivering consistent macro-AUROC gains of 3.5-7.9 percentage points over direct inference across 12 cross-domain datasets.

LLM-guided Semi-Supervised Approaches for Social Media Crisis Data Classification

cs.AI · 2026-05-08 · conditional · novelty 7.0

LG-CoTrain, an LLM-guided co-training method, outperforms classical semi-supervised baselines for crisis tweet classification in low-resource settings with 5-25 labeled examples per class.

Hyperbolic Concept Bottleneck Models

cs.LG · 2026-05-07 · unverdicted · novelty 7.0

HypCBM reformulates concept activations as geometric containment in hyperbolic space to produce sparse, hierarchy-aware signals that match Euclidean models trained on 20 times more data.

citing papers explorer

Showing 50 of 148 citing papers.

WildBox: A Dataset and Benchmark for Aerial Monocular 3D Detection of African Savanna Wildlife cs.CV · 2026-06-19 · unverdicted · none · ref 10
WildBox provides over 237k 3D wildlife annotations from drone video and benchmarks reveal zero-shot 3D detection at 0 AP but fine-tuned performance of 8.68 AP-BEV and 13.17 AP3D, with depth estimation causing most errors.
Mind2Web: Towards a Generalist Agent for the Web cs.CL · 2023-06-09 · accept · none · ref 33
Mind2Web is the first large-scale dataset of real-world web tasks for developing generalist language-guided agents that complete complex actions on diverse websites.
ScaLe-INR: Scale and Learn Implicit Neural Representations cs.CV · 2026-06-26 · unverdicted · none · ref 26
ScaLe-INR is a multi-branch INR architecture that applies directional scaling per the Fourier inverse theorem and a directional edge guidance loss to disentangle scales and improve reconstruction fidelity.
Semantic Browsing: Controllable Diversity for Image Generation cs.CV · 2026-06-22 · unverdicted · none · ref 69
A technique for controllable diversity in text-to-image generation by inducing structured semantic variations at the prompt level via VLM and agentic workflow.
4DVLT: Dynamic Scene Understanding with Worldline-Centered Vision-Language Tracking cs.CV · 2026-06-21 · conditional · none · ref 4 · 2 links
The paper defines the 4DVLT task for worldline-centered 4D scene understanding, releases Instruct-4D with 129.4K QA pairs, and presents 4DTrack achieving 62.68 TGA_Top1, outperforming adapted baselines by 19.62 points.
Deep Unrolled Networks in Representation Space Applied to MRI Reconstruction eess.IV · 2026-06-19 · unverdicted · none · ref 22
DUNE enables exact data-consistency gradients via VJP when deep unrolled networks operate in representation space, yielding better MRI reconstructions than prior heuristic-DC variants.
Does Text Actually Help? Uncovering and Resolving Text Collapse in Multimodal Time Series Forecasting cs.LG · 2026-06-17 · unverdicted · none · ref 21
REST-TS resolves text collapse in multimodal time series forecasting by exclusively supervising the text branch on numerical residuals to compel genuine content extraction from text descriptions.
Human Universal Grasping cs.RO · 2026-06-15 · unverdicted · none · ref 22
HUG trains a flow-matching model on a new 1M-frame egocentric human grasp dataset to generate retargetable grasps from single RGB-D images, beating baselines by 23-34% on a new 90-object benchmark.
iSAGE: A Human-in-the-Loop Framework for Remote Sensing Semantic Segmentation via Sparse Point Supervision cs.CV · 2026-06-08 · unverdicted · none · ref 59
iSAGE achieves near-dense mIoU performance in remote sensing semantic segmentation using iterative expert clicks on confident model errors with an error-weighted loss, using only 0.011-0.04% of pixels.
Targeting World Models to Compromise Robot Learning Pipelines cs.RO · 2026-06-08 · unverdicted · none · ref 52
World models introduce a stealthy poisoning vector into robot learning pipelines where malicious prompts or dynamics in teleoperated data activate only during synthetic trajectory generation, enabling backdoors in downstream policies.
Bridging CAD and Data-Driven Design: Attributed Feature Graphs for Engineering Design cs.CE · 2026-06-04 · unverdicted · none · ref 57
Attributed Feature Graphs (AFGs) represent CAD features as attributed nodes and relations as directed edges to enable GNN surrogate models that predict design performance with feature-level interpretability on the CarHoods10K dataset.
LL-Bench: Rethinking Low-Level Vision Evaluation in the Era of Large-Scale Generative Models cs.CV · 2026-06-01 · unverdicted · none · ref 58
LL-Bench supplies a human-annotated dataset exposing generative model weaknesses in low-level restoration and introduces LL-Score as an MLLM evaluator that outperforms existing quality metrics and can serve as a training reward.
DPA4: Pushing the Accuracy-Cost Frontier of Interatomic Potentials with EMFA SO(2) Convolution physics.chem-ph · 2026-06-01 · unverdicted · none · ref 146
DPA4 is a new SE(3)-equivariant interatomic potential with EMFA SO(2) convolution that sets new accuracy-cost records on Matbench Discovery and SPICE benchmarks using fewer parameters than prior models.
DELOS: Detecting Shallow Transits in Kepler Photometry Using a Contrastive-Learning Framework astro-ph.EP · 2026-05-28 · conditional · none · ref 32 · 2 links
DELOS applies contrastive learning to phase-folded light curves to detect shallow intermediate-to-long period transits, reporting 15.5% and 11.25% gains in combined precision-recall over BLS and TLS in low-SNR tests plus 3-80x speedups.
ClothTransformer: Unified Latent-Space Transformers for Scalable Cloth Simulation cs.GR · 2026-05-27 · unverdicted · none · ref 29
ClothTransformer is a unified latent-space Transformer for cloth simulation that handles body-driven garments, robotic manipulation, and free-fall collisions in one model with 4-9x lower error than prior methods and mesh-resolution independence.
RS2AD-LiDAR: End-to-End Autonomous Driving LiDAR Data Generation from Roadside Sensor Observations cs.CV · 2026-05-22 · unverdicted · none · ref 8 · 4 links
RS2AD-LiDAR reconstructs vehicle LiDAR data from roadside observations via coordinate transformation, virtual LiDAR modeling and resampling, claimed as the first such method, with experiments showing improved object detection when mixed with real data.
3D LULC classification using multispectral LiDAR and deep learning: current and prospective schemes cs.CV · 2026-05-21 · conditional · none · ref 6
Introduces NMCA-aligned L1/L2 LULC schemes and the Loosdorf-MSL benchmark dataset, with Point Transformer V3 reaching 79.4% mIoU on 8 classes and 58.9% on 20 classes, plus gains from multispectral inputs.
CelloCut: Constructive Watertight Remeshing via Tetrahedral Cell Cuts cs.GR · 2026-05-18 · unverdicted · none · ref 52
CelloCut formulates watertight remeshing as binary labeling on a Delaunay tetrahedral partition solved by graph-cut minimization with one-sided constraints to guarantee volumetrically consistent solids.
PluRule: A Benchmark for Moderating Pluralistic Communities on Social Media cs.CL · 2026-05-16 · unverdicted · none · ref 83 · 3 links
PluRule is a new multimodal multilingual benchmark showing that state-of-the-art vision-language models perform only marginally better than a trivial baseline at detecting specific rule violations in pluralistic online communities.
MuteBench: Modality Unavailability Tolerance Evaluation for Incomplete Multimodal Fusion cs.LG · 2026-05-13 · unverdicted · none · ref 46
MuteBench evaluates multimodal fusion robustness to modality missing and within-modality missing on 125000 samples from 9 clinical datasets, finding architecture family predicts tolerance better than parameter count.
Geometrically Approximated Modeling for Emitter-Centric Ray-Triangle Filtering in Arbitrarily Dynamic LiDAR Simulation cs.GR · 2026-05-11 · unverdicted · none · ref 26
GRCA uses emitter-centric geometric culling of rays per triangle to accelerate LiDAR simulation in arbitrarily dynamic scenes, reporting up to 14.55x speedup over Embree and 7.97x over OptiX.
AnomalyClaw: A Universal Visual Anomaly Detection Agent via Tool-Grounded Refutation cs.CV · 2026-05-11 · conditional · none · ref 42
AnomalyClaw turns single-step VLM anomaly judgments into a multi-round tool-grounded refutation process, delivering consistent macro-AUROC gains of 3.5-7.9 percentage points over direct inference across 12 cross-domain datasets.
LLM-guided Semi-Supervised Approaches for Social Media Crisis Data Classification cs.AI · 2026-05-08 · conditional · none · ref 111
LG-CoTrain, an LLM-guided co-training method, outperforms classical semi-supervised baselines for crisis tweet classification in low-resource settings with 5-25 labeled examples per class.
Hyperbolic Concept Bottleneck Models cs.LG · 2026-05-07 · unverdicted · none · ref 27
HypCBM reformulates concept activations as geometric containment in hyperbolic space to produce sparse, hierarchy-aware signals that match Euclidean models trained on 20 times more data.
Continuous Expert Assembly: Instance-Conditioned Low-Rank Residuals for All-in-One Image Restoration cs.CV · 2026-05-07 · unverdicted · none · ref 6
CEA assembles per-token low-rank residual updates via dense affinities over hyper-adapter-generated components to improve all-in-one image restoration on spatially non-uniform degradations.
Evaluating LLMs on Large-Scale Graph Property Estimation via Random Walks cs.LG · 2026-05-02 · unverdicted · none · ref 17 · 2 links
EstGraph benchmark evaluates LLMs on estimating properties of very large graphs from random-walk samples that fit in context limits.
Congestion-Aware Dynamic Axonal Delay for Spiking Neural Networks cs.LG · 2026-05-02 · unverdicted · none · ref 6
CADAD adds activity-dependent dynamic delays to SNNs, improving accuracy on speech datasets while cutting parameter count by about 50% versus prior static delay approaches.
TRIP-Evaluate: An Open Multimodal Benchmark for Evaluating Large Models in Transportation cs.CV · 2026-04-29 · accept · none · ref 6
TRIP-Evaluate is a new open multimodal benchmark with 837 text, image, and point-cloud items organized by a role-task-knowledge taxonomy to evaluate large models on transportation workflows.
Trust-SSL: Additive-Residual Selective Invariance for Robust Aerial Self-Supervised Learning cs.CV · 2026-04-23 · accept · none · ref 20
Trust-SSL introduces additive-residual trust weights in SSL to selectively handle corruptions in aerial imagery, yielding higher linear-probe accuracy and larger gains under severe degradations than SimCLR or VICReg.
Divide-and-Conquer Approach to Holistic Cognition in High-Similarity Contexts with Limited Data cs.CV · 2026-04-21 · unverdicted · none · ref 30 · 2 links
DHCNet improves ultra-fine-grained visual categorization by progressively building holistic cognition from local discrepancies using self-shuffling and refinement on limited data.
PokeGym: A Visually-Driven Long-Horizon Benchmark for Vision-Language Models cs.CV · 2026-04-09 · unverdicted · none · ref 61
PokeGym is a new benchmark that tests VLMs on long-horizon tasks in a complex 3D game using only visual observations, identifying deadlock recovery as the primary failure mode.
A global dataset of continuous urban dashcam driving cs.CV · 2026-04-01 · accept · none · ref 12
CROWD is a new global dataset of 51,753 continuous urban dashcam segments spanning over 20,000 hours from 238 countries, with manual labels and automated object detections for routine driving analysis.
DuFal: Dual-Frequency-Aware Learning for High-Fidelity Extremely Sparse-view CBCT Reconstruction cs.CV · 2026-01-21 · unverdicted · none · ref 23
DuFal combines global and local high-frequency Fourier neural operators with cross-attention fusion to recover fine anatomical structures in extremely sparse-view CBCT, outperforming prior methods on LUNA16 and ToothFairy data.
GLUE: Coordinating Pre-Trained Generative Models for System-Level Design cs.CE · 2025-12-22 · conditional · none · ref 65
GLUE orchestrates frozen pre-trained generative models into a system-level design generator that enforces feasibility, performance, and diversity, with data-driven and data-free variants benchmarked on UAV design.
BEVCALIB: LiDAR-Camera Calibration via Geometry-Guided Bird's-Eye View Representations cs.CV · 2025-06-03 · unverdicted · none · ref 14 · 2 links
BEVCALIB performs LiDAR-camera calibration from raw data by fusing camera and LiDAR bird's-eye view features with a novel feature selector and reports state-of-the-art accuracy on KITTI and NuScenes.
OOD-SEG: Exploiting out-of-distribution detection techniques for learning image segmentation from sparse multi-class positive-only annotations cs.CV · 2024-11-14 · unverdicted · none · ref 24
OOD-SEG reframes multi-class segmentation from sparse positive-only annotations as pixel-wise positive-unlabelled learning solved by integrating out-of-distribution detection techniques, with a proposed cross-validation evaluation on surgical imaging datasets.
Q-Align: Teaching LMMs for Visual Scoring via Discrete Text-Defined Levels cs.CV · 2023-12-28 · conditional · none · ref 245
Q-Align trains LMMs on discrete text-defined levels for visual scoring, achieving SOTA on IQA, IAA, and VQA while unifying the tasks in OneAlign.
Estimation--Prediction Tradeoff in Causal Probabilistic Temporal Graphs cs.LG · 2026-06-26 · unverdicted · none · ref 177
Characterizes an estimation-prediction tradeoff in binary logistic models for causal probabilistic temporal graphs and proposes a framework to jointly evaluate temporal link prediction with causal parameter recovery via Cramér-Rao bounds.
A Comparison of Fusion Techniques for Multi-Modal Human Activity Recognition on the HARMES Dataset cs.LG · 2026-06-26 · conditional · none · ref 40
Gated Multi-modal Fusion reaches 0.82 macro F1 on HARMES, beating the concatenation baseline of 0.76 by 6 points under leave-one-participant-out evaluation.
Unmasking LAION-5B: Age, Gender, Race, and Emotion Biases in Large-Scale Image Datasets cs.CV · 2026-06-22 · unverdicted · none · ref 95
Empirical audit of LAION-2B-en and LAION-2B-multi finds overrepresentation of young adults, White people, and males plus stereotypical emotion associations across two attribute classifiers.
Interpretable Uncertainty Routing Separating Emotion Ambiguity from Distribution Shift in Facial Expression Recognition cs.CV · 2026-06-21 · unverdicted · none · ref 28
Uncertainty decomposition via deep ensembles separates annotator disagreement from distribution shift in FER, enabling a routing mechanism that retains 1.8x more ambiguous faces at matched OOD rejection compared to single-uncertainty baselines.
Venice-H1: Failure-Aware Query Re-Ranking with Multi-Scale Grid Signatures for Referring Image Segmentation cs.CV · 2026-06-21 · unverdicted · none · ref 16
Venice-H1 improves failure-case mIoU by 0.89-1.40 points in referring image segmentation via multi-scale grid signatures and a failure-aware re-ranker, with positive CIs on all tested pairs and low harmful-switch rates.
Cross-View Yaw Estimation in Location Uncertainty with Line-Aligning Yaw Scoring cs.CV · 2026-06-20 · unverdicted · none · ref 27
Introduces LAYS, a radially invariant line-consensus voting method achieving sub-degree yaw precision in cross-view localization independent of location accuracy.
TimeProVe: Propose, then Verify for Efficient Long Video Temporal Reasoning in Activities of Daily Living cs.CV · 2026-06-18 · unverdicted · none · ref 40 · 2 links
TimeProVe proposes a propose-then-verify framework using lightweight action-based candidate evidence generation followed by targeted VLM verification for efficient long video temporal reasoning, achieving 7.3% improvement on OTB with 75% fewer VLM calls.
Temporal Preference Optimization for Unsupervised Retrieval cs.IR · 2026-06-16 · unverdicted · none · ref 5
TPOUR uses a novel TRPO method to improve unsupervised retrievers for temporal relevance, outperforming baselines including a much larger model on nDCG@5 for explicit and implicit time queries.
Reinforcing Dual-Path Reasoning in Spatial Vision Language Models cs.CV · 2026-06-16 · unverdicted · none · ref 122
SR-REAL equips spatial VLMs with dual LOR and DTR reasoning paths trained via RL, achieving better benchmark performance through mutual reinforcement and generalization without per-task tuning.
SceneMiner: Identity-Preserving Multi-Task Fine-Tuning for Unified BEV Scene Mining cs.CV · 2026-06-09 · unverdicted · none · ref 4
SceneMiner shows that identity-preserving multi-task fine-tuning removes cross-task interference by zero-initializing new heads and freezing shared-stream parameters, enabling unified BEV scene mining with preserved original heads.
SOMA: From Surface Observations to Muscle Anatomy cs.CV · 2026-06-08 · unverdicted · none · ref 59
SOMA recovers spatio-temporal muscle behavior from multi-view RGB surface data and introduces the SKIM soft-tissue deformation dataset as the first such method from RGB observations.
VeriDrive: Verifiable Counterfactual Supervision for Cost-Efficient Vision-Language Planning cs.CV · 2026-06-05 · unverdicted · none · ref 26
VeriDrive introduces a verifiable counterfactual supervision framework using a Perception-Evaluation-Revision chain and validator-guided correction to generate cost-efficient structured data for vision-language driving models, showing metric gains on nuScenes.
DREAM: Dynamic Refinement of Early Assignment Mappings cs.IR · 2026-06-05 · unverdicted · none · ref 14
DREAM proposes intent-aware tokenization, frozen-model evaluation, and dynamic beams to refine early SID assignments and improve cold-start performance in generative recommenders on Amazon benchmarks.

In: Proceedings of the IEEE/CVF Conference on Computer 25 Vision and Pattern Recognition, pp

hub tools

citation-role summary

citation-polarity summary

authors

co-cited works

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer