End-to-end object detection with transformers

End-to-End Object Detection with Transformers · 2020 · arXiv 2005.12872

Mixed citation behavior. Most common role is background (67%).

26 Pith papers citing it

Background 67% of classified citations

citation-role summary

background: 5 · method: 1

citation-polarity summary

background: 4 · unclear: 1 · use method: 1

representative citing papers

Unlocking the Visual Record of Materials Science: A Large-Scale Multimodal Dataset from Scientific Literature

cs.CV · 2026-06-29 · accept · novelty 8.0

MatMMExtract pipeline creates MatSciFig dataset of 391k annotated materials science figure panels and MaterialScope detection dataset with high accuracy.

GLACIER: Rethinking Mass Spectrum Prediction as an Object Detection Problem

cs.LG · 2026-06-28 · unverdicted · novelty 7.0

GLACIER is a single-stage transformer model treating MS/MS fragmentation as subgraph detection on molecular graphs, reporting 70.0% Top-1 accuracy on MassSpecGym and 8x speedup over prior two-stage methods.

SARES-DEIM: Sparse Mixture-of-Experts Meets DETR for Robust SAR Ship Detection

cs.CV · 2026-04-05 · unverdicted · novelty 7.0

SARES-DEIM achieves 76.4% mAP50:95 and 93.8% mAP50 on HRSID by routing SAR features through sparse frequency and wavelet experts plus a high-resolution preservation neck, outperforming prior YOLO and SAR detectors.

Transformers Provably Learn Sparse XOR with Polylogarithmic Parameters

cs.LG · 2025-02-11 · unverdicted · novelty 7.0

Single-layer two-head Transformers learn sparse XOR with O(polylog(d)) parameters in one gradient step, breaking the Omega(d) parameter bottleneck of FFNNs.

Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware

cs.RO · 2023-04-23 · conditional · novelty 7.0

Low-cost imprecise robots achieve 80-90% success on six fine bimanual manipulation tasks using imitation learning with a new Action Chunking with Transformers algorithm trained on only 10 minutes of demonstrations.

Flow Matching in Feature Space for Stochastic World Modeling

cs.CV · 2026-06-27 · unverdicted · novelty 6.0

FlowWM applies flow matching directly in pretrained feature space with a one-step projection mechanism, improving perception accuracy, mode coverage, and horizon robustness on synthetic and real-world benchmarks.

From Spatial to Spectral: An Efficient, Frequency-Guided Feature Representation Learner for Small Object Detection

cs.CV · 2026-06-22 · unverdicted · novelty 6.0

Proposes DERNet with Decompose-Enhance-Reconstruct operator and three plug-and-play modules to shift small object detection from spatial to spectral feature processing, claiming better performance than YOLOv11 with 1/6 the parameters.

Better Queries, Cheaper Attention: Adapting Transformers for Efficient Sparse Reconstruction

hep-ex · 2026-06-16 · unverdicted · novelty 6.0

A geometry-aware dynamic-query transformer decoder with Local Strided Cross-Attention raises track reconstruction efficiency from 94.1% to 98.1%, halves latency, and cuts memory use by over 10x versus fixed-query baselines in a simplified HL-LHC simulation.

AMAR: Lightweight Attention-Based Multi-User Activity Recognition from Wi-Fi CSI

eess.SP · 2026-05-20 · unverdicted · novelty 6.0

AMAR uses a transformer with learnable query embeddings for set-based prediction of concurrent activities from composite Wi-Fi CSI, combined with edge feature extraction and vector quantization for bandwidth-efficient deployment.

Lucid-XR: An Extended-Reality Data Engine for Robotic Manipulation

cs.RO · 2026-04-30 · unverdicted · novelty 6.0

Lucid-XR uses XR-headset physics simulation and physics-guided video generation to create synthetic data that trains robot policies transferring zero-shot to unseen real-world manipulation tasks.

Improving Layout Representation Learning Across Inconsistently Annotated Datasets via Agentic Harmonization

cs.CV · 2026-04-13 · unverdicted · novelty 6.0

VLM-based harmonization of inconsistent annotations across two document layout corpora raises detection F-score from 0.860 to 0.883 and table TEDS from 0.750 to 0.814 while tightening embedding clusters.

LAA-X: Unified Localized Artifact Attention for Quality-Agnostic and Generalizable Face Forgery Detection

cs.CV · 2026-04-05 · unverdicted · novelty 6.0

LAA-X uses multi-task learning with explicit localized artifact attention and blending synthesis to build a deepfake detector that generalizes to high-quality and unseen manipulations after training only on real and pseudo-fake samples.

DeCo-DETR: Decoupled Cognition DETR for efficient Open-Vocabulary Object Detection

cs.CV · 2026-04-03 · unverdicted · novelty 6.0 · 2 refs

DeCo-DETR builds hierarchical semantic prototypes offline and uses decoupled training streams to deliver competitive zero-shot open-vocabulary detection with improved inference speed.

Steerable Vision-Language-Action Policies for Embodied Reasoning and Hierarchical Control

cs.RO · 2026-02-13 · unverdicted · novelty 6.0

Steerable VLAs trained on rich synthetic commands at subtask, motion, and pixel levels enable VLMs to steer robot behavior more effectively, outperforming prior hierarchical baselines on real-world manipulation and generalization tasks.

Learning to Detect and Segment for Open Vocabulary Object Detection

cs.CV · 2022-12-23 · unverdicted · novelty 6.0

CondHead conditionally parameterizes detection heads on semantic embeddings via aggregated expert and dynamically generated streams to improve generalization for novel categories.

Where Will They Go? Modelling Multimodal Pedestrian Manoeuvres from Ego-centric Videos

cs.CV · 2026-06-17 · unverdicted · novelty 5.0

MMPM uses PIM for gaze/head/hand interactions and MTP (CVAE with query decoder) to model separate crossing/non-crossing trajectory distributions, outperforming baselines on PIE and JAAD with a new validation protocol.

ReforMe: Re-Shaping Documents with Contextual Prompting and Layout-Aware Propagation

cs.HC · 2026-06-02 · unverdicted · novelty 5.0

ReforMe is an interactive document digitization system using layout-aware propagation to generalize user corrections from natural language or direct edits, shown to improve efficiency in a 12-user study on real documents.

Phast: Simultaneous reconstruction of photoelectron count and time profiles from PMT waveforms via machine learning

hep-ex · 2026-05-28 · unverdicted · novelty 5.0

Phast applies a transformer encoder plus count-conditioned query decoder to reconstruct photoelectron count and time profiles from simulated PMT waveforms on toy Monte Carlo datasets.

Predicting the thermodynamics in the chromosphere from the translation of SDO data into the IRIS$^{2}$ inversion results using a visual transformer model

astro-ph.SR · 2026-04-23 · unverdicted · novelty 5.0

A visual transformer model trained on IRIS inversions predicts chromospheric temperature and density from SDO data with correlations around 0.8 on 80% of test cases.

RareSpot+: A Benchmark, Model, and Active Learning Framework for Small and Rare Wildlife in Aerial Imagery

cs.CV · 2026-04-21 · unverdicted · novelty 5.0

RareSpot+ boosts small-object detection mAP by 0.13 on aerial wildlife data and cuts annotation needs to 1.7% of tiles via consistency losses and spatial priors.

Learning Class Difficulty in Imbalanced Histopathology Segmentation via Dynamic Focal Attention

eess.IV · 2026-04-15 · unverdicted · novelty 5.0

Dynamic Focal Attention learns class-specific difficulty via per-class biases in attention logits, improving Dice and IoU on imbalanced histopathology segmentation benchmarks.

MapATM: Enhancing HD Map Construction through Actor Trajectory Modeling

cs.CV · 2026-04-13 · unverdicted · novelty 5.0

MapATM improves lane divider AP by 4.6 and mAP by 2.6 on NuScenes by treating actor trajectories as structural priors for road geometry.

SynSur: An end-to-end generative pipeline for synthetic industrial surface defect generation and detection

cs.CV · 2026-04-29 · unverdicted · novelty 4.0

A generative pipeline creates realistic synthetic pitting defects and other surface flaws that, when added to real training data, yield modest gains in industrial defect detectors without replacing the need for authentic samples.

A Machine Learning Framework for Real-Time Personalized Ergonomic Pose Analysis

cs.CV · 2026-06-11 · unverdicted · novelty 3.0

A framework for real-time ergonomic pose prediction from 3D volumetric video that trains personalized classifiers on user-labeled poses captured by RGB-D cameras.

citing papers explorer

Showing 23 of 23 citing papers after filters.

Unlocking the Visual Record of Materials Science: A Large-Scale Multimodal Dataset from Scientific Literature

cs.CV · 2026-06-29 · accept · none · ref 35

MatMMExtract pipeline creates MatSciFig dataset of 391k annotated materials science figure panels and MaterialScope detection dataset with high accuracy.

GLACIER: Rethinking Mass Spectrum Prediction as an Object Detection Problem

cs.LG · 2026-06-28 · unverdicted · none · ref 4

GLACIER is a single-stage transformer model treating MS/MS fragmentation as subgraph detection on molecular graphs, reporting 70.0% Top-1 accuracy on MassSpecGym and 8x speedup over prior two-stage methods.

SARES-DEIM: Sparse Mixture-of-Experts Meets DETR for Robust SAR Ship Detection

cs.CV · 2026-04-05 · unverdicted · none · ref 9

SARES-DEIM achieves 76.4% mAP50:95 and 93.8% mAP50 on HRSID by routing SAR features through sparse frequency and wavelet experts plus a high-resolution preservation neck, outperforming prior YOLO and SAR detectors.

Flow Matching in Feature Space for Stochastic World Modeling

cs.CV · 2026-06-27 · unverdicted · none · ref 39

FlowWM applies flow matching directly in pretrained feature space with a one-step projection mechanism, improving perception accuracy, mode coverage, and horizon robustness on synthetic and real-world benchmarks.

From Spatial to Spectral: An Efficient, Frequency-Guided Feature Representation Learner for Small Object Detection

cs.CV · 2026-06-22 · unverdicted · none · ref 48

Proposes DERNet with Decompose-Enhance-Reconstruct operator and three plug-and-play modules to shift small object detection from spatial to spectral feature processing, claiming better performance than YOLOv11 with 1/6 the parameters.

Better Queries, Cheaper Attention: Adapting Transformers for Efficient Sparse Reconstruction

hep-ex · 2026-06-16 · unverdicted · none · ref 2

A geometry-aware dynamic-query transformer decoder with Local Strided Cross-Attention raises track reconstruction efficiency from 94.1% to 98.1%, halves latency, and cuts memory use by over 10x versus fixed-query baselines in a simplified HL-LHC simulation.

AMAR: Lightweight Attention-Based Multi-User Activity Recognition from Wi-Fi CSI

eess.SP · 2026-05-20 · unverdicted · none · ref 24

AMAR uses a transformer with learnable query embeddings for set-based prediction of concurrent activities from composite Wi-Fi CSI, combined with edge feature extraction and vector quantization for bandwidth-efficient deployment.

Lucid-XR: An Extended-Reality Data Engine for Robotic Manipulation

cs.RO · 2026-04-30 · unverdicted · none · ref 17

Lucid-XR uses XR-headset physics simulation and physics-guided video generation to create synthetic data that trains robot policies transferring zero-shot to unseen real-world manipulation tasks.

Improving Layout Representation Learning Across Inconsistently Annotated Datasets via Agentic Harmonization

cs.CV · 2026-04-13 · unverdicted · none · ref 13

VLM-based harmonization of inconsistent annotations across two document layout corpora raises detection F-score from 0.860 to 0.883 and table TEDS from 0.750 to 0.814 while tightening embedding clusters.

LAA-X: Unified Localized Artifact Attention for Quality-Agnostic and Generalizable Face Forgery Detection

cs.CV · 2026-04-05 · unverdicted · none · ref 59

LAA-X uses multi-task learning with explicit localized artifact attention and blending synthesis to build a deepfake detector that generalizes to high-quality and unseen manipulations after training only on real and pseudo-fake samples.

DeCo-DETR: Decoupled Cognition DETR for efficient Open-Vocabulary Object Detection

cs.CV · 2026-04-03 · unverdicted · none · ref 3 · 2 links

DeCo-DETR builds hierarchical semantic prototypes offline and uses decoupled training streams to deliver competitive zero-shot open-vocabulary detection with improved inference speed.

Steerable Vision-Language-Action Policies for Embodied Reasoning and Hierarchical Control

cs.RO · 2026-02-13 · unverdicted · none · ref 51

Steerable VLAs trained on rich synthetic commands at subtask, motion, and pixel levels enable VLMs to steer robot behavior more effectively, outperforming prior hierarchical baselines on real-world manipulation and generalization tasks.

Where Will They Go? Modelling Multimodal Pedestrian Manoeuvres from Ego-centric Videos

cs.CV · 2026-06-17 · unverdicted · none · ref 36

MMPM uses PIM for gaze/head/hand interactions and MTP (CVAE with query decoder) to model separate crossing/non-crossing trajectory distributions, outperforming baselines on PIE and JAAD with a new validation protocol.

ReforMe: Re-Shaping Documents with Contextual Prompting and Layout-Aware Propagation

cs.HC · 2026-06-02 · unverdicted · none · ref 3

ReforMe is an interactive document digitization system using layout-aware propagation to generalize user corrections from natural language or direct edits, shown to improve efficiency in a 12-user study on real documents.

Phast: Simultaneous reconstruction of photoelectron count and time profiles from PMT waveforms via machine learning

hep-ex · 2026-05-28 · unverdicted · none · ref 7

Phast applies a transformer encoder plus count-conditioned query decoder to reconstruct photoelectron count and time profiles from simulated PMT waveforms on toy Monte Carlo datasets.

Predicting the thermodynamics in the chromosphere from the translation of SDO data into the IRIS$^{2}$ inversion results using a visual transformer model

astro-ph.SR · 2026-04-23 · unverdicted · none · ref 23

A visual transformer model trained on IRIS inversions predicts chromospheric temperature and density from SDO data with correlations around 0.8 on 80% of test cases.

RareSpot+: A Benchmark, Model, and Active Learning Framework for Small and Rare Wildlife in Aerial Imagery

cs.CV · 2026-04-21 · unverdicted · none · ref 7

RareSpot+ boosts small-object detection mAP by 0.13 on aerial wildlife data and cuts annotation needs to 1.7% of tiles via consistency losses and spatial priors.

Learning Class Difficulty in Imbalanced Histopathology Segmentation via Dynamic Focal Attention

eess.IV · 2026-04-15 · unverdicted · none · ref 4

Dynamic Focal Attention learns class-specific difficulty via per-class biases in attention logits, improving Dice and IoU on imbalanced histopathology segmentation benchmarks.

MapATM: Enhancing HD Map Construction through Actor Trajectory Modeling

cs.CV · 2026-04-13 · unverdicted · none · ref 2

MapATM improves lane divider AP by 4.6 and mAP by 2.6 on NuScenes by treating actor trajectories as structural priors for road geometry.

SynSur: An end-to-end generative pipeline for synthetic industrial surface defect generation and detection

cs.CV · 2026-04-29 · unverdicted · none · ref 8

A generative pipeline creates realistic synthetic pitting defects and other surface flaws that, when added to real training data, yield modest gains in industrial defect detectors without replacing the need for authentic samples.

A Machine Learning Framework for Real-Time Personalized Ergonomic Pose Analysis

cs.CV · 2026-06-11 · unverdicted · none · ref 25

A framework for real-time ergonomic pose prediction from 3D volumetric video that trains personalized classifiers on user-labeled poses captured by RGB-D cameras.

A Comparative Study of Modern Object Detectors for Robust Apple Detection in Orchard Imagery

cs.CV · 2026-04-11 · unverdicted · none · ref 26

YOLO11n achieves the highest mAP@0.5:0.95 of 0.6065 for apple localization, with other detectors showing trade-offs in recall and precision at low confidence thresholds.

Efficiently Linking Real Scenes with Synthetic Data Generation for AI-based Cognitive Robotics and Computer Vision Applications

cs.RO · 2026-06-18 · unverdicted · none · ref 4

The paper reviews limits in AI vision for robotics and describes work-in-progress on bridging sim-to-real domain gaps by linking real and synthetic training data.
