End-to-end object detection with transformers

Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko · 2020

9 Pith papers cite this work.

citation-role summary: background 1 · method 1

citation-polarity summary: background 1 · use method 1

representative citing papers

SAM 3D Animal: Promptable Animal 3D Reconstruction from Images in the Wild

cs.CV · 2026-05-08 · unverdicted · novelty 7.0

SAM 3D Animal is the first promptable framework for multi-animal 3D reconstruction from single images, built on SMAL+ and trained on the new Herd3D dataset, achieving SOTA results on Animal3D, APTv2, and Animal Kingdom benchmarks.

Don't Guess, Just Ask: Resolving Ambiguity in Referring Segmentation via Multi-turn Clarification

cs.CV · 2026-05-17 · unverdicted · novelty 6.0

IC-Seg is a multi-turn clarification framework with hierarchical GRPO optimization that resolves ambiguous queries in referring video object segmentation and introduces the Ambi-RVOS benchmark.

LACE: Latent Visual Representation for Cross-Embodiment Learning

cs.RO · 2026-05-16 · unverdicted · novelty 6.0

LACE aligns human and robot visual features via semantic distribution matching on corresponding body parts plus a Gram loss, yielding 65% better zero-shot policy transfer than a DINO baseline.

Invaria: Learning Scale and Density Invariance in Point Clouds via Next-Resolution Prediction

cs.CV · 2026-05-15 · unverdicted · novelty 6.0

Invaria trains point cloud encoders with next-resolution prediction to learn scale- and density-invariant features, yielding higher mIoU on ScanNet at lower resolutions and with scaled objects while using a smaller model.

XDecomposer: Learning Prior-Free Set Decomposition for Multiphase X-ray Diffraction

cs.AI · 2026-05-07 · unverdicted · novelty 6.0

XDecomposer uses set prediction and phase-query decomposition to jointly identify phases and reconstruct multiphase PXRD patterns without priors.

ViCrop-Det: Spatial Attention Entropy Guided Cropping for Training-Free Small-Object Detection

cs.CV · 2026-04-29 · unverdicted · novelty 6.0

ViCrop-Det uses spatial attention entropy from the decoder to dynamically crop and refine small-object regions in transformer detectors during inference.

ORCA: An Agentic Reasoning Framework for Hallucination and Adversarial Robustness in Vision-Language Models

cs.CV · 2025-09-18 · unverdicted · novelty 6.0 · 2 refs

ORCA is an agentic reasoning framework that enhances factual accuracy and adversarial robustness of pretrained LVLMs via an Observe-Reason-Critique-Act loop with small vision models, reporting accuracy gains of up to 40% on hallucination benchmarks and 20% under adversarial perturbations.

Language-Pretraining-Induced Bias: A Strong Foundation for General Vision Tasks

cs.CV · 2026-04-02 · unverdicted · novelty 5.0

Random-label bridge training aligns LLM parameters with vision tasks, and partially training certain layers often suffices because of their foundational properties.

FingerEye: Learning Dexterous Manipulation with Continuous Vision-Tactile Sensing

cs.RO · 2026-04-22