DAT++: Spatially dynamic vision transformer with deformable attention. arXiv preprint arXiv:2309.01430

3 Pith papers cite this work. Polarity classification is still indexing.

fields: cs.CV (3)

years: 2026 (2), 2025 (1)

verdicts: UNVERDICTED (3)

representative citing papers

Scaling Parallel Sequence Models to Foundation-Scale Vision Encoders

cs.CV · 2026-05-30 · unverdicted · novelty 6.0

C-GSPN scales 2D spatial propagation to foundation vision encoders via a fast CUDA kernel, compressed blocks, and two-stage distillation, matching ViT performance with 15% fewer parameters and 4x block speedup at 2K resolution.

ViT$^3$: Unlocking Test-Time Training in Vision

cs.CV · 2025-12-01 · unverdicted · novelty 6.0

ViT³ is a Test-Time Training vision model that achieves linear complexity, matches or exceeds other linear models like Mamba on classification, generation, detection and segmentation, and narrows the gap to standard vision Transformers.

citing papers explorer

Showing 3 of 3 citing papers after filters.

  • AnchorSeg: Language Grounded Query Banks for Reasoning Segmentation
    cs.CV · 2026-04-20 · unverdicted · none · ref 217

    AnchorSeg uses ordered query banks of latent reasoning tokens plus a spatial anchor token and a Token-Mask Cycle Consistency loss to achieve 67.7% gIoU and 68.1% cIoU on the ReasonSeg benchmark.

  • Scaling Parallel Sequence Models to Foundation-Scale Vision Encoders
    cs.CV · 2026-05-30 · unverdicted · none · ref 6

    C-GSPN scales 2D spatial propagation to foundation vision encoders via a fast CUDA kernel, compressed blocks, and two-stage distillation, matching ViT performance with 15% fewer parameters and 4x block speedup at 2K resolution.

  • ViT$^3$: Unlocking Test-Time Training in Vision
    cs.CV · 2025-12-01 · unverdicted · none · ref 63

    ViT³ is a Test-Time Training vision model that achieves linear complexity, matches or exceeds other linear models like Mamba on classification, generation, detection and segmentation, and narrows the gap to standard vision Transformers.