An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

Dosovitskiy · 2020 · cs.CV · arXiv 2010.11929

Mixed citation behavior. Most common role is background (57%).

793 Pith papers citing it

Background 57% of classified citations

abstract

While the Transformer architecture has become the de-facto standard for natural language processing tasks, its applications to computer vision remain limited. In vision, attention is either applied in conjunction with convolutional networks, or used to replace certain components of convolutional networks while keeping their overall structure in place. We show that this reliance on CNNs is not necessary and a pure transformer applied directly to sequences of image patches can perform very well on image classification tasks. When pre-trained on large amounts of data and transferred to multiple mid-sized or small image recognition benchmarks (ImageNet, CIFAR-100, VTAB, etc.), Vision Transformer (ViT) attains excellent results compared to state-of-the-art convolutional networks while requiring substantially fewer computational resources to train.

citation-role summary

background 91 · method 53 · baseline 7 · other 3 · dataset 2

citation-polarity summary

background 89 · use method 51 · baseline 7 · unclear 7 · use dataset 2
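
The page does not say how the headline "57%" is derived from these tallies. A minimal sketch, assuming each percentage is simply a label's count divided by the total of classified citations in the same tally; the helper name `shares` and the dictionary names are hypothetical, and the counts are copied from the two summaries above:

```python
from typing import Dict


def shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Return each label's share of the tally, in percent (assumed definition)."""
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()}


# Counts as listed in the citation-role summary.
role_counts = {"background": 91, "method": 53, "baseline": 7, "other": 3, "dataset": 2}

# Counts as listed in the citation-polarity summary.
polarity_counts = {
    "background": 89,
    "use method": 51,
    "baseline": 7,
    "unclear": 7,
    "use dataset": 2,
}

role_shares = shares(role_counts)
polarity_shares = shares(polarity_counts)

for name, table in (("role", role_shares), ("polarity", polarity_shares)):
    print(name, {label: round(pct, 1) for label, pct in table.items()})
```

Both tallies sum to 156 classified citations. Under the assumed definition, background comes to about 58.3% in the role tally and about 57.1% in the polarity tally, so the reported 57% matches the polarity counts more closely than the role counts.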

authors

Dosovitskiy

representative citing papers

DyABD: The Abdominal Muscle Segmentation in Dynamic MRI Benchmark

cs.CV · 2026-04-25 · conditional · novelty 9.0

DyABD is the first benchmark dataset for abdominal muscle segmentation in dynamic MRIs featuring exercise-induced anatomical changes and pre/post-surgery scans, where existing models achieve an average Dice score of 0.82.

Exposing Functional Fusion: A New Class of Strategic Backdoor in Dynamic Prompt Architectures

cs.CR · 2026-05-19 · unverdicted · novelty 8.0

VIPER exposes Functional Fusion in dynamic prompt architectures, enabling a backdoor that resists pruning by tightly integrating attack and utility parameters in the same high-magnitude core.

iMiGUE-3K: A Large-Scale Benchmark for Micro-Gesture Analysis with Self-Supervised Learning

cs.CV · 2026-05-16 · unverdicted · novelty 8.0

iMiGUE-3K is the largest in-the-wild micro-gesture video dataset with 3.4K clips and 37M frames from real interviews, supporting self-supervised foundation models and benchmarks that show micro-gestures improve emotion understanding.

Privacy Auditing with Zero (0) Training Run

cs.CR · 2026-05-14 · unverdicted · novelty 8.0

Zero-Run auditing supplies valid lower bounds on differential privacy parameters from fixed member and non-member datasets by modeling and correcting distribution-shift confounding via causal-inference techniques.

CheXTemporal: A Dataset for Temporally-Grounded Reasoning in Chest Radiography

cs.CV · 2026-05-11 · accept · novelty 8.0

CheXTemporal supplies paired chest X-rays with explicit temporal progression taxonomy and spatial grounding to benchmark and improve models on longitudinal reasoning tasks.

Dissecting Jet-Tagger Through Mechanistic Interpretability

hep-ph · 2026-05-11 · accept · novelty 8.0

A Particle Transformer jet tagger contains a sparse six-head circuit whose source-relay-readout structure recovers most performance and whose residual stream preferentially encodes 2-prong energy correlators.

Gradient-Based Program Synthesis with Neurally Interpreted Languages

cs.LG · 2026-04-20 · unverdicted · novelty 8.0

NLI autonomously discovers a vocabulary of primitive operations and interprets variable-length programs via a neural executor, allowing end-to-end training and gradient-based test-time adaptation that outperforms prior methods on combinatorial generalization tasks.

S1-MMAlign: A Large-Scale, Multi-Disciplinary Dataset for Scientific Figure-Text Understanding

cs.CV · 2026-01-01 · unverdicted · novelty 8.0

S1-MMAlign is a new large-scale dataset of 15.5 million semantically enhanced scientific image-text pairs created via an AI recaptioning pipeline to improve multimodal understanding.

A document is worth a structured record: Principled inductive bias design for document recognition

cs.CV · 2025-07-11 · unverdicted · novelty 8.0

Introduces a method to design structure-specific relational inductive biases for a base transformer architecture, enabling end-to-end transcription of documents with intrinsic structures, demonstrated on sheet music, shape drawings, and mechanical engineering drawings.

Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution

cs.CL · 2023-09-28 · unverdicted · novelty 8.0

Promptbreeder evolves both task prompts and the mutation prompts that improve them using LLMs, outperforming Chain-of-Thought and Plan-and-Solve on arithmetic and commonsense reasoning benchmarks.

Diffusion Policy: Visuomotor Policy Learning via Action Diffusion

cs.RO · 2023-03-07 · accept · novelty 8.0

Diffusion Policy models robot actions as a conditional diffusion process, outperforming prior state-of-the-art methods by 46.9% on average across 12 manipulation tasks from four benchmarks.

Efficiently Modeling Long Sequences with Structured State Spaces

cs.LG · 2021-10-31 · unverdicted · novelty 8.0

S4 is an efficient state space sequence model that captures long-range dependencies via structured parameterization of the SSM, achieving state-of-the-art results on the Long Range Arena and other benchmarks while being faster than Transformers for generation.

Decision Transformer: Reinforcement Learning via Sequence Modeling

cs.LG · 2021-06-02 · accept · novelty 8.0

Decision Transformer casts RL as autoregressive sequence modeling conditioned on desired returns, past states and actions, matching or exceeding offline RL baselines on Atari, Gym and Key-to-Door tasks.

Emerging Properties in Self-Supervised Vision Transformers

cs.CV · 2021-04-29 · conditional · novelty 8.0

Self-supervised ViTs show emergent semantic segmentation and 78.3% k-NN accuracy on ImageNet; DINO reaches 80.1% linear evaluation with ViT-Base.

MotifGen: Spatiotemporal interpolation of misaligned satellite images via multi-source generative modeling, in an application to tropical cyclones

cs.CV · 2026-06-23 · unverdicted · novelty 7.0

MotifGen is the first multi-source generative model for spatiotemporal interpolation of misaligned microwave cyclone images from heterogeneous instruments at irregular intervals, achieving lower CRPS via self-supervised training and closer power spectra than deterministic baselines when combining in

eCNNTO: A Highly Generalizable ConvNet for Accelerating Topology Optimization

cs.AI · 2026-06-18 · unverdicted · novelty 7.0

eCNNTO applies an element-wise CNN with residual connections and final-stage training data to accelerate density-based topology optimization while generalizing across boundary conditions, loads, geometries, and mesh sizes.

Polarisation and Faraday rotation measure imaging at metre wavelengths with sub-arcsecond resolution: a foundational calibration strategy

astro-ph.IM · 2026-06-16 · unverdicted · novelty 7.0

A calibration strategy using full-Jones corrections with an in-field unpolarised calibrator and visibility-based multi-epoch alignment enables sub-arcsecond polarimetric imaging with LOFAR at metre wavelengths.

LLM Agents Can See Code Repositories

cs.SE · 2026-06-12 · unverdicted · novelty 7.0

Visual graphs of repository structure added to text inputs for multimodal LLM agents reduce token consumption by up to 26% while maintaining or improving issue-resolution accuracy.

Toward Calibrated, Fair, and accurate Deepfake Detection

cs.LG · 2026-06-03 · unverdicted · novelty 7.0

Face-Feature Tuning is a label-free logit remapping method that reduces FPR/TPR gaps across groups in deepfake detection while preserving overall accuracy.

Event-Illumination Collaborative Low-light Image Enhancement with a High-resolution Real-world Dataset

cs.CV · 2026-05-21 · conditional · novelty 7.0

EIC-LIE uses an event-illumination collaborative module and illumination-aware event filter plus a new real-world dataset to improve low-light image enhancement over prior methods.

FTerViT: Fully Ternary Vision Transformer

cs.CV · 2026-05-20 · conditional · novelty 7.0

FTerViT introduces fully ternary Vision Transformers with TernaryBitConv2d and TernaryLayerNorm operators, achieving 82.43% ImageNet top-1 at 6.09 MB with 15x compression.

VSCD: Video-based Scene Change Detection in Unaligned Scenes

cs.CV · 2026-05-20 · unverdicted · novelty 7.0

VSCD presents a query-centric multi-reference model for pixel-wise change detection in unaligned, unsynchronized indoor videos, backed by a 1.1-million-frame benchmark and real-robot validation for surveillance and incremental learning.

MAPS: A Synthetic Dataset for Probing Vision Models in a Controlled 3D Scene Space

cs.CV · 2026-05-19 · unverdicted · novelty 7.0

MAPS provides 2618 validated 3D meshes and a controllable rendering pipeline to attribute vision model recognition failures to specific scene parameters, finding camera distance and elevation as the dominant failure factors across 20 tested models.

Trust It or Not: Evidential Uncertainty for Feed-Forward 3D Reconstruction with Trust3R

cs.CV · 2026-05-19 · unverdicted · novelty 7.0

Trust3R introduces a gated residual refinement plus Normal-Inverse-Wishart evidential head that produces closed-form multivariate Student-t uncertainty for per-point geometry in feed-forward 3D reconstruction and improves uncertainty ranking metrics on indoor and outdoor benchmarks.

citing papers explorer

Showing 50 of 793 citing papers.

DyABD: The Abdominal Muscle Segmentation in Dynamic MRI Benchmark cs.CV · 2026-04-25 · conditional · none · ref 36 · internal anchor
DyABD is the first benchmark dataset for abdominal muscle segmentation in dynamic MRIs featuring exercise-induced anatomical changes and pre/post-surgery scans, where existing models achieve an average Dice score of 0.82.
Exposing Functional Fusion: A New Class of Strategic Backdoor in Dynamic Prompt Architectures cs.CR · 2026-05-19 · unverdicted · none · ref 6 · internal anchor
VIPER exposes Functional Fusion in dynamic prompt architectures, enabling a backdoor that resists pruning by tightly integrating attack and utility parameters in the same high-magnitude core.
iMiGUE-3K: A Large-Scale Benchmark for Micro-Gesture Analysis with Self-Supervised Learning cs.CV · 2026-05-16 · unverdicted · none · ref 53 · internal anchor
iMiGUE-3K is the largest in-the-wild micro-gesture video dataset with 3.4K clips and 37M frames from real interviews, supporting self-supervised foundation models and benchmarks that show micro-gestures improve emotion understanding.
Privacy Auditing with Zero (0) Training Run cs.CR · 2026-05-14 · unverdicted · none · ref 11 · internal anchor
Zero-Run auditing supplies valid lower bounds on differential privacy parameters from fixed member and non-member datasets by modeling and correcting distribution-shift confounding via causal-inference techniques.
CheXTemporal: A Dataset for Temporally-Grounded Reasoning in Chest Radiography cs.CV · 2026-05-11 · accept · none · ref 14 · internal anchor
CheXTemporal supplies paired chest X-rays with explicit temporal progression taxonomy and spatial grounding to benchmark and improve models on longitudinal reasoning tasks.
Dissecting Jet-Tagger Through Mechanistic Interpretability hep-ph · 2026-05-11 · accept · none · ref 48 · internal anchor
A Particle Transformer jet tagger contains a sparse six-head circuit whose source-relay-readout structure recovers most performance and whose residual stream preferentially encodes 2-prong energy correlators.
Gradient-Based Program Synthesis with Neurally Interpreted Languages cs.LG · 2026-04-20 · unverdicted · none · ref 96 · internal anchor
NLI autonomously discovers a vocabulary of primitive operations and interprets variable-length programs via a neural executor, allowing end-to-end training and gradient-based test-time adaptation that outperforms prior methods on combinatorial generalization tasks.
S1-MMAlign: A Large-Scale, Multi-Disciplinary Dataset for Scientific Figure-Text Understanding cs.CV · 2026-01-01 · unverdicted · none · ref 12 · internal anchor
S1-MMAlign is a new large-scale dataset of 15.5 million semantically enhanced scientific image-text pairs created via an AI recaptioning pipeline to improve multimodal understanding.
A document is worth a structured record: Principled inductive bias design for document recognition cs.CV · 2025-07-11 · unverdicted · none · ref 50 · internal anchor
Introduces a method to design structure-specific relational inductive biases for a base transformer architecture, enabling end-to-end transcription of documents with intrinsic structures, demonstrated on sheet music, shape drawings, and mechanical engineering drawings.
Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution cs.CL · 2023-09-28 · unverdicted · none · ref 93 · internal anchor
Promptbreeder evolves both task prompts and the mutation prompts that improve them using LLMs, outperforming Chain-of-Thought and Plan-and-Solve on arithmetic and commonsense reasoning benchmarks.
Diffusion Policy: Visuomotor Policy Learning via Action Diffusion cs.RO · 2023-03-07 · accept · none · ref 3 · internal anchor
Diffusion Policy models robot actions as a conditional diffusion process, outperforming prior state-of-the-art methods by 46.9% on average across 12 manipulation tasks from four benchmarks.
Efficiently Modeling Long Sequences with Structured State Spaces cs.LG · 2021-10-31 · unverdicted · none · ref 12 · internal anchor
S4 is an efficient state space sequence model that captures long-range dependencies via structured parameterization of the SSM, achieving state-of-the-art results on the Long Range Arena and other benchmarks while being faster than Transformers for generation.
Decision Transformer: Reinforcement Learning via Sequence Modeling cs.LG · 2021-06-02 · accept · none · ref 68 · internal anchor
Decision Transformer casts RL as autoregressive sequence modeling conditioned on desired returns, past states and actions, matching or exceeding offline RL baselines on Atari, Gym and Key-to-Door tasks.
Emerging Properties in Self-Supervised Vision Transformers cs.CV · 2021-04-29 · conditional · none · ref 19 · internal anchor
Self-supervised ViTs show emergent semantic segmentation and 78.3% k-NN accuracy on ImageNet; DINO reaches 80.1% linear evaluation with ViT-Base.
MotifGen: Spatiotemporal interpolation of misaligned satellite images via multi-source generative modeling, in an application to tropical cyclones cs.CV · 2026-06-23 · unverdicted · none · ref 7 · internal anchor
MotifGen is the first multi-source generative model for spatiotemporal interpolation of misaligned microwave cyclone images from heterogeneous instruments at irregular intervals, achieving lower CRPS via self-supervised training and closer power spectra than deterministic baselines when combining in
eCNNTO: A Highly Generalizable ConvNet for Accelerating Topology Optimization cs.AI · 2026-06-18 · unverdicted · none · ref 9 · internal anchor
eCNNTO applies an element-wise CNN with residual connections and final-stage training data to accelerate density-based topology optimization while generalizing across boundary conditions, loads, geometries, and mesh sizes.
Polarisation and Faraday rotation measure imaging at metre wavelengths with sub-arcsecond resolution: a foundational calibration strategy astro-ph.IM · 2026-06-16 · unverdicted · none · ref 27 · internal anchor
A calibration strategy using full-Jones corrections with an in-field unpolarised calibrator and visibility-based multi-epoch alignment enables sub-arcsecond polarimetric imaging with LOFAR at metre wavelengths.
LLM Agents Can See Code Repositories cs.SE · 2026-06-12 · unverdicted · none · ref 8 · internal anchor
Visual graphs of repository structure added to text inputs for multimodal LLM agents reduce token consumption by up to 26% while maintaining or improving issue-resolution accuracy.
Toward Calibrated, Fair, and accurate Deepfake Detection cs.LG · 2026-06-03 · unverdicted · none · ref 263 · internal anchor
Face-Feature Tuning is a label-free logit remapping method that reduces FPR/TPR gaps across groups in deepfake detection while preserving overall accuracy.
Event-Illumination Collaborative Low-light Image Enhancement with a High-resolution Real-world Dataset cs.CV · 2026-05-21 · conditional · none · ref 8 · internal anchor
EIC-LIE uses an event-illumination collaborative module and illumination-aware event filter plus a new real-world dataset to improve low-light image enhancement over prior methods.
FTerViT: Fully Ternary Vision Transformer cs.CV · 2026-05-20 · conditional · none · ref 1 · internal anchor
FTerViT introduces fully ternary Vision Transformers with TernaryBitConv2d and TernaryLayerNorm operators, achieving 82.43% ImageNet top-1 at 6.09 MB with 15x compression.
VSCD: Video-based Scene Change Detection in Unaligned Scenes cs.CV · 2026-05-20 · unverdicted · none · ref 30 · internal anchor
VSCD presents a query-centric multi-reference model for pixel-wise change detection in unaligned, unsynchronized indoor videos, backed by a 1.1-million-frame benchmark and real-robot validation for surveillance and incremental learning.
MAPS: A Synthetic Dataset for Probing Vision Models in a Controlled 3D Scene Space cs.CV · 2026-05-19 · unverdicted · none · ref 18 · internal anchor
MAPS provides 2618 validated 3D meshes and a controllable rendering pipeline to attribute vision model recognition failures to specific scene parameters, finding camera distance and elevation as the dominant failure factors across 20 tested models.
Trust It or Not: Evidential Uncertainty for Feed-Forward 3D Reconstruction with Trust3R cs.CV · 2026-05-19 · unverdicted · none · ref 2 · internal anchor
Trust3R introduces a gated residual refinement plus Normal-Inverse-Wishart evidential head that produces closed-form multivariate Student-t uncertainty for per-point geometry in feed-forward 3D reconstruction and improves uncertainty ranking metrics on indoor and outdoor benchmarks.
Targeted Downstream-Agnostic Attack cs.CV · 2026-05-19 · unverdicted · none · ref 36 · internal anchor
Introduces Targeted Downstream-Agnostic Attack (TDAA) that uses a threat image as feature anchor and example-specific perturbations to achieve targeted attacks on unknown downstream tasks from pre-trained encoders.
Randomized Advantage Transformation (RAT): Computing Natural Policy Gradients via Direct Backpropagation cs.LG · 2026-05-18 · unverdicted · none · ref 78 · internal anchor
RAT reformulates regularized natural policy gradients as vanilla gradients with a transformed advantage, computed efficiently via randomized block Kaczmarz iterations on on-policy data.
CineMatte: Background Matting for Virtual Production and Beyond cs.CV · 2026-05-18 · unverdicted · none · ref 8 · internal anchor
CineMatte uses a cross-attention design on a Siamese DINOv3 ViT plus a pretrained upsampler to produce robust mattes for virtual production, backed by a new non-synthetic 4K VP dataset that supports camera motion.
GraphMAR: Geometry-Aware Graph Learning Framework for Spatially Adaptive CT Metal Artifact Reduction cs.CV · 2026-05-17 · unverdicted · none · ref 36 · internal anchor
GraphMAR introduces graph-based geometric modeling and a GraphMoE module to explicitly localize and spatially adaptively reduce metal artifacts in CT images using only image-domain inputs.
HEED: Density-Weighted Residual Alignment for Hybrid Vision-Language Model Distillation cs.CV · 2026-05-16 · unverdicted · none · ref 9 · internal anchor
HEED replaces uniform residual alignment with density-weighted alignment using patch self-dissimilarity to improve hybrid VLM distillation, gaining 8.7 points on OCRBench v2 and 5.13 on a 10-benchmark average.
SHED: Style-Homogenized Embedding Alignment for Domain Generalization cs.CV · 2026-05-16 · conditional · none · ref 47 · internal anchor
SHED improves domain generalization in CLIP by aligning style-homogenized embeddings instead of raw ones, achieving state-of-the-art results on five benchmarks including a 4% gain on DomainNet.
Observation-Aligned Mask Priors for Learning Physical Dynamics from Authentic Occlusions cs.CV · 2026-05-16 · unverdicted · none · ref 34 · internal anchor
A framework pretrained on authentic binary occlusion masks uses guided sampling and intersection-based partitioning to train diffusion models on incomplete physical observations without zero-query regions.
Characterizing Learning in Deep Neural Networks using Tractable Algorithmic Complexity Analysis cs.LG · 2026-05-15 · unverdicted · none · ref 62 · internal anchor
QuBD extends algorithmic complexity estimation to quantized DNN weights, revealing that complexity decreases during learning, increases with overfitting, follows grokking patterns, and correlates with generalization.
DIPA: Distilled Preconditioned Algorithms for Solving Imaging Inverse Problems eess.IV · 2026-05-14 · unverdicted · none · ref 23 · internal anchor
DIPA learns preconditioning operators via distillation from a teacher with a better sensing matrix to improve reconstruction quality for the student's physically constrained matrix in imaging inverse problems.
CoralLite: μCT Reconstruction of Coral Colonies from Individual Corallites cs.CV · 2026-05-14 · conditional · none · ref 8 · internal anchor
CoralLite dataset and V-Trans-UNet baseline enable segmentation of individual corallites from μCT scans of Porites coral colonies with reported Dice scores of 0.77 on same-colony slices and 0.63 on unrelated specimens.
DiffPhD: A Unified Differentiable Solver for Projective Heterogeneous Materials in Elastodynamics with Contact-Rich GPU-Acceleration cs.GR · 2026-05-14 · unverdicted · none · ref 7 · internal anchor
DiffPhD delivers a unified differentiable projective dynamics solver for heterogeneous hyperelastic elastodynamics with contact that achieves up to 10x speedup and stable convergence on 100x stiffness contrasts while preserving strict gradient accuracy.
Convergence of difference inclusions via a diameter criterion math.OC · 2026-05-14 · unverdicted · none · ref 130 · internal anchor
A diameter criterion tied to a potential function certifies convergence of difference inclusions, enabling discrete proofs for first-order optimization methods with diminishing steps.
Evolving Layer-Specific Scalar Functions for Hardware-Aware Transformer Adaptation cs.CV · 2026-05-13 · unverdicted · none · ref 1 · internal anchor
Genetic programming evolves heterogeneous layer-specific scalar functions to approximate layer normalization in pre-trained ViTs, capturing 91.6% variance versus 70.2% for uniform baselines and recovering 84.25% ImageNet Top-1 accuracy after 20 epochs of adaptation.
Unlocking Patch-Level Features for CLIP-Based Class-Incremental Learning cs.CV · 2026-05-13 · unverdicted · none · ref 9 · internal anchor
SPA unlocks patch-level features in CLIP for class-incremental learning via semantic-guided selection and optimal transport alignment with class descriptions, plus projectors and pseudo-feature replay to reduce forgetting.
QLAM: A Quantum Long-Attention Memory Approach to Long-Sequence Token Modeling cs.LG · 2026-05-13 · unverdicted · none · ref 20 · internal anchor
QLAM extends state-space models with quantum superposition in the hidden state for linear-time long-sequence modeling and reports consistent gains over RNN and transformer baselines on sequential image tasks.
MedCore: Boundary-Preserving Medical Core Pruning for MedSAM cs.CV · 2026-05-13 · unverdicted · none · ref 3 · internal anchor
MedCore achieves 60% parameter and 58.4% FLOP reduction on MedSAM with Dice 0.9549 and preserved boundary metrics via dual-intervention pruning and a new boundary leverage principle.
Sensing-Assisted LoS/NLoS Identification in Dynamic UAV Positioning Systems eess.SP · 2026-05-13 · unverdicted · none · ref 23 · internal anchor
A new dual-input feature fusion network using RGB images and channel impulse responses identifies LoS/NLoS conditions for UAVs with up to 97.69% accuracy and reduces trilateration positioning error by about 70%.
RotVLA: Rotational Latent Action for Vision-Language-Action Model cs.RO · 2026-05-13 · unverdicted · none · ref 48 · internal anchor
RotVLA models latent actions as continuous SO(n) rotations with triplet-frame supervision and flow-matching to reach 98.2% success on LIBERO and 89.6%/88.5% on RoboTwin2.0 using a 1.7B-parameter model.
KamonBench: A Grammar-Based Dataset for Evaluating Compositional Factor Recovery in Vision-Language Models cs.CV · 2026-05-13 · unverdicted · none · ref 4 · 2 links · internal anchor
KamonBench is a grammar-based dataset of 20,000 synthetic Japanese crests with multi-format annotations that enables direct evaluation of factor recovery beyond caption accuracy in vision-language models.
From Compression to Accountability: Harmless Copyright Protection for Dataset Distillation cs.CR · 2026-05-13 · unverdicted · none · ref 42 · 2 links · internal anchor
SubPopMark embeds verifiable subpopulation biases into distilled datasets via CVM and USTM optimization stages, allowing provenance inference through comparison of model output signatures against a reference behavior bank.
Runtime Monitoring of Perception-Based Autonomous Systems via Embedding Temporal Logic cs.LG · 2026-05-12 · unverdicted · none · ref 83 · 2 links · internal anchor
Embedding Temporal Logic (ETL) performs runtime monitoring directly in learned embedding spaces using distance-based predicates composed with temporal operators, supported by conformal calibration for reliable predicate evaluation.
From Imagined Futures to Executable Actions: Mixture of Latent Actions for Robot Manipulation cs.RO · 2026-05-12 · unverdicted · none · ref 15 · internal anchor
MoLA infers a mixture of latent actions from generated future videos via modality-aware inverse dynamics models to improve robot manipulation policies.
How Faithful Is Trajectory-Based Data Attribution? Error Sources, Remedies, and Practical Guidelines cs.LG · 2026-05-12 · conditional · none · ref 1 · internal anchor
The paper decomposes errors in trajectory-based data attribution into config, algorithm, and system levels, proposes AdamW-influence to fix optimizer mismatch, derives an error proxy for Taylor approximation, and unifies data selection under a K-step look-ahead framework.
SoK: Unlearnability and Unlearning for Model Dememorization cs.LG · 2026-05-12 · conditional · none · ref 45 · internal anchor
Provides the first integrated taxonomy, an empirical study of interplay and shallow dememorization, and a theoretical guarantee on dememorization depth for certified unlearning.
TCP-SSM: Efficient Vision State Space Models with Token-Conditioned Poles cs.CV · 2026-05-12 · unverdicted · none · ref 10 · internal anchor
TCP-SSM conditions stable poles on visual tokens to explicitly control memory decay and oscillation in SSMs, cutting computation up to 44% while matching or exceeding accuracy on classification, segmentation, and detection.
Can Graphs Help Vision SSMs See Better? cs.CV · 2026-05-11 · unverdicted · none · ref 10 · internal anchor
GraphScan replaces geometric or coordinate-based scanning in Vision SSMs with learned local semantic graph routing, yielding SOTA results among such models on classification and segmentation tasks.
