Title resolution pending

URL: https://arxiv · 2021 · arXiv 2103.17239

5 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver will retry DOI and OpenAlex lookups on its next pass.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

TFM-Retouche: A Lightweight Input-Space Adapter for Tabular Foundation Models

cs.LG · 2026-05-07 · unverdicted · novelty 7.0 · 2 refs

TFM-Retouche is an architecture-agnostic input-space residual adapter that improves tabular foundation model accuracy on 51 datasets by learning input corrections through the frozen backbone, with an identity guard to fall back to the original model.

BEiT: BERT Pre-Training of Image Transformers

cs.CV · 2021-06-15 · conditional · novelty 7.0

BEiT pre-trains vision transformers via masked image modeling on visual tokens and reaches 83.2% ImageNet top-1 accuracy for the base model and 86.3% for the large model using only ImageNet-1K data.

Gated Normalization Removal and Scale Anchoring in Pre-Norm Transformers

cs.LG · 2026-02-11 · unverdicted · novelty 6.0

TaperNorm gradually removes internal normalization in pre-norm transformers via learned gates that decay to zero. This reveals the final norm as a scale anchor and enables up to 1.18x faster KV-cached decoding with only small increases in loss.

MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer

cs.CV · 2021-10-05 · unverdicted · novelty 6.0

MobileViT is a lightweight vision transformer that reports 78.4% top-1 accuracy on ImageNet-1k with ~6M parameters, outperforming MobileNetV3 by 3.2% and DeiT by 6.2% at similar size, and also reports gains on MS-COCO object detection.

Attention Residuals

cs.CL · 2026-03-16 · unverdicted · novelty 5.0

Attention Residuals replaces fixed residual summation with input-dependent softmax attention over preceding layers, and a blocked variant is shown to improve uniformity and downstream performance in a 48B-parameter model pre-trained on 1.4T tokens.

citing papers explorer

BEiT: BERT Pre-Training of Image Transformers

cs.CV · 2021-06-15 · conditional · none · ref 17