arXiv preprint arXiv:2102.10882 (2021)

Xiangxiang Chu, Zhi Tian, Bo Zhang, Xinlong Wang, Chunhua Shen · 2021 · arXiv 2102.10882

12 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary: method 1

citation-polarity summary: use method 1

representative citing papers

Swin Transformer: Hierarchical Vision Transformer using Shifted Windows

cs.CV · 2021-03-25 · accept · novelty 8.0

Swin Transformer reaches 87.3% ImageNet accuracy and sets new records on COCO detection and ADE20K segmentation by replacing global self-attention with shifted-window local attention inside a hierarchical pyramid.

Masked-Token Prediction for Anomaly Detection at the Large Hadron Collider

hep-ph · 2026-04-22 · unverdicted · novelty 7.0

The work demonstrates masked-token prediction with transformers for model-independent anomaly detection in LHC data, achieving strong results on top-rich BSM signatures like four-top production using VQ-VAE tokenization.

Hierarchical Mesh Transformers with Topology-Guided Pretraining for Morphometric Analysis of Brain Structures

cs.CV · 2026-04-06 · unverdicted · novelty 7.0

A hierarchical mesh transformer using topology-guided pretraining on simplicial complexes achieves state-of-the-art results on Alzheimer's classification, amyloid prediction, and focal cortical dysplasia detection from brain meshes.

Massive Activations in Large Language Models

cs.CL · 2024-02-27 · unverdicted · novelty 7.0

Massive activations are constant large values in LLMs that function as indispensable bias terms and concentrate attention probabilities on specific tokens.

LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention

cs.CV · 2023-03-28 · conditional · novelty 7.0

LLaMA-Adapter turns frozen LLaMA 7B into a capable instruction follower using only 1.2M new parameters and zero-init attention, matching Alpaca while extending to image-conditioned reasoning on ScienceQA and COCO.

Device Passport: Enabling Spatio-Temporal Pretrained Models to Generalize Across Input Layouts

cs.LG · 2026-06-30 · unverdicted · novelty 6.0

Device Passport improves cross-layout transfer for biosignal models by learning expert mixture models from each channel's functional activity and metadata, outperforming baselines in transfer regimes.

DPPE: Rethinking Camera-Based Positional Encoding for Scaling Multi-View Transformers

cs.CV · 2026-06-30 · unverdicted · novelty 6.0

DPPE decouples rotation and translation in camera positional encodings for multi-view transformers to resolve late-stage training stagnation and improve generalization in novel view synthesis.

End-to-End Context Compression at Scale

cs.CL · 2026-06-08 · unverdicted · novelty 6.0

LCLMs are scaled 0.6B-encoder 4B-decoder compressors pre-trained on over 350B tokens that improve the Pareto frontier for general-task performance, compression speed, and peak memory in long-context language model inference.

Small Models, Strong Priors: Architectural Inductive Bias for Parameter-Efficient Neural PDE Solvers

cs.LG · 2026-05-25 · unverdicted · novelty 6.0

WaveLiT combines wavelet tokenization, linear attention, and multiscale pyramids to produce parameter-efficient neural PDE solvers that match much larger models on TheWell benchmarks.

LOOPE: Learnable Optimal Patch Order in Positional Embeddings for Vision Transformers

cs.CV · 2025-04-19 · unverdicted · novelty 6.0

LOOPE learns a patch ordering for positional embeddings in ViTs and introduces the Three Cell Experiment benchmark that shows 30-35% gaps in positional retention versus the usual 4-6%.

USEMA: a Scalable Efficient Mamba Like Attention for Medical Image Segmentation

cs.CV · 2026-05-11 · unverdicted · novelty 5.0

USEMA is a hybrid UNet architecture merging CNNs with scalable Mamba-like attention (SEMA) that achieves better efficiency than transformers and higher segmentation accuracy than pure CNN or Mamba models across medical imaging modalities.

Generalized SAM: Efficient Fine-Tuning of SAM for Variable Input Image Sizes

cs.CV · 2024-08-22 · unverdicted · novelty 4.0

GSAM applies random cropping to enable variable input sizes for efficient SAM fine-tuning, claiming lower compute with comparable or higher accuracy on varied datasets.

citing papers explorer

Showing 12 of 12 citing papers.

Swin Transformer: Hierarchical Vision Transformer using Shifted Windows · cs.CV · 2021-03-25 · accept · none · ref 15
Masked-Token Prediction for Anomaly Detection at the Large Hadron Collider · hep-ph · 2026-04-22 · unverdicted · none · ref 14
Hierarchical Mesh Transformers with Topology-Guided Pretraining for Morphometric Analysis of Brain Structures · cs.CV · 2026-04-06 · unverdicted · none · ref 5
Massive Activations in Large Language Models · cs.CL · 2024-02-27 · unverdicted · none · ref 14
LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention · cs.CV · 2023-03-28 · conditional · none · ref 211
Device Passport: Enabling Spatio-Temporal Pretrained Models to Generalize Across Input Layouts · cs.LG · 2026-06-30 · unverdicted · none · ref 4
DPPE: Rethinking Camera-Based Positional Encoding for Scaling Multi-View Transformers · cs.CV · 2026-06-30 · unverdicted · none · ref 8
End-to-End Context Compression at Scale · cs.CL · 2026-06-08 · unverdicted · none · ref 12
Small Models, Strong Priors: Architectural Inductive Bias for Parameter-Efficient Neural PDE Solvers · cs.LG · 2026-05-25 · unverdicted · none · ref 7
LOOPE: Learnable Optimal Patch Order in Positional Embeddings for Vision Transformers · cs.CV · 2025-04-19 · unverdicted · none · ref 6
USEMA: a Scalable Efficient Mamba Like Attention for Medical Image Segmentation · cs.CV · 2026-05-11 · unverdicted · none · ref 3
Generalized SAM: Efficient Fine-Tuning of SAM for Variable Input Image Sizes · cs.CV · 2024-08-22 · unverdicted · none · ref 10
