Attention-based architectures like Swin Transformer show greater robustness to FP4 QAT recipe choice than CNNs across model scales in anomaly segmentation, with architecture having the largest impact.
TetraJet: Mitigating weight oscillation for robust MXFP4 vision transformer training. arXiv preprint arXiv:2502.20853, 2025.
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CV (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Not All NVFP4 QAT Recipes Are Equal: How Architecture and Scale Shape Model Quality for Anomaly Segmentation