One of the Q, K, or V weight matrices in transformer self-attention is redundant and, under mild assumptions, can be replaced with the identity matrix, reducing parameters by 25 percent with no loss in performance for small models.
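The summary above does not spell out the construction. The sketch below shows one standard observation behind Q/K redundancy. It assumes single-head attention with square d×d projections and no biases. Under those assumptions the scores satisfy QK^T = X W_Q W_K^T X^T, so W_Q enters only through the product W_Q W_K^T. Folding it into the key projection (W_K' = W_K W_Q^T) therefore lets the query projection become the identity. This is not necessarily the cited paper's argument. The helper names (`self_attention`, `projection_params`, `random_matrix`) are hypothetical. The parameter count additionally assumes four equal d×d projections (Q, K, V, output).

```python
import math
import random
from typing import List, Optional

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m: Matrix) -> Matrix:
    return [list(row) for row in zip(*m)]


def softmax_rows(m: Matrix) -> Matrix:
    """Numerically stable row-wise softmax."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def self_attention(x: Matrix, w_q: Optional[Matrix], w_k: Matrix, w_v: Matrix) -> Matrix:
    """Single-head scaled dot-product self-attention without biases (assumption).

    Passing w_q=None uses the identity as the query projection, i.e. Q = X.
    """
    d = len(x[0])
    q = x if w_q is None else matmul(x, w_q)
    k = matmul(x, w_k)
    v = matmul(x, w_v)
    scale = 1.0 / math.sqrt(d)
    scores = [[s * scale for s in row] for row in matmul(q, transpose(k))]
    return matmul(softmax_rows(scores), v)


def projection_params(d: int, dropped: int = 0) -> int:
    """Weights in the Q, K, V and output projections, each assumed d x d, biases ignored."""
    return (4 - dropped) * d * d


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    n, d = 5, 4
    x = random_matrix(n, d, rng)
    w_q, w_k, w_v = (random_matrix(d, d, rng) for _ in range(3))

    # Scores depend on W_Q only through W_Q @ W_K^T, so folding W_Q into the key
    # projection (W_K' = W_K @ W_Q^T) lets the query projection be the identity.
    w_k_folded = matmul(w_k, transpose(w_q))
    full = self_attention(x, w_q, w_k, w_v)
    reduced = self_attention(x, None, w_k_folded, w_v)
    max_diff = max(abs(a - b) for ra, rb in zip(full, reduced) for a, b in zip(ra, rb))
    print(f"max |difference| = {max_diff:.2e}")
    print(projection_params(512), projection_params(512, dropped=1))
```

Run as a script, it should print a maximum difference at the level of floating-point round-off, followed by the counts 1048576 and 786432 for d = 512. Under the four-projection assumption, dropping one projection keeps 75 percent of those weights.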
Muse: Parallel multi-scale attention for sequence to sequence learning. arXiv preprint arXiv:1911.09483.
2 Pith papers cite this work.
Verdicts: 2 unverdicted.

Representative citing papers:
Structure-Guided Adaptive Propagation for Protein-Protein Interaction Site Prediction
SGAP-PPIS generates residue-wise adaptive propagation coefficients from equivariant GNN geometric states to improve protein-protein interaction site prediction, reporting competitive results on Test_60.