pith. machine review for the scientific record.

arxiv: 2604.03803 · v1 · submitted 2026-04-04 · 💻 cs.CV · cs.LG

Recognition: 2 theorem links


Rényi Attention Entropy for Patch Pruning

Authors on Pith: no claims yet

Pith reviewed 2026-05-13 17:15 UTC · model grok-4.3

classification 💻 cs.CV · cs.LG
keywords: patch pruning · Rényi entropy · attention mechanism · vision transformers · fine-grained recognition · computational efficiency

The pith

Rényi entropy of attention distributions identifies redundant patches for pruning in vision transformers while preserving accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a patch pruning method for vision transformers that uses the entropy of the attention distribution to decide which image patches to keep. Low entropy means attention is focused, so the patch is retained as important; high entropy means attention is spread across many locations, so the patch is treated as redundant and pruned. This reduces the quadratic cost of self-attention. Extending from Shannon to Rényi entropy adds a tunable order that emphasizes sharp attention peaks and lets the pruning adapt to different tasks and compute budgets. Experiments on fine-grained image recognition show reduced computation with maintained accuracy, and tuning the Rényi order yields better trade-offs.
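
To make the mechanism concrete, the criterion can be sketched in a few lines. Below is a minimal PyTorch sketch, assuming head-averaged attention and no CLS token; the function names and shapes are our illustration, not the paper's implementation (its Section 3.3 specifies the actual pipeline).

    import torch

    def renyi_entropy(p, alpha=2.0, eps=1e-12):
        # Rényi entropy of order alpha along the last axis; rows of p sum to 1.
        # alpha -> 1 recovers the Shannon entropy.
        p = p.clamp_min(eps)
        if abs(alpha - 1.0) < 1e-6:
            return -(p * p.log()).sum(dim=-1)
        return p.pow(alpha).sum(dim=-1).log() / (1.0 - alpha)

    def prune_by_entropy(tokens, attn, keep_rate=0.7, alpha=2.0):
        # tokens: (B, N, D) patch embeddings
        # attn:   (B, H, N, N) softmax attention weights
        scores = renyi_entropy(attn.mean(dim=1), alpha=alpha)  # (B, N)
        n_keep = max(1, int(keep_rate * tokens.shape[1]))
        # Low entropy = concentrated attention = kept as informative.
        keep = scores.topk(n_keep, dim=-1, largest=False).indices.sort(-1).values
        batch = torch.arange(tokens.shape[0]).unsqueeze(-1)
        return tokens[batch, keep], keep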

Core claim

The central claim is that the Rényi entropy of the per-patch attention distribution provides an effective, adjustable criterion for patch pruning in vision transformers, where low-entropy patches are important and high-entropy ones are redundant, leading to computation savings without accuracy loss on fine-grained tasks.

What carries the argument

The Rényi entropy applied to the attention distribution over patches, which emphasizes sharp attention peaks and supports adaptive pruning policies.
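
For reference, the machinery is the standard Rényi definition [21], applied per patch (the notation below is our gloss, not necessarily the paper's): writing p_i for patch i's softmax attention distribution over the n patches,

    H_\alpha(p_i) = \frac{1}{1 - \alpha} \log \sum_{j=1}^{n} p_{ij}^{\alpha}, \qquad \alpha > 0, \; \alpha \neq 1,

with the Shannon case recovered in the limit

    \lim_{\alpha \to 1} H_\alpha(p_i) = -\sum_{j=1}^{n} p_{ij} \log p_{ij}.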

If this is right

  • Self-attention computation decreases as fewer patches are processed (quantified briefly after this list).
  • Accuracy is preserved in fine-grained image recognition tasks.
  • Rényi entropy parameter tuning improves the accuracy versus computation trade-off.
  • Patch selection becomes more flexible for different computational limits.
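
To quantify the first point with standard complexity accounting (our arithmetic, not a figure from the paper): with n patches, per-block attention cost scales as n², so pruning to keep rate r at a block rescales that term by r²:

    \text{cost} \propto n^2 \;\longrightarrow\; (rn)^2 = r^2 n^2, \qquad r = 0.7 \Rightarrow r^2 = 0.49.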

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar entropy-based pruning could be tested on language transformers for token reduction.
  • Combining this with other importance metrics might further optimize transformer efficiency.
  • The method's reliance on explicit attention maps suggests it works best where the full attention matrix is materialized; attention implementations that avoid forming it would need to expose these statistics separately.
  • Extensions to video or 3D data could prune temporal or spatial patches analogously.

Load-bearing premise

That the entropy of the attention distribution over patches reliably signals which patches are important for the downstream task, without requiring extensive task-specific validation or additional learned components.

What would settle it

If pruning high-entropy patches by this criterion causes a significant accuracy drop on a fine-grained recognition benchmark relative to random or other pruning methods at a matched keep rate, the claim would be falsified.
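
One concrete shape for that test, reusing renyi_entropy from the sketch above (a hedged outline; evaluate and the choice of controls are stand-ins, not the paper's protocol):

    import torch

    def entropy_keep(attn, keep_rate, alpha=2.0):
        # Indices of the lowest-entropy query patches (the method's policy).
        scores = renyi_entropy(attn.mean(dim=1), alpha=alpha)  # (B, N)
        n_keep = max(1, int(keep_rate * scores.shape[-1]))
        return scores.topk(n_keep, dim=-1, largest=False).indices

    def random_keep(attn, keep_rate):
        # Control: a uniformly random subset of the same size.
        b, _, n, _ = attn.shape
        n_keep = max(1, int(keep_rate * n))
        return torch.rand(b, n).topk(n_keep, dim=-1).indices

    # Hypothetical harness: apply a keep-index policy inside each block and
    # report top-1 accuracy at a matched keep rate.
    # falsified = evaluate(model, entropy_keep) <= evaluate(model, random_keep)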

Figures

Figures reproduced from arXiv: 2604.03803 by Hiroaki Aizawa, Yuki Igaue.

Figure 1
Figure 1: Overview of our key idea. The left figure shows an attention entropy map, where red indicates higher entropy and blue indicates lower entropy. We observe that low attention entropy corresponds to foreground regions and high attention entropy to background, which helps identify informative patches. Based on this, we use attention entropy as the pruning criterion, as illustrated on the right. view at source ↗
Figure 2
Figure 2: Overall pipeline of the Rényi attention entropy pruning. The pruning procedure is described in Section 3.3. view at source ↗
Figure 3
Figure 3: Visualization of Rényi attention entropy (α = 2.0) for each Transformer block in DeiT-S. This visualization shows that attention entropy depends on Transformer layer depth, and lower entropy corresponds to foreground regions. view at source ↗
Figure 4
Figure 4: Visualizations of patch pruning results for EViT and the Rényi attention entropy-based approach on ImageNet-100, FGVC Aircraft, and Oxford Flowers102. From left to right: input image, pruning results at Blocks 4, 7, and 10. The keep rate is r = 0.7, and for our method we show the results with the tuned α. view at source ↗
Figure 5
Figure 5: Visualization of Shannon and Rényi attention entropies. For each DeiT-S block, the figure shows histograms of Shannon attention entropy (α = 1.0) and Rényi attention entropy at different α orders. Blue indicates informative patches that are kept, and red indicates redundant patches that are pruned. The results show that the Rényi order controls peak emphasis and allows the characterization of the attention … view at source ↗
Figure 6
Figure 6: Results for attention entropy (top) and attention distance [19] (bottom). All values represent averages over 500 samples. view at source ↗
read the original abstract

Transformers are strong baselines in both vision and language because self-attention captures long-range dependencies across tokens. However, the cost of self-attention grows quadratically with the number of tokens. Patch pruning mitigates this cost by estimating per-patch importance and removing redundant patches. To identify informative patches for pruning, we introduce a criterion based on the Shannon entropy of the attention distribution. Low-entropy patches, which receive selective and concentrated attention, are kept as important, while high-entropy patches with attention spread across many locations are treated as redundant. We also extend the criterion from Shannon to Rényi entropy, which emphasizes sharp attention peaks and supports pruning strategies that adapt to task needs and computational limits. In experiments on fine-grained image recognition, where patch selection is critical, our method reduced computation while preserving accuracy. Moreover, adjusting the pruning policy through the Rényi entropy measure yields further gains and improves the trade-off between accuracy and computation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes a patch-pruning criterion for Vision Transformers based on the Shannon entropy of per-patch attention distributions, with an extension to Rényi entropy of tunable order alpha. Low-entropy patches (concentrated attention) are retained as informative while high-entropy patches (diffuse attention) are pruned to reduce quadratic self-attention cost. Experiments on fine-grained image recognition are claimed to show that the method reduces computation while preserving accuracy, and that varying the Rényi order further improves the accuracy-compute trade-off.

Significance. If the empirical claims hold with proper controls, the method supplies a lightweight, training-free pruning heuristic that directly leverages existing attention weights without additional learned parameters beyond the single Rényi order. This could be useful for efficiency in vision transformers where patch selection matters. The Rényi generalization is presented as a tunable knob rather than a fitted model, which is a modest but positive design choice.

major comments (3)
  1. [Abstract, §4] Abstract and §4 (experiments): the central claim that the method 'reduced computation while preserving accuracy' and that Rényi 'yields further gains' is unsupported by any reported baselines, pruning ratios, statistical tests, or ablation tables. Without these, it is impossible to determine whether gains exceed those from simply reducing token count or from standard attention-based pruning heuristics.
  2. [§3] §3 (method): the assumption that low Shannon/Rényi entropy reliably identifies task-important patches is load-bearing but untested. No controls compare entropy pruning against random pruning, magnitude-based pruning, or the base ViT attention itself; a diffuse-attention patch could still carry a discriminative local feature in fine-grained recognition, undermining the mapping from entropy to importance.
  3. [§3.2] §3.2 (Rényi extension): varying the order alpha is presented as adapting to task needs, yet no derivation or ablation shows that different alpha values systematically trade off distinct importance notions rather than acting as an extra hyper-parameter whose optimal value must be searched per dataset.
minor comments (2)
  1. [§3] Notation for Rényi entropy should be defined explicitly (e.g., the exact formula for H_alpha) rather than left implicit from the Shannon case.
  2. [§4] Figure captions and experimental tables should report exact pruning thresholds, number of runs, and standard deviations to allow reproducibility.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the thoughtful and detailed review. The comments highlight important gaps in experimental validation and controls. We will revise the manuscript to incorporate additional baselines, ablations, and statistical reporting as detailed below. Our responses address each major comment directly.

read point-by-point responses
  1. Referee: [Abstract, §4] Abstract and §4 (experiments): the central claim that the method 'reduced computation while preserving accuracy' and that Rényi 'yields further gains' is unsupported by any reported baselines, pruning ratios, statistical tests, or ablation tables. Without these, it is impossible to determine whether gains exceed those from simply reducing token count or from standard attention-based pruning heuristics.

    Authors: We agree that the original experiments section would be strengthened by explicit baselines, pruning ratios, and statistical tests. In the revised version we will add tables reporting exact pruning ratios, comparisons against random pruning and magnitude-based pruning at matched token counts, and accuracy results with standard deviations over multiple random seeds. The current fine-grained recognition results already indicate that entropy pruning maintains higher accuracy than uniform token reduction at equivalent FLOPs, but we accept that these controls are required to make the claim rigorous. revision: partial

  2. Referee: [§3] §3 (method): the assumption that low Shannon/Rényi entropy reliably identifies task-important patches is load-bearing but untested. No controls compare entropy pruning against random pruning, magnitude-based pruning, or the base ViT attention itself; a diffuse-attention patch could still carry a discriminative local feature in fine-grained recognition, undermining the mapping from entropy to importance.

    Authors: The mapping from low entropy to importance rests on the observation that concentrated attention reflects the model's selective focus. We will add the requested controls (random pruning, magnitude pruning, and base ViT token retention) in the revised experiments. While a diffuse patch could in principle carry a local feature, our empirical results on fine-grained datasets show no accuracy drop when such patches are removed, supporting the criterion; the added ablations will directly test whether entropy outperforms these alternatives. revision: partial

  3. Referee: [§3.2] §3.2 (Rényi extension): varying the order alpha is presented as adapting to task needs, yet no derivation or ablation shows that different alpha values systematically trade off distinct importance notions rather than acting as an extra hyper-parameter whose optimal value must be searched per dataset.

    Authors: We will expand §3.2 with a short derivation showing that increasing alpha in Rényi entropy increasingly weights the maximum attention probability, thereby emphasizing peakiness over average spread. We will also include a new ablation table plotting accuracy-compute curves for alpha in {0.5, 1, 2, 3} across the evaluated datasets, demonstrating that the optimal alpha correlates with dataset characteristics (e.g., higher alpha benefits tasks with sharper attention patterns). This positions alpha as a principled tunable parameter rather than an arbitrary hyper-parameter. revision: yes
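
For the record, the promised derivation is standard (our reconstruction, not the authors' text): for large α the sum Σ_j p_j^α is dominated by its largest term, so

    H_\alpha(p) = \frac{1}{1-\alpha} \log \sum_j p_j^{\alpha} \;\longrightarrow\; -\log \max_j p_j \quad \text{as } \alpha \to \infty,

the min-entropy. Since H_α is non-increasing in α, larger orders score a patch almost entirely by its single sharpest attention weight, while α near 1 weights the full spread of the distribution (the Shannon case).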

Circularity Check

0 steps flagged

No significant circularity in the derivation chain

full rationale

The paper defines patch importance directly via Shannon and Rényi entropy computed from the transformer's existing attention weight distributions over patches. Low-entropy patches are retained and high-entropy ones pruned, with the Rényi order parameter presented as a tunable extension. This mapping is an explicit, non-fitted criterion applied to model outputs rather than a self-definitional loop, a fitted parameter renamed as a prediction, or a load-bearing self-citation. No equation reduces the claimed result to its inputs by construction, and the criterion can be checked against external baselines such as standard ViT attention pruning heuristics. Experimental gains on fine-grained recognition are reported as empirical outcomes, not tautological consequences of the definition.

Axiom & Free-Parameter Ledger

1 free parameters · 2 axioms · 0 invented entities

The approach rests on the standard mathematical definition of entropy applied to attention distributions and the domain assumption that attention concentration correlates with patch utility. No new entities are introduced. The Rényi order parameter is a tunable hyperparameter.

free parameters (1)
  • Rényi order alpha
    Controls emphasis on sharp attention peaks; chosen or tuned per task and compute budget.
axioms (2)
  • standard math Entropy of a probability distribution quantifies its uncertainty or spread
    Standard information-theoretic definition applied to per-patch attention weights.
  • domain assumption Low-entropy attention indicates selective focus on informative patches
    Core premise linking attention statistics to patch importance for pruning decisions.

pith-pipeline@v0.9.0 · 5458 in / 1251 out tokens · 32077 ms · 2026-05-13T17:15:27.102471+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.


Reference graph

Works this paper leans on

35 extracted references · 35 canonical work pages · 4 internal anchors

  1. Araabi, A., Niculae, V., Monz, C.: Entropy- and distance-regularized attention improves low-resource neural machine translation. In: Proceedings of the 16th Conference of the Association for Machine Translation in the Americas (Volume 1: Research Track). pp. 140–153 (2024)
  2. Attanasio, G., Nozza, D., Hovy, D., Baralis, E.: Entropy-based attention regularization frees unintended bias mitigation from lists. In: Findings of the Association for Computational Linguistics: ACL 2022. pp. 1105–1119 (2022)
  3. Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer normalization. arXiv preprint arXiv:1607.06450 (2016)
  4. Baron-Lis, K., Rottmann, M., Mütze, A., Honari, S., Fua, P., Salzmann, M.: Attentropy: On the generalization ability of supervised semantic segmentation transformers to new objects in new domains. In: BMVC. BMVA (2024), https://papers.bmvc2024.org/0215.pdf
  5. Bergner, B., Lippert, C., Mahendran, A.: Token Cropr: Faster ViTs for quite a few tasks. In: CVPR. pp. 9740–9750 (2025)
  6. Bolya, D., Fu, C.Y., Dai, X., Zhang, P., Feichtenhofer, C., Hoffman, J.: Token merging: Your ViT but faster. In: ICLR
  7. Clark, K., Khandelwal, U., Levy, O., Manning, C.D.: What does BERT look at? An analysis of BERT's attention. In: Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP. pp. 276–286 (2019)
  8. Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: A large-scale hierarchical image database. In: CVPR. pp. 248–255. IEEE (2009)
  9. Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: ICLR
  10. Fayyaz, M., Koohpayegani, S.A., Jafari, F.R., Sengupta, S., Joze, H.R.V., Sommerlade, E., Pirsiavash, H., Gall, J.: Adaptive token sampling for efficient vision transformers. In: ECCV. pp. 396–414. Springer (2022)
  11. Igaue, Y., Aizawa, H.: Patch pruning strategy based on robust statistical measures of attention weight diversity in vision transformers. In: Asian Conference on Pattern Recognition. pp. 123–133. Springer (2025)
  12. Kobayashi, G., Kuribayashi, T., Yokoi, S., Inui, K.: Attention is not only a weight: Analyzing transformers with vector norms. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). pp. 7057–7075 (2020)
  13. Kong, Z., Dong, P., Ma, X., Meng, X., Niu, W., Sun, M., Shen, X., Yuan, G., Ren, B., Tang, H., et al.: SPViT: Enabling faster vision transformers via latency-aware soft token pruning. In: ECCV. pp. 620–640. Springer (2022)
  14. Liang, Y., Chongjian, G., Tong, Z., Song, Y., Wang, J., Xie, P.: EViT: Expediting vision transformers via token reorganizations. In: ICLR
  15. Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin Transformer: Hierarchical vision transformer using shifted windows. In: ICCV. pp. 10012–10022 (2021)
  16. Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101 (2017)
  17. Maji, S., Rahtu, E., Kannala, J., Blaschko, M., Vedaldi, A.: Fine-grained visual classification of aircraft. arXiv preprint arXiv:1306.5151 (2013)
  18. Nilsback, M.E., Zisserman, A.: Automated flower classification over a large number of classes. In: Proceedings of the Indian Conference on Computer Vision, Graphics and Image Processing. pp. 722–729 (2008)
  19. Raghu, M., Unterthiner, T., Kornblith, S., Zhang, C., Dosovitskiy, A.: Do vision transformers see like convolutional neural networks? NeurIPS 34, 12116–12128 (2021)
  20. Rao, Y., Zhao, W., Liu, B., Lu, J., Zhou, J., Hsieh, C.J.: DynamicViT: Efficient vision transformers with dynamic token sparsification. NeurIPS 34, 13937–13949 (2021)
  21. Rényi, A.: On measures of entropy and information. In: Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Contributions to the Theory of Statistics. vol. 4, pp. 547–562. University of California Press (1961)
  22. Shannon, C.E.: A mathematical theory of communication. ACM SIGMOBILE Mobile Computing and Communications Review 5(1), 3–55 (2001)
  23. Tang, Q., Zhang, B., Liu, J., Liu, F., Liu, Y.: Dynamic token pruning in plain vision transformers for semantic segmentation. In: ICCV. pp. 777–786 (2023)
  24. Tian, Y., Krishnan, D., Isola, P.: Contrastive multiview coding. In: ECCV. pp. 776–794. Springer (2020)
  25. Touvron, H., Cord, M., Douze, M., Massa, F., Sablayrolles, A., Jegou, H.: Training data-efficient image transformers & distillation through attention. In: ICML. vol. 139, pp. 10347–10357 (July 2021)
  26. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. NeurIPS 30 (2017)
  27. Voita, E., Talbot, D., Moiseev, F., Sennrich, R., Titov, I.: Analyzing multi-head self-attention: Specialized heads do the heavy lifting, the rest can be pruned. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. pp. 5797–5808 (2019)
  28. Xu, Y., Zhang, Z., Zhang, M., Sheng, K., Li, K., Dong, W., Zhang, L., Xu, C., Sun, X.: Evo-ViT: Slow-fast token evolution for dynamic vision transformer. In: AAAI. vol. 36, pp. 2964–2972 (2022)
  29. Yin, H., Vahdat, A., Alvarez, J.M., Mallya, A., Kautz, J., Molchanov, P.: A-ViT: Adaptive tokens for efficient vision transformer. In: CVPR. pp. 10809–10818 (2022)
  30. Yun, S., Han, D., Oh, S.J., Chun, S., Choe, J., Yoo, Y.: CutMix: Regularization strategy to train strong classifiers with localizable features. arXiv preprint arXiv:1905.04899 (2019)
  31. Zhai, S., Likhomanenko, T., Littwin, E., Busbridge, D., Ramapuram, J., Zhang, Y., Gu, J., Susskind, J.M.: Stabilizing transformer training by preventing attention entropy collapse. In: ICML. pp. 40770–40803. PMLR (2023)
  32. Zhang, H., Cisse, M., Dauphin, Y.N., Lopez-Paz, D.: mixup: Beyond empirical risk minimization. arXiv preprint arXiv:1710.09412 (2017)
  33. Zhang, Y., Wei, L., Freris, N.: Synergistic patch pruning for vision transformer: Unifying intra- & inter-layer patch importance. In: ICLR (2024)
  34. Zhang, Z., Wang, Y., Huang, X., Fang, T., Zhang, H., Deng, C., Li, S., Yu, D.: Attention entropy is a key factor: An analysis of parallel context encoding with full-attention-based pre-trained language models. arXiv preprint arXiv:2412.16545 (2024)
  35. Zhong, Z., Zheng, L., Kang, G., Li, S., Yang, Y.: Random erasing data augmentation. arXiv preprint arXiv:1708.04896 (2017)