Recognition: no theorem link
bViT: Investigating Single-Block Recurrence in Vision Transformers for Image Recognition
Pith reviewed 2026-05-12 04:56 UTC · model grok-4.3
The pith
A single shared transformer block reused recurrently can match the accuracy of a full-depth Vision Transformer while using an order of magnitude fewer parameters.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
bViT processes images by repeatedly applying the identical transformer block, preserving the multi-step iterative structure of deep ViTs without dedicating separate parameters to each layer. On ImageNet-1K the 12-step bViT-B attains accuracy comparable to standard ViT-B under a matched training recipe and computational budget while using an order of magnitude fewer parameters. Recurrent accuracy rises with representation width, interpreted as implicit depth multiplexing in which the shared block expresses different effective transformations as the hidden state evolves. Mechanistic probes of activations, attention maps, and step-wise pruning confirm that the block alters its behavior across the recurrent steps rather than repeating an identical computation.
What carries the argument
Single-block recurrence in which one transformer block is applied repeatedly to an evolving hidden state, allowing step-dependent computations without layer-specific weights.
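A minimal sketch of this mechanism, assuming standard ViT-B shapes and using `torch.nn.TransformerEncoderLayer` as a stand-in for the paper's block; the class name `SharedBlockViT` and the embedding details are illustrative assumptions, not the authors' code:

```python
import torch
import torch.nn as nn

class SharedBlockViT(nn.Module):
    """Illustrative single-block recurrent ViT: one block reused for `steps` applications."""
    def __init__(self, dim=768, heads=12, mlp_dim=3072, patch=16, steps=12, num_classes=1000):
        super().__init__()
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        # assumes 224x224 inputs: (224/16)^2 = 196 patch tokens plus one class token
        self.pos_embed = nn.Parameter(torch.zeros(1, (224 // patch) ** 2 + 1, dim))
        # One block, reused at every step: this is the entire "depth" of the model.
        self.block = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                                dim_feedforward=mlp_dim,
                                                batch_first=True, norm_first=True)
        self.steps = steps
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):
        h = self.patch_embed(x).flatten(2).transpose(1, 2)      # (B, N, dim)
        h = torch.cat([self.cls_token.expand(h.size(0), -1, -1), h], dim=1) + self.pos_embed
        for _ in range(self.steps):                              # same weights at every step
            h = self.block(h)
        return self.head(h[:, 0])                                # classify from the class token
```

With `dim=768`, the single shared block amounts to roughly 7 M parameters, while a standard 12-layer ViT-B carries roughly 85 M parameters in its blocks, which is the source of the order-of-magnitude gap claimed above.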
If this is right
- Wider representation dimensions let recurrent models recover a larger fraction of standard ViT performance.
- bViT achieves competitive transfer accuracy on downstream image tasks.
- The architecture supports parameter-efficient fine-tuning by updating only the shared block (a minimal sketch follows this list).
- Analyses of activations and attention show the shared block changes its effective computation across recurrent steps rather than repeating identical operations.
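A minimal sketch of the fine-tuning setup flagged in the list above, reusing the `SharedBlockViT` sketch from earlier; the freezing boundary, task size, and hyperparameters are assumptions, not the paper's recipe:

```python
# Hypothetical parameter-efficient fine-tuning: freeze everything except the one shared block
# and a fresh task head, so only a few million weights are updated on the downstream task.
model = SharedBlockViT(num_classes=1000)
for p in model.parameters():
    p.requires_grad_(False)
for p in model.block.parameters():
    p.requires_grad_(True)                      # only the shared block's ~7 M weights update
model.head = torch.nn.Linear(768, 37)           # new head for an assumed 37-class downstream task
optimizer = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=1e-4)
```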
Where Pith is reading between the lines
- Recurrent reuse could lower peak memory during training by keeping only one block in GPU memory at a time.
- The width-recurrence tradeoff may extend to other transformer-based sequence models beyond vision.
- Step-dependent pruning results suggest that future work could learn or schedule different effective depths per image or task.
Load-bearing premise
The training recipe and total computational budget are truly equivalent between the recurrent bViT and the standard stacked ViT, so that any accuracy match arises from the recurrence itself rather than uncontrolled differences in optimization or implementation.
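A back-of-envelope reading of this premise, using standard ViT accounting with assumed ViT-B shapes (197 tokens, width 768, MLP ratio 4): the forward cost per image depends only on how many times a block is applied, not on whether the applications share weights, so raw compute parity is plausible and the open question is the optimization-level and implementation differences the referee raises below.

```python
# Rough multiply-accumulate count per block application (assumed ViT-B shapes).
# Sharing weights across applications does not change this count.
def block_macs(n_tokens=197, dim=768, mlp_dim=3072):
    attn_proj = 4 * n_tokens * dim * dim        # Q, K, V and output projections
    attn_mix = 2 * n_tokens * n_tokens * dim    # QK^T scores and attention-weighted values
    mlp = 2 * n_tokens * dim * mlp_dim          # two feed-forward matmuls
    return attn_proj + attn_mix + mlp

print(f"ViT-B (12 distinct blocks):  {12 * block_macs() / 1e9:.1f} GMACs per image")
print(f"bViT-B (1 block x 12 steps): {12 * block_macs() / 1e9:.1f} GMACs per image")
```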
What would settle it
Train a non-recurrent single-block ViT (one block applied once) with the same total parameter count and compute as the 12-step bViT-B and measure whether its ImageNet accuracy falls substantially below the recurrent version.
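A hedged sketch of the parameter side of that control, again assuming standard ViT-B shapes: the single-block, one-step model already matches bViT-B's block parameter count, so the experiment isolates the number of recurrent applications while total training compute is held fixed.

```python
# Approximate per-block parameter count for width 768, MLP ratio 4 (assumed shapes).
def block_params(dim=768, mlp_dim=3072):
    attn = 4 * (dim * dim + dim)                # Q, K, V, output projections with bias
    mlp = dim * mlp_dim + mlp_dim + mlp_dim * dim + dim
    norms = 2 * 2 * dim                         # two LayerNorms (scale and shift)
    return attn + mlp + norms

print(f"one shared block: {block_params() / 1e6:.1f} M params  (bViT-B and the 1-step control)")
print(f"twelve blocks:    {12 * block_params() / 1e6:.1f} M params  (standard ViT-B, blocks only)")
```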
read the original abstract
Vision Transformers (ViTs) are built by stacking independently parameterized blocks, but it remains unclear how much of this depth requires layer specific transformations and how much can be realized through recurrent computation. We study this question with bViT, a single-block recurrent ViT in which one transformer block is applied repeatedly to process an image. This architecture preserves the iterative structure of a deep ViT while removing layer specific block parameterization, providing a controlled setting for studying recurrence in vision. On ImageNet-1K, a 12-step bViT-B achieves accuracy comparable to standard ViT-B under the same training recipe and computational budget, while using an order of magnitude fewer parameters. We observe that recurrent performance improves with representation width, with wider bViTs recovering much more of the performance of standard ViTs than narrow variants. We interpret this behavior as implicit depth multiplexing, where a shared block expresses multiple step-dependent computations through the evolving hidden state. Beyond ImageNet classification, bViT transfers competitively to downstream tasks and enables parameter-efficient fine-tuning. Mechanistic analyses of activations, attention and step-specific pruning show that the shared block changes its effective behavior across recurrent steps rather than simply repeating the same computation. Our results suggest that a large fraction of ViT depth can be implemented through recurrent reuse, provided that the representation space is sufficiently wide.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces bViT, a Vision Transformer that replaces the standard stack of independently parameterized blocks with a single shared transformer block applied recurrently for a fixed number of steps. The central empirical claim is that a 12-step bViT-B achieves ImageNet-1K accuracy comparable to a standard ViT-B under the same training recipe and computational budget while using an order of magnitude fewer parameters. The authors support this with mechanistic analyses of activations, attention patterns, and step-specific pruning that indicate the shared block changes its effective computation across steps rather than repeating identical operations. They further report that wider representations recover more of the standard ViT performance, interpret this as implicit depth multiplexing, and show competitive transfer to downstream tasks plus benefits for parameter-efficient fine-tuning.
Significance. If the matched-budget and matched-recipe claims are substantiated, the result would be significant for understanding the role of depth versus recurrence in Vision Transformers and for designing parameter-efficient vision models. The controlled single-block setup isolates recurrence effects more cleanly than prior recurrent transformer variants, and the width-scaling observation plus mechanistic analyses provide concrete evidence that a shared block can express step-dependent transformations through evolving hidden states. The parameter reduction and downstream transfer results are concrete strengths that could influence efficient architecture design.
major comments (3)
- [Section 4] Section 4 (Experiments) and Table 1: The claim that computational budgets are matched between 12-step bViT-B and standard ViT-B is load-bearing for attributing accuracy parity to recurrence, yet the manuscript provides no per-step FLOPs breakdown, total training FLOPs, peak memory, or wall-clock time comparison. Recurrent unrolling can alter caching, mixed-precision behavior, and gradient flow relative to independent blocks, so explicit verification is required to rule out incidental optimization differences.
- [Section 4.2] Section 4.2 (Ablations and controls): No ablation is presented against a non-recurrent weight-tied baseline (e.g., single application of the shared block or fixed hidden-state reuse) or against an independently parameterized model constrained to the same total parameter count. Without these controls it is difficult to isolate the performance contribution of recurrence itself from weight sharing or other architectural choices.
- [Results] Results section and Table 1: Exact top-1 accuracies, standard deviations across multiple random seeds, and the precise ViT-B baseline accuracy under the identical recipe are not reported; only the qualitative statement “comparable” appears. This prevents quantitative assessment of whether the observed parity is robust or within the range of training noise.
minor comments (2)
- [Figure 3] Figure 3 (attention visualizations): Step indices and any quantitative measures of attention change across steps should be annotated directly on the figure panels for immediate readability.
- [Section 3] Section 3 (Model definition): The recurrence update rule and hidden-state notation would benefit from an explicit equation (e.g., h_{t+1} = Block(h_t, x)) to make the unrolling mechanics unambiguous.
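For concreteness, the update the second minor comment asks for could be written as follows; the notation is an illustrative assumption, and the paper's exact normalization and input handling may differ:

```latex
% Illustrative T-step recurrence; h_0 is the patch-and-position embedding of image x,
% and a single block with shared parameters \theta is reused at every step.
\[
  h_0 = \mathrm{Embed}(x), \qquad
  h_{t+1} = \mathrm{Block}_{\theta}(h_t), \quad t = 0, \dots, T-1, \qquad
  \hat{y} = \mathrm{Head}(h_T)
\]
```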
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. The comments highlight important aspects for strengthening the empirical claims and controls in our work on bViT. We address each major comment below, committing to revisions that enhance clarity and rigor while preserving the core contributions on recurrence in Vision Transformers.
read point-by-point responses
-
Referee: [Section 4] Section 4 (Experiments) and Table 1: The claim that computational budgets are matched between 12-step bViT-B and standard ViT-B is load-bearing for attributing accuracy parity to recurrence, yet the manuscript provides no per-step FLOPs breakdown, total training FLOPs, peak memory, or wall-clock time comparison. Recurrent unrolling can alter caching, mixed-precision behavior, and gradient flow relative to independent blocks, so explicit verification is required to rule out incidental optimization differences.
Authors: We agree that detailed verification of matched budgets is essential to isolate the effects of recurrence. Although the training recipe was designed to equate the number of block applications and overall compute (with bViT using the same operations per step as a standard block), we did not provide an explicit breakdown. In the revised manuscript, we will add a per-step FLOPs analysis, total training FLOPs, peak memory usage during training and inference, and wall-clock time comparisons on identical hardware to confirm equivalence and address potential differences in caching or gradient flow. revision: yes
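A minimal sketch of how the committed peak-memory and wall-clock comparison could be gathered on identical hardware; `profile_step` and the loss choice are assumptions, not the authors' harness:

```python
import time
import torch

def profile_step(model, batch, labels, optimizer,
                 loss_fn=torch.nn.functional.cross_entropy):
    """Run one training step and return (wall-clock seconds, peak GPU memory in GiB)."""
    torch.cuda.reset_peak_memory_stats()
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    optimizer.zero_grad()
    loss = loss_fn(model(batch), labels)
    loss.backward()
    optimizer.step()
    torch.cuda.synchronize()
    return time.perf_counter() - t0, torch.cuda.max_memory_allocated() / 2**30
```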
-
Referee: [Section 4.2] Section 4.2 (Ablations and controls): No ablation is presented against a non-recurrent weight-tied baseline (e.g., single application of the shared block or fixed hidden-state reuse) or against an independently parameterized model constrained to the same total parameter count. Without these controls it is difficult to isolate the performance contribution of recurrence itself from weight sharing or other architectural choices.
Authors: This is a fair point for isolating recurrence effects. We will add a 1-step bViT baseline (single application of the shared block) to Table 1 and the ablations section to quantify the benefit of multiple recurrent steps. For fixed hidden-state reuse, we can include a control where the state is not updated after the first step. Regarding an independently parameterized model with the same total parameter count, this equates to a 1-block standard ViT, which we will explicitly compare as a parameter-matched shallow baseline. Our existing mechanistic analyses (evolving activations, attention patterns, and step-specific pruning) already demonstrate that the shared block performs distinct computations across steps rather than repeating identical operations, providing evidence beyond weight sharing alone; we will expand the discussion to tie these directly to the new controls. revision: partial
-
Referee: [Results] Results section and Table 1: Exact top-1 accuracies, standard deviations across multiple random seeds, and the precise ViT-B baseline accuracy under the identical recipe are not reported; only the qualitative statement “comparable” appears. This prevents quantitative assessment of whether the observed parity is robust or within the range of training noise.
Authors: We acknowledge that explicit numerical reporting is necessary for assessing robustness. Although Table 1 in the manuscript contains the accuracy values, the main text uses the term “comparable” for brevity. In the revision, we will explicitly report the exact top-1 accuracies for the 12-step bViT-B and the standard ViT-B baseline under the identical recipe, along with standard deviations computed across at least three random seeds to demonstrate that the parity holds within training variability. revision: yes
Circularity Check
No circularity: purely empirical architecture comparison with no derivations or self-referential predictions
full rationale
The paper reports experimental results on ImageNet-1K accuracy, transfer learning, and mechanistic analyses of activations/attention for a recurrent single-block ViT versus a standard stacked ViT. No equations, first-principles derivations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the provided text. The central claim (accuracy parity under matched budget) is an empirical observation, not a reduction to inputs by construction. Self-citations, if present in the full manuscript, are not invoked to justify uniqueness theorems or ansatzes that would force the result. This is the expected non-finding for an architecture-ablation study.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Standard deep learning assumptions on optimization, data augmentation, and ImageNet evaluation protocols hold for both bViT and the baseline ViT models.