StepFinder: A Temporal Semantic Framework for Failure Attribution in Multi-Agent Systems
Pith reviewed 2026-06-28 10:10 UTC · model grok-4.3
The pith
StepFinder attributes root cause steps in multi-agent failures by encoding logs into temporal sequences and applying lightweight modeling instead of full LLM reasoning.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
StepFinder encodes execution logs into temporal semantic sequences using LLMs only during feature construction, then applies a parameter-efficient combination of temporal modeling and attention modules to capture sequential evolution and cross-step dependencies, and finally refines step-level error scores through multi-scale differences and position bias to identify the root cause step.
What carries the argument
Temporal semantic sequences produced by LLM encoding of execution logs, processed by temporal modeling plus attention modules and refined by multi-scale differences and position bias.
If this is right
- Step-level failure attribution becomes accurate enough to guide targeted fixes in multi-agent workflows.
- Inference cost drops sharply because only the initial encoding uses an LLM and no text is generated at runtime.
- The same pipeline can be applied to any multi-step agent trajectory without retraining large models.
- Position bias and multi-scale refinement together reduce false positives from early or late steps in long sequences.
Where Pith is reading between the lines
- The separation of encoding from scoring suggests similar hybrid designs could speed up other log-analysis tasks that currently call LLMs repeatedly.
- If the temporal sequences prove robust across domains, the approach could be tested on non-LLM agent systems that still produce timestamped execution traces.
- Extending the refinement step to include agent-interaction graphs might further improve attribution when failures involve coordination rather than single steps.
Load-bearing premise
Encoding execution logs into temporal semantic sequences with LLMs preserves enough information for the later modules to locate the true root cause without the noise that affects raw logs.
What would settle it
A controlled test on the Who&When benchmark in which the LLM encoding step drops the critical signal about the actual failing step, causing StepFinder to return the wrong root cause while raw-log LLM methods still succeed.
Figures
read the original abstract
LLM-based multi-agent systems exhibit remarkable collaborative capabilities in complex multi-step tasks. However, these systems are highly sensitive to single-step execution errors that can propagate through agent interactions and lead to cascading failures. To understand the causes of failure and improve system reliability, failure attribution has been introduced as a task that aims to automatically identify the root cause step responsible for a failure. Existing failure attribution methods mainly rely on LLMs to reason over original execution trajectories, which not only incur high inference costs and latency, but also suffer from interference caused by redundant and noisy execution logs, causing LLMs to struggle in accurately identifying the true root cause step. To address this, we propose StepFinder, a lightweight failure attribution framework. We use LLMs solely during the feature construction phase to encode execution logs into temporal semantic sequences. Subsequently, a parameter-efficient combination of temporal modeling and attention modules is applied to capture the sequential evolution and cross-step dependencies of the trajectories. Finally, the step-level error score is refined through multi-scale differences and position bias, enabling precise root cause identification. Experimental results on the Who&When benchmark demonstrate that StepFinder outperforms LLM-based methods in step-level failure attribution while achieving substantially higher inference efficiency, reducing inference time by 79% compared with the fastest LLM-based method, with no text generation overhead. Our code is available at https://github.com/taiyu-zhu/StepFinder.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes StepFinder, a lightweight failure attribution framework for LLM-based multi-agent systems. LLMs are used solely in the feature construction phase to encode execution logs into temporal semantic sequences; a parameter-efficient combination of temporal modeling and attention modules then captures sequential evolution and cross-step dependencies, followed by multi-scale difference and position-bias refinement to produce step-level error scores. On the Who&When benchmark the method is reported to outperform direct LLM-based attribution while reducing inference time by 79% relative to the fastest LLM baseline and incurring no text-generation overhead.
Significance. If the empirical claims are substantiated, the work offers a practical route to accurate, low-latency failure attribution that avoids the cost and noise sensitivity of end-to-end LLM reasoning over raw trajectories. The public release of code is a clear reproducibility asset.
major comments (2)
- [Abstract] Abstract: the central performance claim (outperformance plus 79% time reduction) is presented without any description of the LLM baselines, statistical significance tests, error bars, train/test splits, or ablation controls; these omissions make the reported gains impossible to evaluate from the given information.
- [§3] §3 (feature construction): the method's accuracy advantage rests on the untested premise that LLM-generated temporal semantic sequences retain sufficient causal signal and do not introduce new distortions (over-generalization or hallucinated relations) relative to raw logs; no ablation comparing encoded sequences against raw-log inputs or alternative encodings is described, leaving the load-bearing assumption unsupported.
minor comments (1)
- [Abstract] The benchmark name 'Who&When' should be accompanied by a citation or brief description of its construction and size.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment below, clarifying the experimental details and the rationale for our feature construction approach while noting where revisions will strengthen the presentation.
read point-by-point responses
-
Referee: [Abstract] Abstract: the central performance claim (outperformance plus 79% time reduction) is presented without any description of the LLM baselines, statistical significance tests, error bars, train/test splits, or ablation controls; these omissions make the reported gains impossible to evaluate from the given information.
Authors: We agree that the abstract's brevity omits key experimental context. The LLM baselines (including model variants and prompting strategies) are fully specified in Section 4.2, statistical significance tests with error bars appear in the results tables of Section 4.3, train/test splits are detailed in Section 4.1, and ablation controls are reported in Section 5. To improve evaluability directly from the abstract, we will revise it to briefly identify the primary LLM baselines and note that full methodological details, including significance testing and splits, are provided in the experimental sections. revision: yes
-
Referee: [§3] §3 (feature construction): the method's accuracy advantage rests on the untested premise that LLM-generated temporal semantic sequences retain sufficient causal signal and do not introduce new distortions (over-generalization or hallucinated relations) relative to raw logs; no ablation comparing encoded sequences against raw-log inputs or alternative encodings is described, leaving the load-bearing assumption unsupported.
Authors: Section 3 motivates the temporal semantic encoding as a means to extract high-level causal relations from noisy, redundant logs while preserving sequential structure, with the subsequent temporal modeling and attention modules operating on these sequences. Although a direct ablation against raw-log inputs or alternative encodings is not reported, the end-to-end results on the Who&When benchmark demonstrate that StepFinder outperforms direct LLM attribution methods that reason over raw trajectories, providing indirect support that the encoding retains necessary signal without prohibitive distortion. We will revise Section 3 to include an expanded discussion of this design assumption, its potential limitations, and the indirect evidence from the main experiments. revision: partial
Circularity Check
No circularity: modular pipeline with experimental validation
full rationale
The paper presents StepFinder as an engineering pipeline (LLM encoding of logs into temporal semantic sequences, followed by temporal modeling + attention + multi-scale refinement) whose performance claims rest on benchmark experiments rather than any derivation, equation, or fitted parameter that reduces to its own inputs. No self-citations, uniqueness theorems, or ansatzes are invoked in a load-bearing way; the method is described as a practical combination of existing components without tautological reduction.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E Hinton. 2016. Layer normaliza- tion.arXiv preprint arXiv:1607.06450(2016)
Pith/arXiv arXiv 2016
-
[2]
Adi Banerjee, Anirudh Nair, and Tarik Borogovac. 2025. Where did it all go wrong? A hierarchical look into multi-agent error attribution.arXiv preprint arXiv:2510.04886(2025)
arXiv 2025
-
[3]
Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. 2020. Language models are few-shot learners.Advances in neural information processing systems33 (2020), 1877–1901
2020
-
[4]
Mert Cemri, Melissa Z Pan, Shuyi Yang, Lakshya A Agrawal, Bhavya Chopra, Rishabh Tiwari, Kurt Keutzer, Aditya Parameswaran, Dan Klein, Kannan Ram- chandran, et al . 2025. Why do multi-agent llm systems fail?arXiv preprint arXiv:2503.13657(2025)
Pith/arXiv arXiv 2025
-
[5]
Darshan Deshpande, Varun Gangal, Hersh Mehta, Jitin Krishnan, Anand Kan- nappan, and Rebecca Qian. 2025. TRAIL: Trace Reasoning and Agentic Issue Localization.arXiv preprint arXiv:2505.08638(2025)
arXiv 2025
-
[6]
Wei Du, Shifei Ding, Lili Guo, Jian Zhang, and Ling Ding. 2024. Expressive multi-agent communication via identity-aware learning. InProceedings of the AAAI Conference on Artificial Intelligence, Vol. 38. 17354–17361
2024
-
[7]
Lutfi Eren Erdogan, Nicholas Lee, Sehoon Kim, Suhong Moon, Hiroki Furuta, Gopala Anumanchipalli, Kurt Keutzer, and Amir Gholami. 2025. Plan-and-act: Im- proving planning of agents for long-horizon tasks.arXiv preprint arXiv:2503.09572 (2025)
Pith/arXiv arXiv 2025
-
[8]
Mingyan Gao, Yanzi Li, Banruo Liu, Yifan Yu, Phillip Wang, Ching-Yu Lin, and Fan Lai. 2025. Single-agent or Multi-agent Systems? Why Not Both?arXiv preprint arXiv:2505.18286(2025)
arXiv 2025
-
[9]
Yu Ge, Linna Xie, Zhong Li, Yu Pei, and Tian Zhang. 2025. Who is introducing the failure? automatically attributing failures of multi-agent systems via spectrum analysis.arXiv preprint arXiv:2509.13782(2025)
arXiv 2025
-
[10]
Alireza Ghafarollahi and Markus J Buehler. 2025. SciAgents: automating scientific discovery through bioinspired multi-agent intelligent graph reasoning.Advanced Materials37, 22 (2025), 2413523
2025
-
[11]
Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory.Neural computation9, 8 (1997), 1735–1780
1997
-
[12]
Sirui Hong, Mingchen Zhuge, Jonathan Chen, Xiawu Zheng, Yuheng Cheng, Jinlin Wang, Ceyao Zhang, Zili Wang, Steven Ka Shing Yau, Zijuan Lin, et al
-
[13]
InThe twelfth international conference on learning representations
MetaGPT: Meta programming for a multi-agent collaborative framework. InThe twelfth international conference on learning representations
-
[14]
Xinyi Hou, Yanjie Zhao, Shenao Wang, and Haoyu Wang. 2025. Model context protocol (mcp): Landscape, security threats, and future research directions.arXiv preprint arXiv:2503.23278(2025)
Pith/arXiv arXiv 2025
-
[15]
Jen-tse Huang, Jiaxu Zhou, Tailin Jin, Xuhui Zhou, Zixi Chen, Wenxuan Wang, Youliang Yuan, Michael R Lyu, and Maarten Sap. 2024. On the resilience of llm-based multi-agent collaboration with faulty agents.arXiv preprint arXiv:2408.00989(2024)
arXiv 2024
-
[16]
Azam Ikram, Sarthak Chakraborty, Subrata Mitra, Shiv Saini, Saurabh Bagchi, and Murat Kocaoglu. 2022. Root cause analysis of failures in microservices through causal discovery.Advances in Neural Information Processing Systems35 (2022), 31158–31170
2022
-
[17]
Carlos E Jimenez, John Yang, Alexander Wettig, Shunyu Yao, Kexin Pei, Ofir Press, and Karthik Narasimhan. 2023. Swe-bench: Can language models resolve real-world github issues?arXiv preprint arXiv:2310.06770(2023)
Pith/arXiv arXiv 2023
-
[18]
Fanqi Kong, Ruijie Zhang, Huaxiao Yin, Guibin Zhang, Xiaofei Zhang, Ziang Chen, Zhaowei Zhang, Xiaoyuan Zhang, Song-Chun Zhu, and Xue Feng. 2025. Aegis: Automated Error Generation and Attribution for Multi-Agent Systems. arXiv preprint arXiv:2509.14295(2025)
Pith/arXiv arXiv 2025
-
[19]
Hui Yi Leong, Yuheng Li, Yuqing Wu, Wenwen Ouyang, Wei Zhu, Jiechao Gao, and Wei Han. 2025. Amas: Adaptively determining communication topology for llm-based multi-agent system. InProceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track. 2061–2070
2025
-
[20]
Guohao Li, Hasan Hammoud, Hani Itani, Dmitrii Khizbullin, and Bernard Ghanem. 2023. Camel: Communicative agents for" mind" exploration of large language model society.Advances in Neural Information Processing Systems36 (2023), 51991–52008
2023
-
[21]
Shiyuan Li, Yixin Liu, Qingsong Wen, Chengqi Zhang, and Shirui Pan. 2025. Assemble your crew: Automatic multi-agent communication topology design via autoregressive graph generation.arXiv preprint arXiv:2507.18224(2025)
arXiv 2025
-
[22]
Cheng-Ming Lin, Ching Chang, Wei-Yao Wang, Kuang-Da Wang, and Wen-Chih Peng. 2024. Root cause analysis in microservice using neural granger causal discovery. InProceedings of the AAAI Conference on Artificial Intelligence, Vol. 38. 206–213
2024
-
[23]
Nelson F Liu, Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua, Fabio Petroni, and Percy Liang. 2024. Lost in the middle: How language models use long contexts.Transactions of the association for computational linguistics12 (2024), 157–173
2024
-
[24]
Zijun Liu, Yanzhe Zhang, Peng Li, Yang Liu, and Diyi Yang. 2023. Dynamic llm-agent network: An llm-agent collaboration framework with agent team optimization.arXiv preprint arXiv:2310.02170(2023)
Pith/arXiv arXiv 2023
-
[25]
Chris Lu, Cong Lu, Robert Tjarko Lange, Jakob Foerster, Jeff Clune, and David Ha
-
[26]
arXiv preprint arXiv:2408.06292(2024)
The ai scientist: Towards fully automated open-ended scientific discovery. arXiv preprint arXiv:2408.06292(2024)
Pith/arXiv arXiv 2024
-
[27]
Guoqing Ma, Jia Zhu, Hanghui Guo, Weijie Shi, Jiawei Shen, Jingjiang Liu, and Yidan Liang. 2025. Automatic Failure Attribution and Critical Step Prediction Method for Multi-Agent Systems Based on Causal Inference.arXiv preprint arXiv:2509.08682(2025)
arXiv 2025
-
[28]
Joshua Owotogbe. 2025. Assessing and Enhancing the Robustness of LLM- based Multi-Agent Systems Through Chaos Engineering. In2025 IEEE/ACM 4th International Conference on AI Engineering–Software Engineering for AI (CAIN). IEEE, 250–252
2025
-
[29]
Joon Sung Park, Joseph O’Brien, Carrie Jun Cai, Meredith Ringel Morris, Percy Liang, and Michael S Bernstein. 2023. Generative agents: Interactive simulacra of human behavior. InProceedings of the 36th annual acm symposium on user interface software and technology. 1–22
2023
-
[30]
Chen Qian, Wei Liu, Hongzhang Liu, Nuo Chen, Yufan Dang, Jiahao Li, Cheng Yang, Weize Chen, Yusheng Su, Xin Cong, et al. 2024. Chatdev: Communicative agents for software development. InProceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 15174–15186
2024
-
[31]
Xihe Qiu, Haoyu Wang, Xiaoyu Tan, Chao Qu, Yujie Xiong, Yuan Cheng, Yinghui Xu, Wei Chu, and Yuan Qi. 2024. Towards collaborative intelligence: Propagat- ing intentions and reasoning for multi-agent coordination with large language models.arXiv preprint arXiv:2407.12532(2024)
arXiv 2024
-
[32]
Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Eric Hambro, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. 2023. Toolformer: Language models can teach themselves to use tools.Advances in Neural Information Processing Systems36 (2023), 68539–68551
2023
-
[33]
Samuel Schmidgall, Yusheng Su, Ze Wang, Ximeng Sun, Jialian Wu, Xiaodong Yu, Jiang Liu, Zicheng Liu, and Emad Barsoum. 2025. Agent laboratory: Using llm agents as research assistants.arXiv preprint arXiv:2501.04227(2025)
Pith/arXiv arXiv 2025
-
[34]
Xu Shen, Yixin Liu, Yiwei Dai, Yili Wang, Rui Miao, Yue Tan, Shirui Pan, and Xin Wang. 2025. Understanding the Information Propagation Effects of Com- munication Topologies in LLM-based Multi-Agent Systems.arXiv preprint arXiv:2505.23352(2025)
arXiv 2025
-
[35]
Karthik Valmeekam, Matthew Marquez, Sarath Sreedharan, and Subbarao Kamb- hampati. 2023. On the planning abilities of large language models-a critical investigation.Advances in Neural Information Processing Systems36 (2023), 75993–76005
2023
-
[36]
Guanzhi Wang, Yuqi Xie, Yunfan Jiang, Ajay Mandlekar, Chaowei Xiao, Yuke Zhu, Linxi Fan, and Anima Anandkumar. 2023. Voyager: An open-ended embodied agent with large language models.arXiv preprint arXiv:2305.16291(2023)
Pith/arXiv arXiv 2023
-
[37]
Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Sharan Narang, Aakanksha Chowdhery, and Denny Zhou. 2022. Self-consistency improves chain of thought reasoning in language models.arXiv preprint arXiv:2203.11171(2022)
Pith/arXiv arXiv 2022
-
[38]
Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, Denny Zhou, et al. 2022. Chain-of-thought prompting elicits reasoning in large language models.Advances in neural information processing systems35 (2022), 24824–24837
2022
-
[39]
Alva West, Yixuan Weng, Minjun Zhu, Zhen Lin, Zhiyuan Ning, and Yue Zhang
-
[40]
Abduct, Act, Predict: Scaffolding Causal Inference for Automated Failure Attribution in Multi-Agent Systems.arXiv preprint arXiv:2509.10401(2025)
arXiv 2025
-
[41]
Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Beibin Li, Erkang Zhu, Li Jiang, Xiaoyun Zhang, Shaokun Zhang, Jiale Liu, et al. 2024. Autogen: Enabling next-gen LLM applications via multi-agent conversations. InFirst Conference on Language Modeling
2024
-
[42]
Yingxuan Yang, Huacan Chai, Shuai Shao, Yuanyi Song, Siyuan Qi, Renting Rui, and Weinan Zhang. 2025. Agentnet: Decentralized evolutionary coordination for llm-based multi-agent systems.arXiv preprint arXiv:2504.00587(2025)
arXiv 2025
-
[43]
Yijun Yang, Tianyi Zhou, Kanxue Li, Dapeng Tao, Lusong Li, Li Shen, Xiaodong He, Jing Jiang, and Yuhui Shi. 2024. Embodied multi-modal agent trained by an llm from a parallel textworld. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition. 26275–26285
2024
-
[44]
Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Tom Griffiths, Yuan Cao, and Karthik Narasimhan. 2023. Tree of thoughts: Deliberate problem solving with large language models.Advances in neural information processing systems36 (2023), 11809–11822
2023
-
[45]
Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik R Narasimhan, and Yuan Cao. 2022. React: Synergizing reasoning and acting in language models. InThe eleventh international conference on learning representations
2022
-
[46]
Boyang Zhang, Yicong Tan, Yun Shen, Ahmed Salem, Michael Backes, Savvas Zannettou, and Yang Zhang. 2025. Breaking agents: Compromising autonomous llm agents through malfunction amplification. InProceedings of the 2025 Confer- ence on Empirical Methods in Natural Language Processing. 34952–34964
2025
-
[47]
Guibin Zhang, Luyang Niu, Junfeng Fang, Kun Wang, Lei Bai, and Xiang Wang
-
[48]
Multi-agent architecture search via agentic supernet.arXiv preprint StepFinder: A Temporal Semantic Framework for Failure Attribution in Multi-Agent Systems KDD ’26, August 09–13, 2026, Jeju Island, Republic of Korea arXiv:2502.04180(2025)
arXiv 2026
-
[49]
Guibin Zhang, Junhao Wang, Junjie Chen, Wangchunshu Zhou, Kun Wang, and Shuicheng Yan. 2025. AgenTracer: Who Is Inducing Failure in the LLM Agentic Systems?arXiv preprint arXiv:2509.03312(2025)
arXiv 2025
-
[50]
Jieyu Zhang, Ranjay Krishna, Ahmed H Awadallah, and Chi Wang. 2023. Ecoas- sistant: Using llm assistant more affordably and accurately.arXiv preprint arXiv:2310.03046(2023)
arXiv 2023
-
[51]
Shaokun Zhang, Ming Yin, Jieyu Zhang, Jiale Liu, Zhiguang Han, Jingyang Zhang, Beibin Li, Chi Wang, Huazheng Wang, Yiran Chen, et al . 2025. Which agent causes task failures and when? on automated failure attribution of llm multi-agent systems.arXiv preprint arXiv:2505.00212(2025)
arXiv 2025
-
[52]
Yanzhao Zhang, Mingxin Li, Dingkun Long, Xin Zhang, Huan Lin, Baosong Yang, Pengjun Xie, An Yang, Dayiheng Liu, Junyang Lin, et al. 2025. Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models.arXiv preprint arXiv:2506.05176(2025)
Pith/arXiv arXiv 2025
-
[53]
Yang Zhang, Shixin Yang, Chenjia Bai, Fei Wu, Xiu Li, Zhen Wang, and Xuelong Li. 2025. Towards efficient llm grounding for embodied multi-agent collaboration. InFindings of the Association for Computational Linguistics: ACL 2025. 1663–1699
2025
-
[54]
Chuanyang Zheng, Zhengying Liu, Enze Xie, Zhenguo Li, and Yu Li. 2023. Progressive-hint prompting improves reasoning in large language models.arXiv preprint arXiv:2304.09797(2023)
arXiv 2023
-
[55]
Han Zhou, Xingchen Wan, Ruoxi Sun, Hamid Palangi, Shariq Iqbal, Ivan Vulić, Anna Korhonen, and Sercan Ö Arık. 2025. Multi-agent design: Optimizing agents with better prompts and topologies.arXiv preprint arXiv:2502.02533(2025)
arXiv 2025
-
[56]
Chenyang Zhu, Spencer Hong, Jingyu Wu, Kushal Chawla, Charlotte Tang, Youb- ing Yin, Nathan Wolfe, Erin Babinsky, and Daben Liu. 2025. RAFFLES: Reasoning- based Attribution of Faults for LLM Systems.arXiv preprint arXiv:2509.06822 (2025)
arXiv 2025
-
[57]
Kunlun Zhu, Zijia Liu, Bingxuan Li, Muxin Tian, Yingxuan Yang, Jiaxun Zhang, Pengrui Han, Qipeng Xie, Fuyang Cui, Weijia Zhang, et al . 2025. Where llm agents fail and how they can learn from failures.arXiv preprint arXiv:2509.25370 (2025)
arXiv 2025
-
[58]
Mingchen Zhuge, Changsheng Zhao, Dylan Ashley, Wenyi Wang, Dmitrii Khizbullin, Yunyang Xiong, Zechun Liu, Ernie Chang, Raghuraman Krishnamoor- thi, Yuandong Tian, et al. 2024. Agent-as-a-judge: Evaluate agents with agents. arXiv preprint arXiv:2410.10934(2024). A Pseudocode of StepFinder Algorithm 1StepFinder Training Input:MAS failure logsT={𝜏 1, 𝜏2, . ....
arXiv 2024
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.