Imitate, Explore, and Self-Improve: A Reproduction Report on Slow-thinking Reasoning Systems
Pith reviewed 2026-05-18 00:31 UTC · model grok-4.3
The pith
Reproducing o1-like slow-thinking reasoning works by first imitating long thought traces, then exploring hard problems with multiple rollouts, and finally refining the training set iteratively with the model's own correct outputs.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By first fine-tuning on distilled long-form thought data to invoke a slow-thinking mode, then generating multiple rollouts on challenging problems to harvest high-quality correct trajectories, and finally using those trajectories to iteratively refine the training dataset, the resulting STILL-2 model reaches competitive accuracy on three hard reasoning benchmarks.
What carries the argument
The STILL-2 framework that sequences imitation of long thought traces, exploration via multiple rollouts, and self-improvement through iterative dataset refinement.
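The three stages can be sketched as a loop. This is a minimal Python sketch under stated assumptions, not the paper's implementation: the names `explore`, `imitate_explore_self_improve`, and `toy_fine_tune` are hypothetical, and the toy "model" is a lookup table that guesses on unseen questions, standing in for a fine-tuned language model.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical types for illustration only (not taken from the paper).
Problem = Tuple[str, str]                    # (question, gold answer)
Trajectory = Tuple[str, str, str]            # (question, thought, final answer)
Sampler = Callable[[str], Tuple[str, str]]   # question -> (thought, answer)


def explore(sample: Sampler, problems: List[Problem], n_rollouts: int) -> List[Trajectory]:
    """Sample several rollouts per problem; keep those whose answer matches gold."""
    kept = []
    for question, gold in problems:
        for _ in range(n_rollouts):
            thought, answer = sample(question)
            if answer == gold:
                kept.append((question, thought, answer))
    return kept


def imitate_explore_self_improve(seed, problems, fine_tune, n_rollouts=8, n_rounds=2):
    """Run imitation once, then alternate exploration and retraining."""
    dataset = list(seed)
    seen = set(dataset)
    model = fine_tune(dataset)                             # imitate
    for _ in range(n_rounds):
        for traj in explore(model, problems, n_rollouts):  # explore
            if traj not in seen:                           # refine: add new correct trajectories
                seen.add(traj)
                dataset.append(traj)
        model = fine_tune(dataset)                         # self-improve
    return dataset


def toy_fine_tune(dataset, rng):
    """Toy stand-in for fine-tuning: recall seen answers, guess otherwise."""
    known = {q: a for q, _, a in dataset}

    def sample(question):
        if question in known:
            return "recalled", known[question]
        return "guessed", str(rng.randint(0, 3))

    return sample


rng = random.Random(0)
problems = [("q1", "1"), ("q2", "2"), ("q3", "3")]
seed = [("q1", "seed thought", "1")]
final = imitate_explore_self_improve(seed, problems, lambda d: toy_fine_tune(d, rng))
```

In this sketch only trajectories whose final answer matches the gold answer enter the dataset, so the dataset can grow across rounds but never admits a wrong final answer.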
If this is right
- The fine-tuned model learns to produce extended internal reasoning before giving a final answer.
- Multiple rollouts on the same hard question yield an increasing fraction of correct solution paths.
- Each self-improvement round raises performance on the chosen benchmarks.
- The final system reaches accuracy competitive with industry reasoning systems, whose core techniques are undisclosed, on the three tested benchmarks.
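One way to see why extra rollouts help on hard problems is a back-of-the-envelope calculation. The symbols p (per-rollout probability of a correct answer) and k (number of rollouts) are assumptions introduced here, and treating rollouts as independent is a simplification:

```latex
P(\text{at least one correct among } k \text{ rollouts}) = 1 - (1 - p)^{k},
\qquad \text{e.g. } p = 0.1,\ k = 8:\quad 1 - 0.9^{8} \approx 0.57
```

This only shows that more rollouts make it likelier to surface at least one correct trajectory; it does not by itself imply that the per-rollout success rate p rises across training rounds.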
Where Pith is reading between the lines
- If the self-refinement loop stays stable, the method could be applied to new domains with only a small seed of high-quality traces.
- The approach suggests that closed models may rely on similar internal search-and-filter steps that are now reproducible from public data.
- Monitoring error accumulation across iterations would be a natural next measurement to decide how many rounds are safe.
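The monitoring idea in the last point could look like the following sketch. The function name `rounds_to_keep`, the `min_gain` threshold, and the accuracy values are all hypothetical; the paper does not prescribe a stopping rule.

```python
from typing import List


def rounds_to_keep(heldout_acc: List[float], min_gain: float = 0.0) -> int:
    """Return how many self-improvement rounds to keep.

    heldout_acc[0] is held-out accuracy after imitation only;
    heldout_acc[i] is held-out accuracy after round i.
    Stop at the first round that fails to improve by more than min_gain.
    """
    kept = 0
    for i in range(1, len(heldout_acc)):
        if heldout_acc[i] - heldout_acc[i - 1] <= min_gain:
            break
        kept = i
    return kept


# Made-up accuracies: rounds 1 and 2 help, round 3 regresses.
example = rounds_to_keep([0.40, 0.46, 0.49, 0.47])
```

Using a held-out set rather than the training problems matters here, since accuracy on problems the model was retrained on would not reveal accumulated errors.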
Load-bearing premise
That generating many rollouts on hard problems will keep producing more correct trajectories and that retraining on the model's own outputs will steadily raise quality without accumulating mistakes.
What would settle it
Run the full imitate-explore-self-improve loop for three cycles and measure whether accuracy on the hardest benchmark keeps rising toward the reported industry baselines, or instead plateaus below them or declines.
Original abstract
Recently, slow-thinking reasoning systems, such as o1, have demonstrated remarkable capabilities in solving complex reasoning tasks. These systems typically engage in an extended thinking process before responding to a query, allowing them to generate more thorough, accurate, and well-reasoned solutions. These systems are primarily developed and maintained by industry, with their core techniques not publicly disclosed. In response, an increasing number of studies from the research community aim to explore the technical foundations underlying these powerful reasoning systems. Building on these prior efforts, this paper presents a reproduction report on implementing o1-like reasoning systems. We introduce an "imitate, explore, and self-improve" framework, denoted as STILL-2, as our primary technical approach to train the reasoning model. In the initial phase, we use distilled long-form thought data to fine-tune the reasoning model, enabling it to invoke a slow-thinking mode. The model is then encouraged to explore challenging problems by generating multiple rollouts, which can result in increasingly more high-quality trajectories that lead to correct answers. Furthermore, the model undergoes self-improvement by iteratively refining its training dataset. To verify the effectiveness of this approach, we conduct extensive experiments on three challenging benchmarks. The experimental results demonstrate that our approach achieves competitive performance compared to industry-level reasoning systems on these benchmarks.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents STILL-2, an 'imitate, explore, and self-improve' framework for reproducing o1-like slow-thinking reasoning systems. It begins with fine-tuning a base model on distilled long-form thought data to induce slow-thinking behavior, proceeds to an exploration phase that generates multiple rollouts on challenging problems to surface additional high-quality correct trajectories, and concludes with iterative self-improvement that refines the training dataset using the model's own outputs. The central claim is that this pipeline yields competitive performance on three challenging benchmarks relative to industry-level reasoning systems.
Significance. If the reported performance gains are robustly supported by detailed, reproducible experiments with proper controls, the work would be significant as one of the first public, end-to-end reproductions of extended chain-of-thought reasoning systems. It offers a concrete, open recipe that combines imitation learning with self-generated trajectories, which could lower barriers for community research on scaling reasoning capabilities beyond standard supervised fine-tuning.
major comments (2)
- [§4 (Experiments)] §4 (Experiments) and associated tables: the assertion of 'competitive performance' is not accompanied by concrete accuracy numbers, baseline comparisons (e.g., against standard CoT or prior open reproductions), error bars, or details on data sources and exclusion rules, leaving the headline claim without visible empirical grounding.
- [Self-improve stage (§3.3)] Self-improve stage description (around §3.3): the iterative refinement of the training dataset assumes that filtering or self-labeling multiple rollouts on hard problems will reliably increase the proportion of correct trajectories without compounding errors, yet no external verifier, held-out accuracy check, or success-rate bound on the rollout filter is provided; this assumption is load-bearing for attributing gains to STILL-2 rather than to the initial imitation phase.
minor comments (1)
- [Abstract and §3] The abstract and method sections use the term 'high-quality trajectories' without an explicit operational definition or filtering criterion, which could be clarified for reproducibility.
Simulated Author's Rebuttal
We thank the referee for their constructive comments on our reproduction report. We address each major comment below and outline the revisions we will make to improve the clarity and rigor of the experimental results and the self-improvement methodology.
- Referee: §4 (Experiments) and associated tables: the assertion of 'competitive performance' is not accompanied by concrete accuracy numbers, baseline comparisons (e.g., against standard CoT or prior open reproductions), error bars, or details on data sources and exclusion rules, leaving the headline claim without visible empirical grounding.
  Authors: We appreciate this observation. The manuscript reports concrete accuracy numbers for STILL-2 on the three benchmarks in the experimental tables, along with comparisons to industry systems such as o1-preview. To further ground the claim, we will revise §4 to add explicit baseline results against standard Chain-of-Thought and prior open reproductions, include error bars from repeated runs where computationally feasible, and expand the description of data sources and exclusion rules.
  Revision: yes
- Referee: Self-improve stage description (around §3.3): the iterative refinement of the training dataset assumes that filtering or self-labeling multiple rollouts on hard problems will reliably increase the proportion of correct trajectories without compounding errors, yet no external verifier, held-out accuracy check, or success-rate bound on the rollout filter is provided; this assumption is load-bearing for attributing gains to STILL-2 rather than to the initial imitation phase.
  Authors: The referee correctly notes that the self-improvement stage depends on the quality of filtered trajectories. In §3.3 we select rollouts that produce correct final answers using ground-truth verification on the training problems. We will revise the section to include a held-out accuracy analysis showing the increase in correct trajectories across iterations and report the empirical success rate of the rollout filter to quantify and bound potential error accumulation.
  Revision: yes
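The "empirical success rate of the rollout filter" mentioned in the second response could be computed along these lines. This is a hedged sketch: `normalize`, `filter_rollouts`, and the example answers are hypothetical, and exact string matching after light normalization is an assumed stand-in for whatever answer checker the authors use.

```python
from typing import Dict, List, Tuple


def normalize(answer: str) -> str:
    """Light normalization before comparing answers (an assumption, not the paper's rule)."""
    return answer.strip().lower().replace(" ", "")


def filter_rollouts(
    rollouts: Dict[str, List[str]], gold: Dict[str, str]
) -> Tuple[List[Tuple[str, str]], float, float]:
    """Keep rollouts whose final answer matches gold.

    Returns (kept pairs, fraction of rollouts kept, fraction of problems
    with at least one kept rollout).
    """
    kept = []
    n_total = 0
    solved = 0
    for problem_id, answers in rollouts.items():
        hits = [a for a in answers if normalize(a) == normalize(gold[problem_id])]
        kept.extend((problem_id, a) for a in hits)
        n_total += len(answers)
        solved += 1 if hits else 0
    pass_rate = len(kept) / n_total if n_total else 0.0
    coverage = solved / len(rollouts) if rollouts else 0.0
    return kept, pass_rate, coverage


# Made-up final answers for two problems.
rollouts = {"p1": ["3", " 3 ", "4"], "p2": ["7", "8"]}
gold = {"p1": "3", "p2": "9"}
kept, pass_rate, coverage = filter_rollouts(rollouts, gold)
```

A final-answer check of this kind cannot detect a trajectory that reaches the right answer through flawed reasoning, which is one reason a separate held-out accuracy analysis is still informative.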
Circularity Check
No circularity: empirical reproduction uses external benchmarks and standard self-training without definitional reduction
The paper describes an imitate-explore-self-improve pipeline (STILL-2) that begins with fine-tuning on externally distilled long-form thought data, proceeds to multiple rollouts on challenging problems to collect trajectories, and iterates dataset refinement. Performance is measured on three external benchmarks and compared to industry systems. No equations, fitted parameters renamed as predictions, or self-citation chains appear in the provided text that would make any claimed result equivalent to its inputs by construction. The central claims rest on experimental outcomes rather than internal redefinitions or load-bearing self-references, rendering the derivation chain self-contained.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Distilled long-form thought data from external sources provides a sufficient starting point for invoking slow-thinking behavior in the base model.
- domain assumption: Multiple rollouts on challenging problems will yield progressively higher-quality correct trajectories suitable for further training.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, theorem washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: "We introduce an 'imitate, explore, and self-improve' framework, denoted as STILL-2... use distilled long-form thought data to fine-tune... encouraged to explore challenging problems by generating multiple rollouts... self-improvement by iteratively refining its training dataset."
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, theorem reality_from_one_distinction (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: "The experimental results demonstrate that our approach achieves competitive performance compared to industry-level reasoning systems on these benchmarks."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 18 Pith papers
- Revisiting Reinforcement Learning with Verifiable Rewards from a Contrastive Perspective
  ConSPO improves RLVR training by aligning rollout scores with generation likelihoods via length-normalized log-probabilities and applying a group-wise InfoNCE contrastive loss with a scheduled margin, outperforming GR...
- Rethinking RL for LLM Reasoning: It's Sparse Policy Selection, Not Capability Learning
  RL improves LLM reasoning by sparse policy selection at high-entropy tokens rather than new capability learning, and a minimal RL-free method matches its gains at three orders of magnitude lower cost.
- Leveraging Pretrained Language Models as Energy Functions for Glauber Dynamics Text Diffusion
  Pretrained language models are used as energy functions for Glauber dynamics in discrete text diffusion, improving generation quality over prior diffusion LMs and matching autoregressive models on benchmarks and reaso...
- Self-Consistency from Only Two Samples: CoT-PoT Ensembling for Efficient LLM Reasoning
  CoT-PoT ensembling achieves self-consistency accuracy in LLMs with only two samples for 78.6% of tasks, reducing computation by 9.3x compared to standard methods.
- Reinforcement Learning for Reasoning in Large Language Models with One Training Example
  One training example via RLVR boosts LLM math reasoning from 17.6% to 35.7% average across six benchmarks.
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning
  LCPO trains L1 reasoning models to adhere to prompt-specified CoT lengths, supporting accuracy-compute trade-offs and yielding short reasoning models that outperform larger baselines at matched lengths.
- Rethinking RL for LLM Reasoning: It's Sparse Policy Selection, Not Capability Learning
  RL for LLM reasoning acts as sparse policy selection at high-entropy tokens already present in the base model, enabling ReasonMaxxer—an efficient contrastive method that recovers most RL gains at three orders of magni...
- HEALing Entropy Collapse: Enhancing Exploration in Few-Shot RLVR via Hybrid-Domain Entropy Dynamics Alignment
  HEAL mitigates entropy collapse in few-shot RLVR by selectively adding general-domain data and aligning trajectory-level entropy dynamics, matching full-shot performance with 32 target samples.
- InCoder-32B-Thinking: Industrial Code World Model for Thinking
  InCoder-32B-Thinking uses error-feedback synthesized thinking traces and a code world model to reach top open-source scores on general and industrial code benchmarks including 81.3% on LiveCodeBench and 84.0% on CAD-Coder.
- TimelineReasoner: Advancing Timeline Summarization with Large Reasoning Models
  TimelineReasoner applies large reasoning models in a Global Cognition plus Detail Exploration loop to produce more accurate, complete, and coherent timelines from news than prior LLM-based methods.
- CODA: Difficulty-Aware Compute Allocation for Adaptive Reasoning
  CODA uses rollout-based difficulty signals to drive two gates that penalize verbosity on easy instances and promote deliberation on hard ones, cutting token use over 60% on simple tasks while maintaining accuracy.
- WebThinker: Empowering Large Reasoning Models with Deep Research Capability
  WebThinker equips large reasoning models with autonomous web exploration and interleaved reasoning-drafting via a Deep Web Explorer and RL-based DPO training, yielding gains on GPQA, GAIA, and report-generation benchmarks.
- Search-o1: Agentic Search-Enhanced Large Reasoning Models
  Search-o1 integrates agentic retrieval-augmented generation and a Reason-in-Documents module into large reasoning models to dynamically supply missing knowledge and improve performance on complex science, math, coding...
- Boosting Reinforcement Learning with Verifiable Rewards via Randomly Selected Few-Shot Guidance
  FEST improves RLVR sample efficiency on math and coding benchmarks by combining supervised signals, on-policy signals, and decaying weights on just 128 randomly chosen demonstrations, matching full-dataset baselines.
- SCOPE: Signal-Calibrated On-Policy Distillation Enhancement with Dual-Path Adaptive Weighting
  SCOPE routes LLM on-policy rollouts by correctness into teacher-perplexity-weighted KL for errors and student-perplexity-weighted MLE for successes, with group normalization, yielding 11.42% relative Avg@32 gain on re...
- A Comprehensive Survey of Self-Evolving AI Agents: A New Paradigm Bridging Foundation Models and Lifelong Agentic Systems
  A comprehensive review of self-evolving AI agents that improve themselves over time, organized via a framework of inputs, agent system, environment, and optimizers, with domain-specific and safety discussions.
- From System 1 to System 2: A Survey of Reasoning Large Language Models
  The survey organizes the shift of LLMs toward deliberate System 2 reasoning, covering model construction techniques, performance on math and coding benchmarks, and future research directions.
- Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
  The paper provides the first comprehensive survey of multimodal chain-of-thought reasoning, including foundational concepts, a taxonomy of methodologies, application analyses, challenges, and future directions.