DeonticBench: A Benchmark for Reasoning over Rules
Pith reviewed 2026-05-10 20:18 UTC · model grok-4.3
The pith
Large language models reach at best 44.4 percent accuracy on hard deontic reasoning tasks drawn from real tax law, policies, and statutes.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
DEONTICBENCH supplies 6,232 tasks across four domains and supports both free-form language reasoning and an optional workflow that translates rules and facts into executable Prolog, complete with reference programs. Across frontier LLMs and coding models, the best hard-subset scores are 44.4 percent on SARA Numeric and 46.6 macro-F1 on Housing, while supervised fine-tuning and reinforcement learning improve Prolog generation quality without producing reliable solutions to the tasks.
What carries the argument
The dual workflow that lets models either reason directly in language or translate statutes and case facts into executable Prolog programs that return formal interpretations and explicit traces.
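A minimal sketch of what such a translation could look like, assuming SWI-Prolog; the predicate names and the rule itself are illustrative, not taken from the benchmark's reference programs:

```prolog
% Statutory rule (illustrative): a tenant is obligated to pay rent
% unless the unit is uninhabitable -- an exception encoded with
% negation as failure.
obligated(pay_rent, Tenant, Unit) :-
    leases(Tenant, Unit),
    \+ uninhabitable(Unit).

% Case facts extracted from the instance.
leases(alice, unit_3b).
uninhabitable(unit_3b).

% Query: ?- obligated(pay_rent, alice, unit_3b).
% Fails: the exception defeats the obligation. The proof attempt itself
% records which rules and facts were consulted, giving an explicit trace.
```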
If this is right
- Frontier models remain unreliable for high-stakes reasoning about obligations and permissions in legal and policy domains.
- Training on symbolic program generation improves output quality yet does not close the gap to reliable task performance.
- Benchmarks that combine long-context language input with formal execution traces are required to measure progress on rule-based reasoning.
Where Pith is reading between the lines
- Tighter integration of language models with external solvers could be tested directly on the same tasks to measure whether performance jumps.
- The persistent failure after training suggests that deontic reasoning may depend on mechanisms for tracking exceptions and priorities that current pre-training does not supply (a minimal sketch of such a mechanism follows this list).
- The benchmark could serve as a training signal for future models that aim to combine natural-language understanding with formal deduction.
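One illustrative way exception-and-priority tracking could be expressed, assuming SWI-Prolog and hypothetical predicates not taken from the paper:

```prolog
% A specific prohibition overrides a general permission when it carries
% higher priority -- the kind of defeasible structure statutes routinely require.
permitted(Action, Agent) :-
    rule(Id, permits, Action, Agent),
    \+ overridden(Id, Action, Agent).

overridden(Id, Action, Agent) :-
    rule(OtherId, forbids, Action, Agent),
    OtherId \== Id,
    priority(OtherId, P2),
    priority(Id, P1),
    P2 > P1.

% Toy rule base: a general carry-on permission and a more specific ban.
rule(general_allowance, permits, carry_bag, passenger).
rule(hazmat_ban,        forbids, carry_bag, passenger).
priority(general_allowance, 1).
priority(hazmat_ban,        2).

% ?- permitted(carry_bag, passenger).   % fails: the higher-priority ban wins.
```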
Load-bearing premise
The chosen statutes, case facts, and Prolog translations faithfully represent the central difficulties of real-world deontic reasoning without introducing selection or formalization artifacts.
What would settle it
A model that consistently exceeds 70 percent accuracy on the hard subsets while using the supplied Prolog workflow, or a legal-expert audit that finds the reference translations do not match actual statute interpretations.
Original abstract
Reasoning with complex, context-specific rules remains challenging for large language models (LLMs). In legal and policy settings, this manifests as deontic reasoning: reasoning about obligations, permissions, and prohibitions under explicit rules. While many recent benchmarks emphasize short-context mathematical reasoning, fewer focus on long-context, high-stakes deontic reasoning. To address this gap, we introduce DEONTICBENCH, a benchmark of 6,232 tasks across U.S. federal taxes, airline baggage policies, U.S. immigration administration, and U.S. state housing law. These tasks can be approached in multiple ways, including direct reasoning in language or with the aid of symbolic computation. Besides free-form chain-of-thought reasoning, DEONTICBENCH enables an optional solver-based workflow in which models translate statutes and case facts into executable Prolog, leading to formal problem interpretations and an explicit program trace. We release reference Prolog programs for all instances. Across frontier LLMs and coding models, best hard-subset performance reaches only 44.4% on SARA Numeric and 46.6 macro-F1 on Housing. We further study training with supervised fine-tuning and reinforcement learning for symbolic program generation. Although training improves Prolog generation quality, current RL methods still fail to solve these tasks reliably. Overall, DEONTICBENCH provides a benchmark for studying context-grounded rule reasoning in real-world domains under both symbolic and non-symbolic settings.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces DEONTICBENCH, a benchmark of 6,232 tasks across U.S. federal taxes, airline baggage, immigration, and housing law. Tasks support direct chain-of-thought reasoning in natural language or an optional symbolic workflow in which models translate statutes and facts into executable Prolog (with reference programs released for all instances). Evaluations across frontier LLMs and coding models show peak hard-subset performance of 44.4% on SARA Numeric and 46.6 macro-F1 on Housing; supervised fine-tuning and reinforcement learning improve Prolog generation quality but do not yield reliable task solutions.
Significance. If the tasks and reference Prolog programs faithfully encode the source statutes and facts, the benchmark would provide a valuable, reproducible resource for studying long-context deontic reasoning in high-stakes domains and for comparing neural versus symbolic approaches. The release of reference programs and the dual workflow are concrete strengths that support future work on improving rule-based reasoning.
major comments (2)
- [§3] Benchmark Construction: No validation is reported for the statute-to-Prolog translations (e.g., logical equivalence checks between natural-language and Prolog outcomes, expert review of deontic operators, or inter-annotator agreement on formalizations). This is load-bearing for the central claims, because the headline results (44.4% on the SARA Numeric hard subset, 46.6 macro-F1 on Housing) and the conclusion that RL methods fail to solve the tasks reliably presuppose that the reference programs are faithful encodings; unexamined simplifications or encoding errors would make model failures artifacts of the benchmark rather than genuine deontic-reasoning limits.
- [§4] Task Subsets and Metrics: The definition and construction of the 'hard' subsets (used for the 44.4% figure) and the precise criteria for macro-F1 on Housing are not detailed with respect to selection-bias controls or coverage of deontic edge cases. This weakens the interpretation that the reported numbers demonstrate broad model limitations rather than properties of the chosen instances.
minor comments (1)
- [Table 1, Figure 2] Axis labels and caption text could more explicitly distinguish direct CoT accuracy from solver-based accuracy, to avoid reader confusion when comparing the two workflows.
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which highlight important aspects of benchmark validation and subset construction. We address each point below and will revise the manuscript accordingly.
Point-by-point responses
- Referee [§3, Benchmark Construction]: No validation is reported for the statute-to-Prolog translations (e.g., logical equivalence checks between natural-language and Prolog outcomes, expert review of deontic operators, or inter-annotator agreement on formalizations). This is load-bearing for the central claims, because the headline results (44.4% on the SARA Numeric hard subset, 46.6 macro-F1 on Housing) and the conclusion that RL methods fail to solve the tasks reliably presuppose that the reference programs are faithful encodings; unexamined simplifications or encoding errors would make model failures artifacts of the benchmark rather than genuine deontic-reasoning limits.
Authors: We agree that the absence of reported validation steps for the Prolog translations is a limitation, as the reference programs are central to the benchmark's utility and claims. The manuscript currently relies on releasing the programs for external verification without detailing internal checks. In the revision, we will expand §3 with a dedicated subsection on the translation methodology, including how deontic operators were mapped to Prolog predicates, sample logical equivalence verifications (running the Prolog programs on held-out instances and comparing to natural-language ground truth), and any expert consultation performed during construction. We will also discuss potential encoding limitations explicitly. Revision: yes.
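A minimal sketch of the kind of equivalence check the response describes, assuming SWI-Prolog; check_instance/2, the example goal, and the label atoms are hypothetical names, not the paper's actual harness:

```prolog
% Run a reference program's goal and compare the derived answer with the
% natural-language gold label for the same instance.
check_instance(Goal, GoldLabel) :-
    (   catch(Goal, _, fail)          % a crash or a failed proof both
    ->  Derived = entailment          % count as non-entailment here
    ;   Derived = contradiction
    ),
    (   Derived == GoldLabel
    ->  format("ok: ~w~n", [Goal])
    ;   format("MISMATCH on ~w: derived ~w, gold ~w~n",
               [Goal, Derived, GoldLabel])
    ).

% Example usage against a loaded reference program (illustrative goal):
% ?- check_instance(dependent(son, alice, 2017), entailment).
```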
- Referee [§4, Task Subsets and Metrics]: The definition and construction of the 'hard' subsets (used for the 44.4% figure) and the precise criteria for macro-F1 on Housing are not detailed with respect to selection-bias controls or coverage of deontic edge cases. This weakens the interpretation that the reported numbers demonstrate broad model limitations rather than properties of the chosen instances.
Authors: We concur that the hard-subset criteria and Housing metric details require clarification to support the interpretation of model limitations. The hard subsets were constructed using factors such as rule count, exception presence, and context length, but these were not fully enumerated. In the revised manuscript, we will augment §4 with explicit selection criteria, quantitative statistics on deontic feature coverage (e.g., obligations, permissions, prohibitions, conflicts), and confirmation that selection avoided unintended bias beyond the stated rules. We will also specify the exact label set and macro-F1 computation for Housing. Revision: yes.
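For concreteness, macro-F1 is the unweighted mean of per-class F1 scores over the label set. A sketch in SWI-Prolog, taking Gold-Pred pairs as input; the Housing label atoms below are invented for illustration, since the exact label set is what the authors promise to specify:

```prolog
:- use_module(library(aggregate)).
:- use_module(library(lists)).

% macro_f1(+GoldPredPairs, +Labels, -MacroF1)
% Each pair is Gold-Pred; macro-F1 averages per-class F1 uniformly.
macro_f1(Pairs, Labels, MacroF1) :-
    findall(F1, (member(L, Labels), class_f1(Pairs, L, F1)), F1s),
    sum_list(F1s, Sum),
    length(Labels, N),
    MacroF1 is Sum / N.

class_f1(Pairs, L, F1) :-
    aggregate_all(count, member(L-L, Pairs), TP),              % true positives
    aggregate_all(count, (member(G-L, Pairs), G \== L), FP),   % false positives
    aggregate_all(count, (member(L-P, Pairs), P \== L), FN),   % false negatives
    (   TP =:= 0
    ->  F1 = 0.0
    ;   Prec is TP / (TP + FP),
        Rec  is TP / (TP + FN),
        F1 is 2 * Prec * Rec / (Prec + Rec)
    ).

% ?- macro_f1([habitable-habitable, breach-habitable, breach-breach],
%             [habitable, breach], F).
% F = 0.6666...   (both classes score F1 = 2/3)
```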
Circularity Check
No circularity: new benchmark with empirical evaluations
Full rationale
The paper introduces DEONTICBENCH as a new collection of 6,232 tasks with released reference Prolog programs. All reported numbers (44.4% on SARA Numeric hard subset, 46.6 macro-F1 on Housing) are direct empirical measurements of model performance on these tasks. No equations, fitted parameters, or predictions appear that reduce to prior inputs by construction. No self-citations are invoked to justify uniqueness theorems or ansatzes. The derivation chain consists solely of task construction followed by external evaluation, which is self-contained and externally falsifiable.
Forward citations
Cited by 1 Pith paper
- Many-Tier Instruction Hierarchy in LLM Agents: ManyIH and ManyIH-Bench address instruction conflicts in LLM agents with up to 12 privilege levels across 853 tasks, revealing that frontier models achieve only about 40% accuracy.