Recognition: no theorem link
Enhancing Multilingual Counterfactual Generation through Alignment-as-Preference Optimization
Pith reviewed 2026-05-13 01:13 UTC · model grok-4.3
The pith
A preference alignment method called Macro improves the validity of multilingual self-generated counterfactual explanations by 12.55 percent on average while maintaining minimality.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Macro applies Direct Preference Optimization to multilingual SCE generation, using a composite scoring function to construct preference pairs that translate the validity-minimality trade-off into training signals. Across four LLMs and seven typologically diverse languages, it improves validity by 12.55% on average over chain-of-thought prompting without degrading minimality, avoids the severe minimality violations of translation baselines, and surpasses supervised fine-tuning on both metrics, with added benefits in cross-lingual alignment and error reduction.
What carries the argument
Macro, a DPO framework that builds preference pairs via a composite scoring function evaluating both validity and minimality for multilingual counterfactual generation.
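The pair-construction step can be sketched as follows. The paper describes its composite scoring function only at a high level, so the weighted combination below, the weight `alpha`, and all function names are illustrative assumptions, not the authors' actual formula:

```python
# Hypothetical sketch of preference-pair construction for DPO.
# The linear validity/minimality trade-off and `alpha` are assumptions;
# the paper's composite scoring function is not specified here.

def composite_score(is_valid, edit_distance, alpha=0.5):
    """Higher is better: reward label flips, penalize large edits."""
    return alpha * float(is_valid) - (1 - alpha) * edit_distance

def build_preference_pair(candidates):
    """Return (chosen, rejected) counterfactuals for one DPO training pair.

    `candidates` is a list of (text, is_valid, normalized_edit_distance)
    tuples for the same input, e.g. sampled generations from the base model.
    """
    ranked = sorted(candidates,
                    key=lambda c: composite_score(c[1], c[2]),
                    reverse=True)
    # Best-scoring candidate becomes the preferred response, worst the rejected.
    return ranked[0][0], ranked[-1][0]
```

Under this sketch, a valid candidate with a small edit outranks both an invalid minimal edit and a valid but heavily rewritten one, which is exactly the ordering the trade-off calls for.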
If this is right
- Validity of generated explanations increases by 12.55% on average compared to chain-of-thought prompting.
- Minimality is preserved, unlike in translation-based methods that violate it severely.
- Performance on both validity and minimality exceeds that of supervised fine-tuning.
- Cross-lingual perturbation alignment improves and common generation errors decrease.
Where Pith is reading between the lines
- Similar preference optimization could help resolve trade-offs in other LLM explanation or generation tasks.
- Testing Macro on additional low-resource languages might reveal whether the method scales without language-specific adjustments.
- The reliance on a composite score suggests that refining the scoring function could further enhance results in specific domains.
Load-bearing premise
The composite scoring function used to build preference pairs measures the validity and minimality trade-off accurately and without introducing bias across languages and models.
What would settle it
Running the same experiments with Macro on the four LLMs and seven languages and finding no significant average improvement in validity or a degradation in minimality compared to the chain-of-thought baseline would falsify the main claim.
Original abstract
Self-generated counterfactual explanations (SCEs) are minimally modified inputs (minimality) generated by large language models (LLMs) that flip their own predictions (validity), offering a causally grounded approach to unraveling black-box LLM behavior. Yet extending them beyond English remains challenging: existing methods struggle to produce valid SCEs in non-dominant languages, and a persistent trade-off between validity and minimality undermines explanation quality. We introduce Macro, a preference alignment framework that applies Direct Preference Optimization (DPO) to multilingual SCE generation, using a composite scoring function to construct preference pairs that effectively translate the trade-off into measurable preference signals. Experiments across four LLMs and seven typologically diverse languages show that Macro improves validity by 12.55% on average over the chain-of-thought baseline without degrading minimality, while avoiding the severe minimality violations of the translation-based baseline. Compared to supervised fine-tuning, Macro achieves superior performance on both metrics, confirming that explicit preference optimization is essential for balancing this trade-off. Further analyses reveal that Macro increases cross-lingual perturbation alignment and mitigates common generation errors. Our results highlight preference optimization as a promising direction for enhancing multilingual model explanations.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Macro, a preference alignment framework that applies Direct Preference Optimization (DPO) to multilingual self-generated counterfactual explanation (SCE) generation. Preference pairs are constructed using a composite scoring function that encodes the validity-minimality trade-off; experiments across four LLMs and seven typologically diverse languages report that Macro improves validity by 12.55% on average over a chain-of-thought baseline without degrading minimality, outperforms supervised fine-tuning, and avoids the minimality violations seen in translation-based baselines.
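For reference, the objective the framework instantiates is the standard DPO loss: given a prompt $x$ with chosen and rejected counterfactuals $y_w, y_l$ (ordered here by the composite score), the policy $\pi_\theta$ is trained against a frozen reference model $\pi_{\mathrm{ref}}$ via

```latex
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta;\pi_{\mathrm{ref}})
= -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\!\left[
\log\sigma\!\left(
\beta\log\frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
-\beta\log\frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
\right)\right]
```

where $\beta$ controls deviation from the reference model. The composite score enters only through which candidate is labeled $y_w$ versus $y_l$, which is why the referee's concern about the scoring rule is load-bearing.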
Significance. If the composite scoring function is shown to produce unbiased, language-agnostic preference signals that align with human judgments of explanation quality, the result would be significant for multilingual explainable AI. It would demonstrate that explicit preference optimization can resolve the validity-minimality trade-off more effectively than standard fine-tuning or translation pipelines, with potential implications for cross-lingual model interpretability.
major comments (2)
- [§3.2] §3.2 (Preference Pair Construction): The composite scoring function used to order pairs for DPO is described only at a high level; the explicit formula, the weighting scheme between validity and minimality components, and any cross-lingual or cross-model validation of those weights are not provided. Because the entire DPO training signal depends on the ordering induced by this function, the absence of these details makes it impossible to verify that the reported 12.55% validity gain reflects a genuine improvement rather than an artifact of the scoring rule.
- [§4] §4 (Experiments): The headline result of a 12.55% average validity improvement is presented without language-specific breakdowns, per-model tables, error bars, or statistical significance tests. In addition, the precise operational definitions of the validity and minimality metrics (and how they are computed for non-English inputs) are not stated. These omissions are load-bearing because the central claim is an empirical average over seven typologically diverse languages; without the supporting data it cannot be assessed whether the improvement is uniform or driven by a subset of languages or models.
minor comments (2)
- [Abstract] The abstract and introduction use the acronym 'Macro' without expanding it or briefly glossing its construction.
- [§3] Notation for the validity and minimality scores is introduced without a consolidated table of symbols, making it harder to track how the composite function is assembled.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. The comments identify key areas where additional detail will improve clarity and verifiability. We address each major comment below and will revise the manuscript to incorporate the requested information.
Point-by-point responses
Referee: [§3.2] §3.2 (Preference Pair Construction): The composite scoring function used to order pairs for DPO is described only at a high level; the explicit formula, the weighting scheme between validity and minimality components, and any cross-lingual or cross-model validation of those weights are not provided. Because the entire DPO training signal depends on the ordering induced by this function, the absence of these details makes it impossible to verify that the reported 12.55% validity gain reflects a genuine improvement rather than an artifact of the scoring rule.
Authors: We agree that the current high-level description in §3.2 leaves important details unspecified. In the revised manuscript we will expand this section to provide the explicit formula for the composite scoring function, the precise weighting scheme applied to the validity and minimality components, and the results of cross-lingual and cross-model validation performed to confirm that the induced preference ordering is stable. These additions will allow readers to reproduce the preference-pair construction and assess whether the reported gains arise from the scoring rule itself.
Revision: yes
Referee: [§4] §4 (Experiments): The headline result of a 12.55% average validity improvement is presented without language-specific breakdowns, per-model tables, error bars, or statistical significance tests. In addition, the precise operational definitions of the validity and minimality metrics (and how they are computed for non-English inputs) are not stated. These omissions are load-bearing because the central claim is an empirical average over seven typologically diverse languages; without the supporting data it cannot be assessed whether the improvement is uniform or driven by a subset of languages or models.
Authors: We acknowledge that §4 would benefit from more granular reporting. In the revision we will add language-specific and per-model tables for both validity and minimality, include error bars, and report statistical significance via paired t-tests. We will also state the operational definitions explicitly: validity is the fraction of generated SCEs that flip the model’s original prediction, and minimality is the normalized token-level edit distance. Both metrics are computed using language-appropriate tokenizers and the same underlying classifier for all languages, ensuring consistent evaluation across the seven typologically diverse languages. These changes will demonstrate that the 12.55% average improvement is not driven by a subset of languages or models.
Revision: yes
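The operational definitions above can be made concrete. This is a minimal sketch: a whitespace tokenizer stands in for the language-appropriate tokenizers the authors describe, and the function names are illustrative, not from the paper:

```python
# Sketch of the two evaluation metrics as the rebuttal defines them:
# validity = fraction of counterfactuals that flip the model's prediction,
# minimality = normalized token-level edit distance (lower is more minimal).

def token_edit_distance(a, b):
    """Levenshtein distance between two token sequences (dynamic programming)."""
    m, n = len(a), len(b)
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution
        prev = curr
    return prev[n]

def minimality(original, counterfactual, tokenize=str.split):
    """Normalized token-level edit distance in [0, 1]."""
    a, b = tokenize(original), tokenize(counterfactual)
    if not a and not b:
        return 0.0
    return token_edit_distance(a, b) / max(len(a), len(b))

def validity(original_labels, counterfactual_labels):
    """Fraction of counterfactuals whose predicted label differs from the original."""
    flips = sum(o != c for o, c in zip(original_labels, counterfactual_labels))
    return flips / len(original_labels)
```

For example, changing one token out of four gives a minimality of 0.25, and two label flips out of three examples give a validity of about 0.67; averaging the validity gain over languages and models would then yield the headline figure.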
Circularity Check
No circularity: empirical results from DPO on externally scored pairs
Full rationale
The paper's chain consists of (1) defining a composite scorer to rank candidate counterfactuals, (2) building preference pairs from those rankings, (3) running DPO, and (4) measuring validity/minimality gains on held-out test sets across four LLMs and seven languages. None of these steps reduces to its own inputs by construction: the scorer is an input assumption whose correctness is tested by the downstream human-aligned metrics, the DPO objective is standard, and the reported 12.55% average improvement is an empirical average against independent baselines. No equations, self-definitional loops, fitted-parameter-as-prediction, or load-bearing self-citations appear in the abstract or method description.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: the composite scoring function accurately reflects the validity-minimality trade-off when constructing preference pairs, without introducing bias across languages or models.
Reference graph
Works this paper leans on
- [1] Adelani, David Ifeoluwa and Liu, Hannah and Shen, Xiaoyu and Vassilyev, Nikita and Alabi, Jesujoba O. and Mao, Yanke and Gao, Haonan and Lee, En-Shiun Annie. SIB-200: A Simple, Inclusive, and Big Evaluation Dataset for Topic Classification in 200+ Languages and Dialects. Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics. 2024
- [2] From Fragments to Facts: A Curriculum-Driven DPO Approach for Generating Hindi News Veracity Explanations. arXiv preprint. 2026
- [3] Barriere, Valentin and Cifuentes, Sebastian. Are Text Classifiers Xenophobic? A Country-Oriented Bias Detection Method with Least Confounding Variables. Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024). 2024
- [4] Barriere, Valentin and Cifuentes, Sebastian. A Study of Nationality Bias in Names and Perplexity using Off-the-Shelf Affect-related Tweet Classifiers. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. 2024. doi:10.18653/v1/2024.emnlp-main.34
- [5] Bhattacharjee, Amrita and Moraffah, Raha and Garland, Joshua and Liu, Huan. IEEE International Conference on Big Data. 2024. doi:10.1109/BigData62323.2024.10825537
- [6] Bhan, Milan and Vittaut, Jean-Noël and Chesneau, Nicolas and Lesot, Marie-Jeanne. TIGTEC: Token Importance Guided TExt Counterfactuals. Machine Learning and Knowledge Discovery in Databases: Research Track. 2023
- [7] Bradley, Ralph A. and Terry, Milton E. Rank Analysis of Incomplete Block Designs: I. The Method of Paired Comparisons. Biometrika. 1952
- [8] Blaschke, Verena and Fedzechkina, Masha and Ter Hoeve, Maartje. Analyzing the Effect of Linguistic Similarity on Cross-Lingual Transfer: Tasks and Experimental Setups Matter. Findings of the Association for Computational Linguistics: ACL 2025. 2025. doi:10.18653/v1/2025.findings-acl.454
- [9] DRIV-EX: Counterfactual Explanations for Driving LLMs. arXiv preprint. 2026
- [10] Conneau, Alexis and Khandelwal, Kartikay and Goyal, Naman and Chaudhary, Vishrav and Wenzek, Guillaume and Guzmán, Francisco and Grave, Edouard and Ott, Myle and Zettlemoyer, Luke and Stoyanov, Veselin. Unsupervised Cross-lingual Representation Learning at Scale. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020
- [11] Conneau, Alexis and Rinott, Ruty and Lample, Guillaume and Williams, Adina and Bowman, Samuel and Schwenk, Holger and Stoyanov, Veselin. XNLI: Evaluating Cross-lingual Sentence Representations. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. 2018. doi:10.18653/v1/D18-1269
- [12] Dang, John and Ahmadian, Arash and Marchisio, Kelly and Kreutzer, Julia, et al. RLHF Can Speak Many Languages: Unlocking Multilingual Preference Optimization for LLMs. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. 2024. doi:10.18653/v1/2024.emnlp-main.729
- [13] Dang, Thao Anh and Raviv, Limor and Galke, Lukas. Tokenization and Morphology in Multilingual Language Models: A Comparative Analysis of mT5 and ByT5. Proceedings of the 8th International Conference on Natural Language and Speech Processing (ICNLSP-2025). 2025
- [14] Dehghanighobadi, Zahra and Fischer, Asja and Zafar, Muhammad Bilal. Can LLMs Explain Themselves Counterfactually?. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. 2025. doi:10.18653/v1/2025.emnlp-main.396
- [15] Ethayarajh, Kawin and Xu, Winnie and Muennighoff, Niklas and Jurafsky, Dan and Kiela, Douwe. Proceedings of the 41st International Conference on Machine Learning. 2024
- [16] Etxaniz, Julen and Azkune, Gorka and Soroa, Aitor and Lopez de Lacalle, Oier and Artetxe, Mikel. Do Multilingual Language Models Think Better in English?. Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers). 2024
- [17] Goodhart, C. A. E. Problems of Monetary Management: The UK Experience. Monetary Theory and Practice: The UK Experience. 1984. doi:10.1007/978-1-349-17295-5_4
- [18] Haspelmath, Martin and Dryer, Matthew S. and Gil, David and Comrie, Bernard (eds.). The World Atlas of Language Structures. Oxford University Press. 2005
- [19] Hendrycks, Dan, et al. Measuring Massive Multitask Language Understanding. International Conference on Learning Representations. 2021
- [20] Kastrati, Muhamet and Imran, Ali Shariq and Hashmi, Ehtesham and Kastrati, Zenun and Daudpota, Sher Muhammad and Biba, Marenglen. Unlocking language barriers: Assessing pre-trained large language models across multilingual tasks and unveiling the black box with Explainable Artificial Intelligence. 2025
- [21] Koishekenov, Yeskendir and Berard, Alexandre and Nikoulina, Vassilina. Memory-efficient NLLB-200: Language-specific Expert Pruning of a Massively Multilingual Machine Translation Model. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2023. doi:10.18653/v1/2023.acl-long.198
- [22] Kullback, Solomon and Leibler, Richard A. On Information and Sufficiency. The Annals of Mathematical Statistics. 1951
- [23] Lai, Viet Dac and Ngo, Nghia and Pouran Ben Veyseh, Amir and Man, Hieu and Dernoncourt, Franck and Bui, Trung and Nguyen, Thien Huu. ChatGPT Beyond English: Towards a Comprehensive Evaluation of Large Language Models in Multilingual Learning. Findings of the Association for Computational Linguistics: EMNLP 2023. 2023. doi:10.18653/v1/2023.findings-emnlp.878
- [24] Lai, Viet and Nguyen, Chien and Ngo, Nghia and Nguyen, Thuat and Dernoncourt, Franck and Rossi, Ryan and Nguyen, Thien. Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. 2023
- [25] CONGRAD: Conflicting Gradient Filtering for Multilingual Preference Alignment. arXiv preprint. 2025
- [26] Liu, Qi and Kusner, Matt and Blunsom, Phil. Counterfactual Data Augmentation for Neural Machine Translation. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2021. doi:10.18653/v1/2021.naacl-main.18
- [27] GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization. arXiv preprint. 2026
- [28] Mayne, Harry and Kearns, Ryan Othniel and Yang, Yushi and Bean, Andrew M. and Delaney, Eoin D. and Russell, Chris and Mahdi, Adam. LLMs Don't Know Their Own Decision Boundaries: The Unreliability of Self-Generated Counterfactual Explanations. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. 2025
- [29] Ma, Chunlan and Imani, Ayyoob and Ye, Haotian and Pei, Renhao and Asgari, Ehsaneddin and Schuetze, Hinrich. Taxi1500: A Dataset for Multilingual Text Classification in 1500 Languages. Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers). 2025
- [30] Nguyen, Van Bach and Youssef, Paul and Seifert, Christin and Schlötterer, Jörg. LLMs for Generating and Evaluating Counterfactuals: A Comprehensive Study. Findings of the Association for Computational Linguistics: EMNLP 2024. 2024. doi:10.18653/v1/2024.findings-emnlp.870
- [31] Poelman, Wessel and de Lhoneux, Miryam. The Roles of English in Evaluating Multilingual Language Models. Proceedings of the Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies (NoDaLiDa/Baltic-HLT 2025). 2025
- [32] Qi, Jirui and Chen, Shan and Xiong, Zidi and Fernández, Raquel and Bitterman, Danielle and Bisazza, Arianna. When Models Reason in Your Language: Controlling Thinking Language Comes at the Cost of Accuracy. Findings of the Association for Computational Linguistics: EMNLP 2025. 2025. doi:10.18653/v1/2025.findings-emnlp.1103
- [33] Rafailov, Rafael and Sharma, Archit and Mitchell, Eric and Manning, Christopher D. and Ermon, Stefano and Finn, Chelsea. Direct Preference Optimization: Your Language Model is Secretly a Reward Model. Advances in Neural Information Processing Systems. 2023
- [34] Resck, Lucas and Augenstein, Isabelle and Korhonen, Anna. Explainability and Interpretability of Multilingual Large Language Models: A Survey. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. 2025. doi:10.18653/v1/2025.emnlp-main.1033
- [35] Ross, Alexis and Marasović, Ana and Peters, Matthew. Explaining NLP Models via Minimal Contrastive Editing (MiCE). Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021. 2021. doi:10.18653/v1/2021.findings-acl.336
- [36] Saha Roy, Rishiraj and Schlotthauer, Joel and Hinze, Chris and Foltyn, Andreas and Hahn, Luzian and Kuech, Fabian. 2025. doi:10.1145/3701551.3704126
- [37] DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models. arXiv preprint. 2024
- [38] She, Shuaijie and Zou, Wei and Huang, Shujian and Zhu, Wenhao and Liu, Xiang and Geng, Xiang and Chen, Jiajun. MAPO: Advancing Multilingual Reasoning through Multilingual-Alignment-as-Preference Optimization. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2024. doi:10.18653/v1/2024.acl-long.539
- [39] Shliazhko, Oleh and Fenogenova, Alena and Tikhonova, Maria and Kozlova, Anastasia and Mikhailov, Vladislav and Shavrina, Tatiana. mGPT: Few-Shot Learners Go Multilingual. Transactions of the Association for Computational Linguistics. 2024. doi:10.1162/tacl_a_00633
- [40] Steigerwald, Emma and Ramírez-Castañeda, Valeria and Brandt, Débora Y C and Báldi, András and Shapiro, Julie Teresa and Bowker, Lynne and Tarvin, Rebecca D. BioScience. 2022. doi:10.1093/biosci/biac062
- [41] Self-rationalization improves LLM as a fine-grained judge. arXiv preprint. 2024
- [42] Anchored Alignment for Self-Explanations Enhancement. arXiv preprint. 2024
- [43] Parallel Universes, Parallel Languages: A Comprehensive Study on LLM-based Multilingual Counterfactual Example Generation. arXiv preprint. 2026
- [44] Wang, Qianli and Anikina, Tatiana and Feldhus, Nils and Ostermann, Simon and Splitt, Fedor and Li, Jiaao and Tsoneva, Yoana, et al. Multilingual Datasets for Custom Input Extraction and Explanation Requests Parsing in Conversational XAI Systems. Findings of the Association for Computational Linguistics: EMNLP 2025. 2025. doi:10.18653/v1/2025.findings-emnlp.29
- [45] iFlip: Iterative Feedback-driven Counterfactual Example Refinement. arXiv preprint. 2026
- [46] Wang, Qianli and Feldhus, Nils and Ostermann, Simon and Villa-Arenas, Luis Felipe and Möller, Sebastian and Schmitt, Vera. Findings of the Association for Computational Linguistics: ACL 2025. 2025. doi:10.18653/v1/2025.findings-acl.64
- [47] Wang, Yongjie and Qiu, Xiaoqi and Yue, Yu and Guo, Xu and Zeng, Zhiwei and Feng, Yuhong and Shen, Zhiqi. A Survey on Natural Language Counterfactual Generation. Findings of the Association for Computational Linguistics: EMNLP 2024. 2024. doi:10.18653/v1/2024.findings-emnlp.276
- [48] Wei, Jason and Wang, Xuezhi and Schuurmans, Dale and Bosma, Maarten and Ichter, Brian and Xia, Fei and Chi, Ed H. and Le, Quoc V. and Zhou, Denny. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. Proceedings of the 36th International Conference on Neural Information Processing Systems. 2022
- [49] Xuan, Weihao and Yang, Rui and Qi, Heli and Zeng, Qingcheng and Xiao, Yunze and Feng, Aosong, et al. MMLU-ProX: A Multilingual Benchmark for Advanced Large Language Model Evaluation
- [50] Zhao, Haiyan and Chen, Hanjie and Yang, Fan and Liu, Ninghao and Deng, Huiqi and Cai, Hengyi and Wang, Shuaiqiang and Yin, Dawei and Du, Mengnan. 2024. doi:10.1145/3639372
- [51] EMSA: Explainable multilingual sentiment analysis models providing sentiment analysis across multiple languages. PLOS ONE. doi:10.1371/journal.pone.0333508
- [52] Guo, Daya and Yang, Dejian and Zhang, Haowei and Song, Junxiao and Wang, Peiyi and Zhu, Qihao and Xu, Runxin and Zhang, Ruoyu and Ma, Shirong and Bi, Xiao, et al.
- [53] Tulu 3: Pushing Frontiers in Open Language Model Post-Training. arXiv preprint. 2025