Joint Consistency: A Unified Test-Time Aggregation Framework via Energy Minimization
Pith reviewed 2026-05-08 10:02 UTC · model grok-4.3
The pith
Joint Consistency aggregates LLM reasoning traces by minimizing an energy function that accounts for interactions between candidates.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Joint Consistency formulates test-time aggregation as a constrained Ising-type energy minimization problem, where independent evaluation signals serve as external fields and pairwise comparisons from an LLM judge serve as interaction terms. This framework subsumes existing voting and weighted aggregation methods as special cases under particular choices of the interaction matrix. An efficient approximation strategy is developed to make the modeling practical for large numbers of traces.
What carries the argument
A constrained Ising-type energy minimization problem in which external fields come from independent trace evaluations and interactions come from pairwise LLM judge comparisons.
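A minimal sketch of this machinery, under assumed conventions that the summary does not pin down: spins s_i ∈ {−1, +1} mark whether trace i is selected, the field vector h comes from per-trace evaluation scores, and a symmetric matrix J encodes judge comparisons. The paper's exact constraint set, sign convention, and spin encoding may differ.

```python
import numpy as np

def ising_energy(s, h, J):
    """Energy of a spin configuration s in {-1, +1}^n under
    E(s) = -h.s - (1/2) s^T J s (one standard sign convention)."""
    return -h @ s - 0.5 * (s @ J @ s)

def aggregate_brute_force(h, J):
    """Return the energy-minimizing configuration by exhaustive search.
    Only feasible for small n; this is what motivates an approximation
    strategy at realistic trace budgets."""
    n = len(h)
    best_s, best_e = None, float("inf")
    for mask in range(2 ** n):
        s = np.array([1 if (mask >> i) & 1 else -1 for i in range(n)])
        e = ising_energy(s, h, J)
        if e < best_e:
            best_s, best_e = s, e
    return best_s, best_e
```

With J = 0 the energy decouples and each spin simply follows the sign of its field, which is the degenerate case the unification claim builds on.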
Load-bearing premise
The LLM judge's pairwise comparisons provide meaningful information about the relative consistency or correctness of the candidate answers.
What would settle it
Replacing the interaction matrix with random values or zeros on the same benchmarks and finding that Joint Consistency no longer outperforms majority voting would falsify the contribution of the interaction terms.
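The proposed falsification test can be made concrete. Assuming a symmetric interaction matrix with zero diagonal, the three conditions (judge-derived, zeroed, random) could be constructed as below; `judge` is a hypothetical pairwise comparison function standing in for whatever the paper's LLM judge returns.

```python
import numpy as np

def make_interactions(n, mode, judge=None, seed=0):
    """Interaction matrices for the proposed ablation: 'judge' uses a
    (hypothetical) pairwise comparison function, while 'zero' and
    'random' are the controls that would falsify the contribution of
    the interaction terms if JC still only matched majority voting."""
    if mode == "zero":
        return np.zeros((n, n))
    if mode == "random":
        J = np.random.default_rng(seed).uniform(-1.0, 1.0, size=(n, n))
    elif mode == "judge":
        J = np.array([[judge(i, j) for j in range(n)] for i in range(n)])
    else:
        raise ValueError(f"unknown mode: {mode}")
    J = 0.5 * (J + J.T)       # enforce symmetry
    np.fill_diagonal(J, 0.0)  # no self-interaction
    return J
```

Running the same aggregation pipeline under all three modes, on the same benchmarks, isolates how much of JC's gain the interaction terms actually carry.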
Original abstract
This paper studies test-time aggregation, an approach that generates multiple reasoning traces and aggregates them into a final answer. Most existing methods rely on evaluation signals collected from candidate traces in isolation or answer frequencies, while ignoring comparative interactions among candidates. We propose Joint Consistency (JC), formulated as a constrained Ising-type energy minimization problem, where independent evaluation signals act as external fields and pairwise comparisons act as interactions. JC provides a unified framework for test-time aggregation that subsumes existing voting and weighted aggregation methods as special cases. Our construction of the interaction matrix leverages LLM-as-a-judge comparisons, and admits a theoretical interpretation under answer-level homogeneity assumptions. Moreover, we develop an efficient approximation strategy that makes interaction modeling practical for large-scale test-time aggregation. Experiments on math and code reasoning benchmarks show that JC consistently outperforms existing baselines across tasks, judge models, trace budgets, and trace-generation settings.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces Joint Consistency (JC), a test-time aggregation method for LLM reasoning traces formulated as a constrained Ising-type energy minimization problem. Independent evaluation signals serve as external fields, while pairwise comparisons from LLM-as-a-judge provide the interaction terms in the energy function. The framework is claimed to subsume voting and weighted aggregation as special cases, supported by an efficient approximation algorithm. Experiments on math and code reasoning benchmarks indicate consistent outperformance over baselines under various conditions including different judge models and trace budgets.
Significance. Should the central claims hold, this work provides a novel energy-based unification of test-time aggregation techniques, potentially enabling more effective use of multiple reasoning traces by explicitly modeling their interactions. This could have implications for improving the reliability of LLM outputs in complex reasoning tasks, moving beyond frequency-based or isolated scoring methods.
Major comments (1)
- [Abstract and method section] The unification claim that JC subsumes voting and weighted aggregation methods as special cases is load-bearing for the paper's positioning as a 'unified framework'. While the abstract notes a theoretical interpretation under answer-level homogeneity assumptions, the manuscript should provide an explicit derivation showing the parameter settings (e.g., interaction matrix J=0) that recover these baselines, and analyze sensitivity to violations of the homogeneity assumption, given that reasoning traces for the same answer often vary in structure and quality.
Minor comments (2)
- [Approximation strategy] The description of the efficient approximation strategy would benefit from pseudocode or a complexity analysis to make the practical implementation clearer.
- [Experiments] Experimental tables should report variance or confidence intervals alongside mean performance to support the 'consistent outperformance' claim.
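As context for the requested pseudocode: exact minimization of an Ising energy is combinatorial (2^n configurations over n traces), so some approximation is unavoidable. The paper's own strategy is not described in this summary; one generic baseline, shown purely for illustration, is greedy single-spin-flip descent under the sign convention E(s) = −h·s − ½ s^T J s with symmetric, zero-diagonal J.

```python
import numpy as np

def greedy_flip_descent(h, J, max_sweeps=50):
    """Generic single-spin-flip local search for an Ising energy
    E(s) = -h.s - (1/2) s^T J s. Illustrative only; not necessarily
    the paper's approximation strategy. Cost is O(max_sweeps * n^2)."""
    n = len(h)
    s = np.ones(n)  # start with every trace selected
    for _ in range(max_sweeps):
        improved = False
        for i in range(n):
            # local field on spin i (exclude any diagonal term)
            local = h[i] + J[i] @ s - J[i, i] * s[i]
            delta = 2.0 * s[i] * local  # energy change if spin i flips
            if delta < 0:
                s[i] = -s[i]
                improved = True
        if not improved:  # local minimum reached
            break
    return s
```

A sweep terminates when no single flip lowers the energy, so the result is a local (not necessarily global) minimum.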
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. The unification claim is indeed central to positioning JC as a framework, and we address the request for greater explicitness below. We will revise the manuscript to strengthen this aspect while preserving the original technical contributions.
Point-by-point responses
- Referee: [Abstract and method section] The unification claim that JC subsumes voting and weighted aggregation methods as special cases is load-bearing for the paper's positioning as a 'unified framework'. While the abstract notes a theoretical interpretation under answer-level homogeneity assumptions, the manuscript should provide an explicit derivation showing the parameter settings (e.g., interaction matrix J=0) that recover these baselines, and analyze sensitivity to violations of the homogeneity assumption, given that reasoning traces for the same answer often vary in structure and quality.
Authors: We agree that an explicit derivation will improve clarity. In the revised manuscript we will add a dedicated paragraph in Section 3 deriving the special cases: setting the interaction matrix J identically to zero recovers unweighted majority voting (the external fields then reduce to per-answer scores), while appropriate scaling of the fields recovers weighted aggregation. This derivation holds exactly under the stated answer-level homogeneity assumption. As for sensitivity to violations of homogeneity, our existing experiments already provide supporting evidence: performance gains remain consistent across trace-generation settings that produce structurally diverse reasoning traces for the same answer (see Tables 2–4 and the ablation on trace budgets). We will add a short discussion paragraph acknowledging that strong violations could in principle degrade the interaction terms, and noting that the efficient approximation algorithm remains well-defined regardless. A full formal sensitivity analysis lies beyond the current scope and is left to follow-up work; the empirical robustness already demonstrated mitigates the practical concern.
Revision: partial
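The J = 0 special case claimed in the rebuttal can be sketched in a few lines. This assumes each answer's external field equals its vote count, an illustrative choice rather than the paper's exact field construction: under it, minimizing the decoupled energy is precisely picking the most frequent answer.

```python
from collections import Counter

def majority_vote(answers):
    """Standard unweighted majority voting over final answers."""
    return Counter(answers).most_common(1)[0][0]

def jc_zero_interactions(answers):
    """With the interaction matrix J set identically to zero, the energy
    decomposes per answer. If each answer's field is its vote count
    (illustrative assumption), selecting answer a costs -count(a), so
    the energy minimizer is the most frequent answer."""
    counts = Counter(answers)
    # argmin of -count(a) over answers = argmax of count(a)
    return min(counts, key=lambda a: -counts[a])
```

Any strictly monotone reweighting of the fields then yields the weighted-aggregation special case by the same argument.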
Circularity Check
No significant circularity in derivation chain
Full rationale
The paper formulates Joint Consistency as a constrained Ising-type energy minimization with external fields from independent signals and interactions from LLM-as-a-judge pairwise comparisons. It states that this subsumes voting and weighted aggregation as special cases via appropriate parameter settings in the interaction matrix, and it provides a theoretical interpretation only under explicit answer-level homogeneity assumptions. No equations or steps in the abstract reduce a claimed result or performance metric back to a fitted input or a self-definition by construction. The unification is presented as a modeling choice rather than a forced equivalence, and the empirical results on benchmarks are independent of any self-citation chain. The derivation is self-contained, the empirical claims rest on external benchmarks, and no load-bearing self-citations or ansatz smuggling were identified.
Axiom & Free-Parameter Ledger
Axioms (1)
- Domain assumption: answer-level homogeneity