Recognition: 1 theorem link · Lean Theorem
Measuring the metacognition of AI
Pith reviewed 2026-05-13 23:39 UTC · model grok-4.3
The pith
The meta-d' framework should serve as the standard measure of metacognitive sensitivity in AI systems.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The meta-d' framework quantifies metacognitive sensitivity in AIs by measuring how effectively confidence ratings distinguish correct from incorrect primary judgments. This enables comparisons to optimality, across models, and across tasks. Signal detection theory separately assesses whether AIs spontaneously become more conservative in high-risk decision settings.
What carries the argument
The meta-d' framework, which applies signal detection theory to separate metacognitive sensitivity from response bias in confidence ratings after a primary judgment.
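The split between sensitivity and bias is easiest to see at the level of the primary (type-1) judgment, which is the layer meta-d' builds on. The following is a minimal sketch of the textbook equal-variance SDT quantities, not the paper's estimation code and not a meta-d' fit. The function name `sdt_type1` and the log-linear correction (adding 0.5 to each cell) are assumptions made for this illustration.

```python
from statistics import NormalDist


def sdt_type1(hits, misses, false_alarms, correct_rejections):
    """Return (d_prime, criterion) under the equal-variance Gaussian model.

    Hypothetical helper for illustration only.
    """
    # Log-linear correction (an assumption of this sketch): add 0.5 to each
    # cell so that rates of exactly 0 or 1 do not produce infinite z-scores.
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)

    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    d_prime = z(hit_rate) - z(fa_rate)             # sensitivity
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))  # response bias
    return d_prime, criterion
```

With this sign convention, a positive criterion means the responder needs more evidence before giving the "signal" response, which is one way to read "more conservative".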
If this is right
- LLMs can be ranked by how close their metacognition comes to optimality on any given task.
- The same LLM can be compared to itself across tasks or versions to track metacognitive changes.
- Signal detection theory reveals whether models regulate decisions more conservatively under higher risk.
- Developers gain a standardized metric to evaluate uncertainty handling separate from raw accuracy.
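One common way to express closeness to optimality in the meta-d' literature is the ratio of meta-d' to the type-1 sensitivity d'. This summary does not state whether the paper uses exactly this index, so the formula below is offered as an assumption about how such a ranking could be computed.

```latex
M_{\text{ratio}} = \frac{\text{meta-}d'}{d'}
```

Because meta-d' is expressed in the same units as d', a ratio of 1 means the confidence ratings are as informative as the SDT model allows given the primary-task performance, and values below 1 indicate that information is lost at the metacognitive stage.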
Where Pith is reading between the lines
- The same measurement approach could be applied to non-language AI systems such as image classifiers to test consistency across modalities.
- Training objectives might be designed to directly maximize meta-d' rather than only accuracy or calibration loss.
- If meta-d' proves stable, it could inform safety standards requiring minimum metacognitive thresholds for high-stakes AI deployment.
Load-bearing premise
Psychophysical tools developed for human perceptual judgments transfer directly to large language models without AI-specific adjustments to their assumptions.
What would settle it
An experiment showing that meta-d' scores for a given LLM fail to predict its actual accuracy differences between high- and low-confidence responses on held-out tasks.
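The held-out quantity in that test can be computed directly from trial-level data. The sketch below is one possible way to do it, assuming binary correctness flags and numeric confidence ratings; the function name `confidence_accuracy_gap` and the single `threshold` that splits high from low confidence are hypothetical choices, not taken from the paper.

```python
def confidence_accuracy_gap(correct, confidence, threshold):
    """Accuracy on high-confidence trials minus accuracy on low-confidence trials.

    correct:    sequence of 0/1 (or bool) flags, one per trial
    confidence: sequence of numeric confidence ratings, one per trial
    threshold:  ratings >= threshold count as high confidence (assumption)
    """
    high = [c for c, conf in zip(correct, confidence) if conf >= threshold]
    low = [c for c, conf in zip(correct, confidence) if conf < threshold]
    if not high or not low:
        raise ValueError("need at least one trial on each side of the threshold")
    return sum(high) / len(high) - sum(low) / len(low)
```

A gap near zero on held-out tasks for a model with a high meta-d' score would be the kind of mismatch described above.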
Original abstract
A robust decision-making process must take into account uncertainty, especially when the choice involves inherent risks. Because artificial intelligence (AI) systems are increasingly integrated into decision-making workflows, managing uncertainty relies more and more on the metacognitive capabilities of these systems, i.e., their ability to assess the reliability of, and regulate, their own decisions. Hence, it is crucial to employ robust methods to measure the metacognitive abilities of AI. This paper is primarily a methodological contribution arguing for the adoption of the meta-d' framework as the gold standard for assessing the metacognitive sensitivity of AIs, that is, the ability to generate confidence ratings that distinguish correct from incorrect responses. Moreover, we propose to leverage signal detection theory (SDT) to measure the ability of AIs to spontaneously regulate their decisions based on uncertainty and risk. To demonstrate the practical utility of these psychophysical frameworks, we conduct two series of experiments on three large language models (LLMs): GPT-5, DeepSeek-V3.2-Exp, and Mistral-Medium-2508. In the first experiments, LLMs performed a primary judgment followed by a confidence rating. In the second, LLMs only performed the primary judgment, while we manipulated the risk associated with either response. On the one hand, applying the meta-d' framework allows us to conduct comparisons along three axes: comparing an LLM to optimality, comparing different LLMs on a given task, and comparing the same LLM across different tasks. On the other hand, SDT allows us to assess whether LLMs become more conservative when risk is high.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper is a methodological contribution proposing the meta-d' framework from signal detection theory (SDT) as the gold standard for assessing metacognitive sensitivity in AI, defined as the ability to generate confidence ratings that distinguish correct from incorrect responses. It further suggests using SDT to measure the spontaneous regulation of decisions based on uncertainty and risk. Two series of experiments are described on three LLMs (GPT-5, DeepSeek-V3.2-Exp, and Mistral-Medium-2508), involving primary judgments with confidence ratings and risk manipulations to demonstrate comparisons to optimality, across models, and across tasks.
Significance. If validated, this approach could provide a standardized, psychophysically grounded method for evaluating metacognition in AI systems, facilitating rigorous comparisons that go beyond simple accuracy metrics. The strength lies in leveraging established frameworks for optimality assessments and risk sensitivity, though the direct transfer from human perceptual tasks to LLM outputs needs empirical support.
major comments (2)
- [Abstract] The abstract describes the intended experiments and claims but provides no numerical results, error bars, task details, or statistical tests, making it impossible to verify the data support for the meta-d' comparisons and SDT regulation claims.
- [Experimental setup (as described)] The core SDT assumptions, including equal-variance Gaussians for confidence ratings, are applied to LLM outputs without any reported validation or checks for normality and variance equality; violation of these would undermine the reliability of meta-d' estimates and the claimed superiority for assessing metacognitive sensitivity.
minor comments (1)
- [Abstract] The model names (e.g., GPT-5, Mistral-Medium-2508) appear to be placeholders or future versions; clarify the exact models used if they are not standard releases.
Simulated Author's Rebuttal
We thank the referee for their detailed and constructive comments, which highlight important areas for improving the clarity and rigor of our methodological contribution. We address each major comment point by point below and outline the corresponding revisions.
Point-by-point responses
- Referee: [Abstract] The abstract describes the intended experiments and claims but provides no numerical results, error bars, task details, or statistical tests, making it impossible to verify the data support for the meta-d' comparisons and SDT regulation claims.
Authors: We agree that the abstract should provide a more complete summary of the empirical findings to allow readers to assess the support for our claims. In the revised version, we will expand the abstract to include key quantitative results (e.g., meta-d' estimates with confidence intervals or standard errors), task specifications, and statistical outcomes from the comparisons across models, tasks, and optimality benchmarks.
Revision: yes
- Referee: [Experimental setup (as described)] The core SDT assumptions, including equal-variance Gaussians for confidence ratings, are applied to LLM outputs without any reported validation or checks for normality and variance equality; violation of these would undermine the reliability of meta-d' estimates and the claimed superiority for assessing metacognitive sensitivity.
Authors: We acknowledge that the current manuscript does not report explicit checks for the equal-variance Gaussian assumption underlying meta-d'. In the revision, we will add supplementary analyses examining the distributions of confidence ratings (e.g., normality tests and variance comparisons between correct and incorrect trials) for each model and task. Where the assumptions are approximately met, we will report this support; where deviations occur, we will discuss their potential impact on meta-d' estimates and consider robustness checks or alternative metrics.
Revision: yes
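A descriptive first look at the variance comparison the authors mention could be sketched as follows. This is an assumption-laden illustration rather than the planned supplementary analysis: it compares raw confidence ratings on correct versus incorrect trials, which is not a formal test and does not directly probe the latent evidence distributions that the SDT model refers to. The function name `confidence_spread` and the returned keys are hypothetical.

```python
from statistics import mean, variance


def confidence_spread(correct, confidence):
    """Summarize confidence ratings separately for correct and incorrect trials.

    Returns means, sample variances, and the correct/incorrect variance ratio.
    Descriptive only; a ratio far from 1 is a prompt for closer checks.
    """
    ok = [conf for c, conf in zip(correct, confidence) if c]
    bad = [conf for c, conf in zip(correct, confidence) if not c]
    if len(ok) < 2 or len(bad) < 2:
        raise ValueError("need at least two correct and two incorrect trials")

    var_ok, var_bad = variance(ok), variance(bad)
    ratio = var_ok / var_bad if var_bad > 0 else float("inf")
    return {
        "mean_correct": mean(ok),
        "mean_incorrect": mean(bad),
        "var_correct": var_ok,
        "var_incorrect": var_bad,
        "variance_ratio": ratio,
    }
```

Running this per model and per task would give one table row for each combination, which could then be followed by the formal normality and variance tests named in the rebuttal.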
Circularity Check
No circularity: applies externally established meta-d' and SDT frameworks to LLMs
full rationale
The paper is a methodological proposal that imports the meta-d' framework and signal detection theory (SDT) from psychology to assess LLM metacognition and risk regulation. No equations, predictions, or optimality comparisons are derived from the paper's own data or definitions in a self-referential manner. The experiments apply the pre-existing frameworks directly to LLM confidence ratings and judgments. They do not fit parameters that are then renamed as predictions, do not rely on load-bearing self-citations, and make no ansatz or uniqueness claims that reduce to the authors' prior work. The central claims rest on the external validity of SDT assumptions rather than on internal construction, so the derivation chain is self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: The meta-d' framework developed for human metacognition is the appropriate and optimal measure for AI metacognitive sensitivity.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean (Jcost, washburn_uniqueness_aczel); Foundation/AlexanderDuality.lean (D=3 forcing); reality_from_one_distinction; Jcost uniqueness
Tag: unclear
Relation between the paper passage and the cited Recognition theorem.
Paper passage: "arguing for the adoption of the meta-d' framework as the gold standard for assessing the metacognitive sensitivity of AIs... leverage signal detection theory (SDT) to measure the ability of AIs to spontaneously regulate their decisions based on uncertainty and risk"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 1 Pith paper
- MEDLEY-BENCH: Scale Buys Evaluation but Not Control in AI Metacognition
MEDLEY-BENCH reveals an evaluation/control dissociation in AI metacognition where scale improves reflective scoring but not proportional belief revision, with a consistent knowing/doing gap across 35 models.