Data Aphasia: An Institutional Counterfactual Study of the Stability of Academic Cognition Under Letter-Grade Evaluation Systems
Pith reviewed 2026-06-27 05:43 UTC · model grok-4.3
The pith
Letter-grade conversion induces data aphasia that makes academic structures unstable to single-student changes.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that the letter-grade system causes data aphasia through discretization that compresses the feature space nineteenfold, flattens density gradients, and creates pseudo-heterogeneity regions, rendering clustering boundaries highly sensitive to minor perturbations. Under the full sample the system appears stable at K=4, but exclusion of one extreme anchor student raises optimal K to 8 and drops individual diagnostic identity consistency from 95 percent to 62 percent, while temporal consistency fluctuates between 52 percent and 96 percent against the percentage system's 93-96 percent baseline.
What carries the argument
Institutional counterfactual simulation that converts percentage scores to A/B/C/D letter grades and compares information entropy, optimal cluster number K, and diagnostic identity consistency before and after conversion.
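The conversion-and-compare step can be sketched in a few lines. This is an illustration only, not the paper's code: the cutoffs in `CUTOFFS` and the sample scores are assumptions (the actual grade boundaries are not stated on this page), and the reduction is computed as 1 - H_letter / H_percentage, the form given in the simulated rebuttal further down.

```python
import math
from collections import Counter

# Hypothetical cutoffs: the paper's actual letter-grade boundaries are not given here.
CUTOFFS = [(85, "A"), (70, "B"), (60, "C")]  # anything below 60 maps to "D"

def to_letter(score):
    """Map a 0-100 percentage score to A/B/C/D using the assumed cutoffs."""
    for threshold, letter in CUTOFFS:
        if score >= threshold:
            return letter
    return "D"

def shannon_entropy(values):
    """Shannon entropy in bits of the empirical distribution of `values`."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def entropy_reduction(scores):
    """Relative entropy loss: 1 - H(letters) / H(percentages).

    Undefined when every score is identical (H(percentages) is 0).
    """
    h_pct = shannon_entropy(scores)
    h_letter = shannon_entropy([to_letter(s) for s in scores])
    return 1 - h_letter / h_pct

# Made-up scores for one examination, for illustration only.
scores = [95, 88, 86, 79, 74, 71, 66, 63, 58, 42]
letters = [to_letter(s) for s in scores]
reduction = entropy_reduction(scores)
```

With these made-up scores the reduction comes out at about 0.41. That value depends entirely on the assumed cutoffs and sample, so it says nothing about the paper's 69 percent figure.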
If this is right
- Information entropy decreases by approximately 69 percent after conversion to letter grades.
- Letter-grade clustering appears stable at K=4 in full samples but becomes unstable upon removal of one extreme anchor student.
- Individual diagnostic identity consistency falls from 95 percent to 62 percent when an anchor student is excluded.
- Temporal consistency of diagnostic identities ranges 52-96 percent, below the 93-96 percent baseline of the percentage system.
- Discretization compresses the feature space nineteenfold and generates pseudo-heterogeneity regions that flatten density gradients.
Where Pith is reading between the lines
- Education systems using only letter grades may systematically mis-track student progress patterns compared with percentage-based records.
- Policy decisions about interventions or grouping could rest on groupings that shift with small data changes.
- A dual-track system retaining both letter and percentage data might preserve diagnostic stability while meeting simplification goals.
- Similar sensitivity effects could appear in non-mathematics subjects or larger age groups if the discretization mechanism is general.
Load-bearing premise
That the clustering procedure and definition of diagnostic identity consistency on discretized grades capture stable academic structures rather than artifacts of the discretization itself.
What would settle it
A replication dataset in which removing any single student leaves optimal K unchanged at 4 and keeps diagnostic identity consistency above 90 percent would falsify the claimed instability.
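The optimal-K half of this test amounts to a leave-one-out loop. The sketch below assumes a hypothetical `select_k` callable that returns the optimal K for a list of student feature vectors; it stands in for whatever clustering and selection procedure a replication would use, and it does not cover the consistency half of the criterion.

```python
def leave_one_out_k(students, select_k):
    """Return {student_index: optimal K when that student is removed}.

    `students` is a list of per-student feature vectors; `select_k` is any
    callable mapping such a list to an optimal K (hypothetical interface).
    """
    results = {}
    for i in range(len(students)):
        reduced = students[:i] + students[i + 1:]
        results[i] = select_k(reduced)
    return results

def unstable_removals(students, select_k):
    """Indices of students whose removal changes K relative to the full sample."""
    baseline = select_k(students)
    loo = leave_one_out_k(students, select_k)
    return [i for i, k in loo.items() if k != baseline]
```

An empty list from `unstable_removals` would correspond to the "optimal K unchanged" condition described above.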
Original abstract
Does the letter-grade evaluation system, while achieving its burden-reduction goals, affect the education system's stable understanding of students' academic structures? This paper introduces the concept of "data aphasia," referring to restrictions on diagnostic information expression caused by institutionally mandated forms of data presentation. Using data from 68 mathematics examinations administered to 75 primary school students, we employ an institutional counterfactual simulation method to convert percentage scores into A/B/C/D letter grades and conduct systematic tests at the information, structural, and diagnostic levels. Results show that information entropy decreases by approximately 69% after grade conversion; under the full sample, the letter-grade system appears superficially stable (K=4), but removing a single extreme anchor student causes the optimal K to increase from 4 to 8 and individual diagnostic identity consistency to fall from 95% to 62%; temporal consistency fluctuates between 52% and 96%, far below the 93%-96% baseline of the percentage system. Mechanism analysis indicates that discretization compresses the feature space by approximately nineteenfold across 68 examinations; after standardization, it creates extensive pseudo-heterogeneity regions, flattens density gradients, and makes clustering boundaries highly sensitive to minor perturbations. Based on these findings, this paper proposes a dual-track evaluation mechanism and provides a testable analytical framework for understanding the cognitive costs of educational evaluation reform.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript claims that letter-grade evaluation systems induce 'data aphasia' by restricting diagnostic information expression. Using an institutional counterfactual simulation on 68 mathematics examinations administered to 75 primary school students, percentage scores are converted to A/B/C/D grades. This yields a ~69% drop in information entropy; the full sample appears stable (optimal K=4) but removal of one extreme anchor student shifts optimal K to 8 and drops individual diagnostic identity consistency from 95% to 62%; temporal consistency fluctuates 52-96% versus a 93-96% baseline for percentages. Mechanism analysis attributes this to ~19-fold feature-space compression, pseudo-heterogeneity regions, and flattened density gradients that make boundaries sensitive to perturbations. The paper proposes a dual-track evaluation mechanism and a testable framework for cognitive costs of grading reform.
Significance. If the quantitative instability claims prove robust once methods are fully specified, the work could offer empirical support for information-loss effects of discretized grading and a simulation-based framework for evaluating educational reforms. The use of real examination data in a counterfactual design is a methodological strength that distinguishes it from purely theoretical critiques of grading systems.
major comments (4)
- [Abstract] Abstract: The clustering algorithm (e.g., k-means, hierarchical), distance metric, and optimal-K selection criterion are not stated. These details are load-bearing for the central claims that optimal K shifts from 4 to 8 and consistency falls from 95% to 62% after removal of one anchor student.
- [Abstract] Abstract (mechanism analysis paragraph): No equations, formulas, or step-by-step calculations are supplied for the reported 69% entropy decrease or the nineteenfold feature-space compression. These quantities cannot be reproduced from the given text.
- [Abstract] Abstract: The precise definition and computational formula for 'individual diagnostic identity consistency' and 'temporal consistency' are omitted. Without them it is impossible to assess whether the reported drops (95%→62%, 52-96% range) are intrinsic to letter-grade discretization or artifacts of the chosen metric and discretization boundaries.
- [Abstract] Abstract: The post-hoc removal of a single extreme anchor student is presented as decisive evidence of instability, yet no justification, pre-specified rule, or sensitivity analysis for this removal is provided. This step directly supports the headline contrast between the two grading systems.
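The first comment turns on how optimal K is chosen. The simulated rebuttal below names the silhouette coefficient with Euclidean distance; a minimal sketch of that criterion follows. It is not the authors' code. The `labelings` argument (a mapping from K to cluster labels produced by some clustering routine) is a hypothetical interface, and the zero score for singleton clusters follows the usual convention.

```python
import math

def mean_silhouette(points, labels):
    """Mean silhouette coefficient with Euclidean distance.

    Assumes at least two clusters; points in singleton clusters score 0.
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    total = 0.0
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            continue  # singleton cluster contributes 0
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items() if other != lab
        )
        denom = max(a, b)
        # Identical points across clusters give a = b = 0; score them 0.
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(points)

def best_k(points, labelings):
    """Pick the K whose labeling has the highest mean silhouette."""
    return max(labelings, key=lambda k: mean_silhouette(points, labelings[k]))
```

The guard for `denom == 0` matters for discretized grades, where many students can share identical grade vectors.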
minor comments (1)
- [Abstract] The term 'data aphasia' is introduced without reference to existing literature on information loss, discretization effects, or related concepts in educational measurement or data science.
Simulated Author's Rebuttal
We thank the referee for the thorough and constructive review. The comments correctly identify areas where the abstract lacks necessary methodological transparency. We address each point below and will revise the abstract and relevant sections accordingly to improve reproducibility.
Point-by-point responses
- Referee: [Abstract] Abstract: The clustering algorithm (e.g., k-means, hierarchical), distance metric, and optimal-K selection criterion are not stated. These details are load-bearing for the central claims that optimal K shifts from 4 to 8 and consistency falls from 95% to 62% after removal of one anchor student.
  Authors: We agree these details must be stated explicitly. The manuscript employs k-means clustering with Euclidean distance and selects optimal K via the silhouette coefficient. We will add this specification to the abstract in the revised version.
  Revision: yes
- Referee: [Abstract] Abstract (mechanism analysis paragraph): No equations, formulas, or step-by-step calculations are supplied for the reported 69% entropy decrease or the nineteenfold feature-space compression. These quantities cannot be reproduced from the given text.
  Authors: The referee is correct that the abstract omits the formulas. The entropy reduction is computed as 1 - (H_letter-grades / H_percentages) using Shannon entropy on the discretized versus continuous score distributions across the 68 examinations; the nineteenfold compression is the ratio of distinct possible values (101 percentages versus 4 letter grades) averaged over the feature space. We will incorporate the equations and a brief derivation into the abstract.
  Revision: yes
- Referee: [Abstract] Abstract: The precise definition and computational formula for 'individual diagnostic identity consistency' and 'temporal consistency' are omitted. Without them it is impossible to assess whether the reported drops (95%→62%, 52-96% range) are intrinsic to letter-grade discretization or artifacts of the chosen metric and discretization boundaries.
  Authors: We accept this criticism. Individual diagnostic identity consistency is the percentage of students whose cluster assignment remains unchanged across bootstrap resamples of the data; temporal consistency is the fraction of students retaining the same cluster label between consecutive examinations. Both are computed after optimal-K selection. We will add these definitions and formulas to the abstract.
  Revision: yes
- Referee: [Abstract] Abstract: The post-hoc removal of a single extreme anchor student is presented as decisive evidence of instability, yet no justification, pre-specified rule, or sensitivity analysis for this removal is provided. This step directly supports the headline contrast between the two grading systems.
  Authors: The comment is valid; the abstract presents the removal without sufficient justification or pre-specification. In the full manuscript this is framed as an illustrative sensitivity check, but we agree a pre-specified rule (e.g., removal of the single highest and lowest scoring students) and a broader sensitivity analysis across multiple candidates must be added. We will revise the abstract and methods to include these elements.
  Revision: yes
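Both consistency measures defined in the rebuttal reduce to the share of students whose cluster label is the same in two assignments. The sketch below shows that computation. The label-alignment step is an assumption added here, since cluster IDs from separate clustering runs are arbitrary; the rebuttal does not say how, or whether, labels are matched.

```python
from itertools import permutations

def label_agreement(labels_a, labels_b):
    """Fraction of students with the same label in both assignments."""
    matches = sum(1 for a, b in zip(labels_a, labels_b) if a == b)
    return matches / len(labels_a)

def aligned_agreement(labels_a, labels_b):
    """Best agreement over all relabelings of `labels_b`.

    Brute-force search over permutations of the cluster IDs, which is only
    practical for small K (8! = 40320 permutations at K=8).
    """
    ids = sorted(set(labels_a) | set(labels_b))
    best = 0.0
    for perm in permutations(ids):
        mapping = dict(zip(ids, perm))
        relabeled = [mapping[b] for b in labels_b]
        best = max(best, label_agreement(labels_a, relabeled))
    return best
```

For example, `aligned_agreement([0, 0, 1, 1, 2], [1, 1, 0, 0, 2])` treats the two assignments as identical up to renaming, whereas `label_agreement` on the same inputs counts only the last student as a match.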
Circularity Check
No circularity: empirical clustering results on external exam data are independent of the instability claims
Full rationale
The paper applies a counterfactual conversion of percentage scores to letter grades on 68 real examinations from 75 students, then reports entropy drop, optimal cluster count K, and consistency metrics computed on that transformed data. No step reduces the reported instability (K shift or consistency drop) to a definition or fit that presupposes the result; the percentage-system baseline provides an external comparator, and the mechanism analysis (feature-space compression, density flattening) follows directly from the discretization without self-referential closure. The derivation chain remains self-contained against the input examination records.
Axiom & Free-Parameter Ledger
free parameters (2)
- letter-grade boundaries
- optimal K in clustering
axioms (2)
- domain assumption: Percentage scores faithfully represent underlying academic structures without measurement error beyond the grading conversion
- domain assumption: Clustering on grade vectors captures diagnostic identity
invented entities (1)
- data aphasia: no independent evidence