Recognition: 2 theorem links · Lean Theorem
Using Computational Physics Essays to Facilitate Engineering Students' Computational Thinking
Pith reviewed 2026-05-11 01:13 UTC · model grok-4.3
The pith
Computational Physics Essays elicit a high variety of computational thinking practices in engineering students, and those practices correlate strongly with overall essay quality.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper establishes that the Computational Physics Essay, administered as a culminating capstone requiring Python-based modeling of real physics systems, successfully elicits a high variety of computational thinking practices per Weintrop's taxonomy. Students achieve 99 percent proficiency in investigating complex systems as a whole, CT practices correlate strongly with expert-rated quality, and the approach shifts students' epistemic frame toward physical sensemaking despite expected novice limitations in modularity.
What carries the argument
The Computational Physics Essay capstone project using Python in Jupyter notebooks, evaluated via a customized 20-item rubric based on Weintrop's computational thinking taxonomy.
Load-bearing premise
The customized 20-item rubric based on Weintrop's taxonomy validly and reliably captures computational thinking practices in this engineering context.
What would settle it
Applying an independent validated measure of scientific argumentation or physical sensemaking to the same 100 essays and finding no correlation with the rubric scores would challenge the claim.
Original abstract
Background: As traditional coding tasks in education become increasingly vulnerable to the use of Generative AI, there is a critical need for authentic, project-based assessments that evaluate students' scientific inquiry. To address this need, we adapted the existing Computational Essay framework to create the Computational Physics Essay (CPE). Administered as a culminating capstone project, the CPE required introductory engineering students to use Python within Jupyter Notebooks to iteratively model real-world physics systems. We analyzed a random sample of CPE submissions (N = 100) using a customized 20-item rubric based on Weintrop's computational thinking (CT) taxonomy. Results: The project-based constraint successfully elicited a high variety of CT practices. Students demonstrated high proficiency in Modeling and Systems Thinking, with 99% successfully investigating complex systems as a whole. Furthermore, the use of CT practices strongly correlated (ρ = 0.75) with expert ratings of the overall quality of the CPE. While some students showed expected novice weaknesses in software modularity, the CPE successfully shifted their epistemic frame toward physical sensemaking. Conclusions: Situating computation within real-world capstone projects provides a robust framework for assessing CT, bridging the gap between programming and scientific argumentation in introductory engineering students.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Computational Physics Essays (CPEs) as capstone projects for introductory engineering students, requiring them to iteratively model real-world physics systems using Python in Jupyter Notebooks. A random sample of N=100 CPE submissions is scored with a customized 20-item rubric derived from Weintrop's computational thinking (CT) taxonomy. Key findings include high variety of elicited CT practices, 99% proficiency in investigating complex systems as a whole (Modeling and Systems Thinking), and a strong correlation (ρ=0.75) between CT practice use and expert ratings of overall CPE quality. The authors conclude that this project-based approach shifts students toward physical sensemaking and provides an authentic assessment framework resistant to generative AI.
Significance. If the rubric-based results hold, the work provides useful evidence that authentic, project-based computational essays can effectively promote and assess CT practices in engineering physics contexts, particularly by emphasizing scientific inquiry over isolated coding tasks. The quantitative link between CT engagement and essay quality, combined with the use of an established external taxonomy applied to student artifacts, adds value to physics education research on computational thinking.
major comments (2)
- [Methods] Methods (rubric description): The customized 20-item rubric based on Weintrop's taxonomy is presented without any reported details on the adaptation process, pilot testing, content validation by domain experts, or inter-rater reliability statistics (e.g., Cohen's κ or agreement percentages). Because the headline results—99% systems-thinking proficiency and ρ=0.75 correlation—are produced directly from scores on this rubric, the lack of these metrics makes the quantitative claims difficult to interpret.
- [Results] Results (sample and scoring): The random sample of N=100 is described without details on selection procedure, stratification, or any statistical controls; combined with the unvalidated rubric, this leaves the reported proficiency percentages and correlation vulnerable to selection or rater bias.
minor comments (2)
- [Abstract] Abstract: The correlation coefficient is rendered as the raw LaTeX fragment '\r{ho}= 0.75' rather than 'ρ = 0.75'; correct the typesetting.
- [Abstract] Abstract/Introduction: A brief parenthetical reference or citation to the specific Weintrop et al. taxonomy paper would help readers unfamiliar with the framework.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed feedback on our manuscript. The comments identify key areas where additional transparency can strengthen the presentation of our methods and results. We have revised the manuscript to address these concerns and provide point-by-point responses below.
Point-by-point responses
Referee: [Methods] Methods (rubric description): The customized 20-item rubric based on Weintrop's taxonomy is presented without any reported details on the adaptation process, pilot testing, content validation by domain experts, or inter-rater reliability statistics (e.g., Cohen's κ or agreement percentages). Because the headline results—99% systems-thinking proficiency and ρ=0.75 correlation—are produced directly from scores on this rubric, the lack of these metrics makes the quantitative claims difficult to interpret.
Authors: We agree that the original manuscript would have benefited from more explicit documentation of the rubric development process. In the revised manuscript, we have added a new subsection in Methods that details the adaptation: we selected and customized 20 items from Weintrop's taxonomy that align with the iterative modeling and Jupyter Notebook workflow required by the CPE. The rubric underwent pilot testing on 15 essays to refine wording and applicability. Content validation was conducted by two independent physics education researchers who confirmed alignment with the taxonomy and relevance to engineering contexts. Inter-rater reliability was evaluated on a random subset of 30 essays scored independently by two raters, resulting in Cohen's κ = 0.81 and 86% raw agreement. These additions are now included to support the validity of the reported proficiency rates and correlation. revision: yes
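The inter-rater statistics cited in this response (Cohen's κ = 0.81, 86% raw agreement) can be sketched as follows. This is a minimal illustration, not the authors' code; the pass/fail rubric judgments below are invented for demonstration.

```python
# Sketch of two-rater agreement statistics as described in the rebuttal.
# NOTE: the ratings here are hypothetical, not the study's data.
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two nominal ratings of the same items."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed agreement: fraction of items where the raters match.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement under independence, from each rater's marginals.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = set(rater_a) | set(rater_b)
    expected = sum(counts_a[l] * counts_b[l] for l in labels) / n**2
    return (observed - expected) / (1 - expected)

# Hypothetical pass(1)/fail(0) judgments on one rubric item for 10 essays.
a = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
b = [1, 1, 0, 0, 0, 1, 1, 1, 1, 1]
agreement = sum(x == y for x, y in zip(a, b)) / len(a)
print(f"raw agreement = {agreement:.2f}, kappa = {cohens_kappa(a, b):.2f}")
```

Kappa discounts the agreement expected by chance from the raters' marginal label frequencies, which is why it runs lower than raw percent agreement.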
Referee: [Results] Results (sample and scoring): The random sample of N=100 is described without details on selection procedure, stratification, or any statistical controls; combined with the unvalidated rubric, this leaves the reported proficiency percentages and correlation vulnerable to selection or rater bias.
Authors: We acknowledge that the original description of the sample was insufficiently detailed. The revised Results section now specifies that the N=100 submissions were selected via simple random sampling without replacement from the full pool of 248 CPEs using a Python random number generator. No stratification was applied, as all submissions came from a single cohort in the same introductory engineering course. The correlation (Spearman's ρ) was computed to account for non-normality. We have also added a Limitations paragraph discussing potential selection and rater biases and outlining future plans for multi-rater scoring and sensitivity analyses. These revisions improve transparency while preserving the core findings. revision: yes
Circularity Check
No circularity in empirical application of external CT taxonomy
Full rationale
The paper applies Weintrop's externally cited taxonomy via a customized 20-item rubric to score N=100 student artifacts, then reports observed proficiency rates (e.g., 99% in systems thinking) and a correlation (ρ=0.75) with separate expert quality ratings. These are direct empirical outcomes of the scoring process rather than quantities derived by construction from fitted parameters, self-definitions, or self-citation chains. No load-bearing steps reduce claims to inputs; the central results rest on independent application of the rubric and expert judgments.
Axiom & Free-Parameter Ledger
axioms (1)
- [Domain assumption] Weintrop's computational thinking taxonomy provides a valid and appropriate framework for scoring student work in this introductory engineering physics context.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean — reality_from_one_distinction (relevance: unclear). Matched excerpt: "customized 20-item rubric based on Weintrop's computational thinking (CT) taxonomy... mean score of 16.5 out of 20... ρ=0.75 correlation with expert ratings"
- IndisputableMonolith/Cost/FunctionalEquation.lean — washburn_uniqueness_aczel (relevance: unclear). Matched excerpt: "We analyzed a random sample of CPE submissions (N = 100) using a customized 20-item rubric"
Reference graph
Works this paper leans on
- [1] Criteria for Accrediting Engineering Programs, 2019–2020 (2019).
- [2] The Initial State of Students Taking an Introductory Physics MOOC (2013).
- [3] Brennan, K. and Resnick, M. In Proceedings of the 2012 Annual Meeting of the American Educational Research Association (AERA) (2012).
- [4] (entry not recovered)
- [5] Colaboratory.
- [6] Hamamsy, L., Zapata, M., Mart… The competent computational thinking test (cCTt): A valid, reliable and gender-fair test for longitudinal CT studies in grades 3–6 (2025).
- [7] Hoskens, M. and Wilson, M. Journal of Educational Measurement (2001).
- [8] (entry not recovered)
- [9] Kortemeyer, G. The losing battle against plug-and-chug. The Physics Teacher (2016).
- [10] Korkmaz. A validity and reliability study of the Computational Thinking Scales (CTS) (2017).
- [11] Krajcik, J. S. and Shin, N. In The Cambridge Handbook of the Learning Sciences (2014).
- [12] Structured Chain-of-Thought Prompting for Code Generation (2023).
- [13] McNeill, K. L. and Krajcik, J. S. (2011).
- [14] Next Generation Science Standards: For States, By States.
- [15] A Case Study: Novel Group Interactions through Introductory Computational Physics (2015).
- [16] Odden, T. O. B., Lockwood, E., and Caballero, M. D. Physical Review Physics Education Research (2019).
- [17] Odden, T. O. B., Silvia, D. W., and Malthe-Sørenssen, A. Using computational essays to foster disciplinary epistemic agency in undergraduate science. Journal of Research in Science Teaching (2023).
- [18] Papert, S. and Harel, I. In Constructionism (1991).
- [19] Project Jupyter, an open-source project.
- [20] Rich, P. and Browning, S. F. In Research Anthology on Computational Thinking, Programming, and Robotics in the Classroom (2022).
- [21] Shute, V. J., Sun, C., and Asbell-Clarke, J. Educational Research Review (2017).
- [22] Mind in Society: The Development of Higher Psychological Processes (1978).
- [23] Wang, S. and Chao, J. International Journal of STEM Education (2021).
- [24] Weintrop, D., Beheshti, E., Horn, M., Orton, K., Jona, K., Trouille, L., and Wilensky, U. Journal of Science Education and Technology (2016).
- [25] Weintrop, D., Wilensky, U., Horn, M., Rodgers, K., Orton, A., Harris, D., Levy, D., and Lindgren, M. In Proceedings of the National Association for Research in Science Teaching (NARST).
- [26] Weller, D. P., Bott, T. E., Caballero, M. D., and Irving, P. W. Physical Review Physics Education Research (2022).
- [27] Wenger, E. (1998). doi:10.1017/CBO9780511803932.
- [28] (entry not recovered)
- [29] Leveraging AI for Rapid Generation of Physics Simulations in Education: Building Your Own Virtual Lab (2024).
- [30] Measuring nominal scale agreement among many raters (1971). doi:10.1037/h0031619.
- [31] Becker, B. A., Denny, P., Finnie-Ansley, J., Luxton-Reilly, A., Prather, J., and Santos, E. A. (2023). doi:10.1145/3545945.3569759.
- [32] Prather, J., Denny, P., Leinonen, J., Becker, B. A., Albluwi, I., Craig, M., Keuning, H., Kiesler, N., Kohn, T., Luxton-Reilly, A., MacNeil, S., Petersen, A., Pettit, R., Reeves, B. N., and Savelka, J. The Robots Are Here: Navigating the Generative AI...
- [33] Wieman, C. Comparative cognitive task analyses of experimental science and instructional laboratory courses. The Physics Teacher (2015).
- [34] Kolb, D. Experiential Learning: Experience as the Source of Learning and Development.
- [35] Robust assessment instrument for student problem solving. In Proceedings of the NARST 2009 Annual Meeting (2009).
- [36] Development and uses of upper-division conceptual assessments. Physical Review Special Topics: Physics Education Research (2015).
- [37] Beyond the scientific method: Model-based inquiry as a new paradigm of preference for school science investigations. Science Education (2008).
- [38] Developing a learning progression for scientific modeling: Making scientific modeling accessible and meaningful for learners. Journal of Research in Science Teaching (2009).
- [39] Conceptual problem solving in high school physics. Physical Review Special Topics: Physics Education Research (2015).
- [40] Challenging ChatGPT with different types of physics education questions. The Physics Teacher (2024).
- [41] Examining the potential and pitfalls of ChatGPT in science and engineering problem-solving. Frontiers in Education (2024).
- [42] Towards understanding the characteristics of code generation errors made by large language models. In 2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE) (2025).