
arxiv: 2312.06871 · v1 · submitted 2023-12-11 · 💻 cs.AI · cs.LG · cs.MA

Using Analytics on Student Created Data to Content Validate Pedagogical Tools

Pith reviewed 2026-05-24 05:17 UTC · model grok-4.3

classification 💻 cs.AI · cs.LG · cs.MA
keywords content validity · pedagogical tools · time series classification · hierarchical clustering · curve fitting · ecological modeling · simulation outcomes · student data analytics

The pith

Agreement between hierarchical clustering and curve fitting reaches 89.38 percent on student-generated ecological time series, supporting a methodology for content validity of the VERA tool.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether student-created simulation outputs in an ecology modeling tool can be grouped into standard domain patterns using two separate analysis techniques. Hierarchical clustering groups the time series by similarity while curve fitting matches them to predefined ecological shapes. The two techniques reach the same classification on 89.38 percent of 971 series drawn from 263 models. Because the methods are independent, their overlap is presented as evidence that the tool produces educationally meaningful content. The approach is offered as a reusable way to check pedagogical simulation software against known domain structures.
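
As a rough illustration of the curve-fitting side, the sketch below matches a normalized series against two predefined shapes and keeps the one with the lowest squared error. The template forms, the parameter grid, and the function names (`classify`, `TEMPLATES`, `RATES`) are assumptions made for this example; the paper's actual templates and optimizer are not given in the text above.

```python
import math

# Hypothetical template shapes on t in [0, 1]; the forms are assumptions.
def dying(t, k):
    # Exponential decline from 1 toward 0.
    return math.exp(-k * t)

def capped_growth(t, k):
    # Logistic rise that levels off near 1.
    return 1.0 / (1.0 + math.exp(-k * (t - 0.5)))

TEMPLATES = {"dying": dying, "capped growth": capped_growth}
RATES = [0.5 * i for i in range(1, 41)]  # coarse grid for k: 0.5 .. 20.0

def normalize(series):
    # Scale by the maximum so curves of different magnitude are comparable.
    peak = max(series)
    return [v / peak for v in series] if peak > 0 else list(series)

def sse(series, shape, k):
    # Sum of squared errors against a template; needs at least two points.
    n = len(series)
    return sum((v - shape(i / (n - 1), k)) ** 2 for i, v in enumerate(series))

def classify(series):
    # Grid search over templates and rates; return the best-fitting name.
    y = normalize(series)
    best_error, best_name = min(
        (sse(y, shape, k), name)
        for name, shape in TEMPLATES.items()
        for k in RATES
    )
    return best_name
```

A grid search stands in here for a real least-squares optimizer only to keep the sketch dependency-free.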

Core claim

The authors classify 971 time series from 263 VERA models into common ecological patterns by applying hierarchical clustering and curve fitting independently; the two methods agree on 89.38 percent of the test-set curves, which the paper treats as confirmation that the methodology successfully establishes content validity for the pedagogical tool.

What carries the argument

Comparison of hierarchical clustering labels against curve-fitting labels on the same student-generated population time series, used as a proxy measure for whether the VERA outputs align with established ecological patterns.
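
One way such a comparison could be computed is sketched below. It assumes cluster IDs are first mapped to pattern names by majority vote on a training split, and that agreement is then measured on held-out curves; both the mapping rule and the function names are assumptions for illustration, not the paper's documented procedure.

```python
from collections import Counter, defaultdict

def map_clusters_to_patterns(cluster_ids, pattern_labels):
    # Assign each cluster the pattern name most often given to its members.
    votes = defaultdict(Counter)
    for cid, label in zip(cluster_ids, pattern_labels):
        votes[cid][label] += 1
    return {cid: counts.most_common(1)[0][0] for cid, counts in votes.items()}

def agreement_rate(cluster_ids, pattern_labels, mapping):
    # Percentage of curves where the mapped cluster name equals the fitted label.
    matches = sum(
        mapping[cid] == label for cid, label in zip(cluster_ids, pattern_labels)
    )
    return 100.0 * matches / len(pattern_labels)

# Toy training split: build the cluster-to-pattern mapping.
train_clusters = [0, 0, 0, 1, 1]
train_labels = ["dying", "dying", "oscillation", "oscillation", "oscillation"]
mapping = map_clusters_to_patterns(train_clusters, train_labels)

# Toy held-out split: measure agreement with the fixed mapping.
test_clusters = [0, 1, 1, 0]
test_labels = ["dying", "oscillation", "dying", "dying"]
rate = agreement_rate(test_clusters, test_labels, mapping)
```

Building the mapping on one split and scoring on another matters: a mapping derived from the same curves it is scored on would inflate the agreement figure.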

If this is right

  • The same dual-classification procedure can be repeated on data from other simulation-based pedagogical tools to assess their content validity.
  • High agreement rates supply a quantitative benchmark that can be tracked when the VERA tool or its curriculum is revised.
  • The method works across three distinct user groups, suggesting it is robust to differences in learner background.
  • Time-series classification can serve as an automated check that student models stay within recognizable ecological behaviors.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same agreement metric could be turned into real-time feedback that tells a student when their model trajectory falls outside common patterns.
  • If applied to other domains, the approach would require domain-specific pattern libraries before the agreement test can be run.
  • Disagreement cases between the two methods could be examined to reveal either tool limitations or gaps in the pattern library.

Load-bearing premise

High numerical agreement between two automated classification procedures on the same data set is sufficient to show that those classifications capture genuine ecological content rather than shared artifacts of the data or the methods.

What would settle it

An independent expert panel classifying the same 971 time series and finding agreement with the automated labels below 70 percent on the same test set would falsify the claim that the agreement demonstrates content validity.

Figures

Figures reproduced from arXiv: 2312.06871 by Ashok Goel, John Kos, Kenneth Eaton, Rahul Dass, Sareen Zhang, Stephen Buckley, Sungeun An.

Figure 1: An example of a predator-prey conceptual model in VERA.

Figure 2: An example of the simulation output from a conceptual model in VERA.

Figure 3: Curve types as labeled (1) dying, (2) capped growth, (3) exponential, (4) oscillation, (5)

Figure 4: Hierarchical Clustering Process.
  Adjacent text: If we introduce new curves and run KNN to determine their similarity to existing clusters, then the introduced curve should fall into a similar cluster to that of their curve type. Second, by evaluating an introduced curve by KNN of the medoid curve, a representative curve which has the least distance between itself and the other curves, in each cluster, we can speed up the …

Figure 5: Visualization of Curve Fitting.
  Adjacent text: Due to the variability in scale of the population graphs and the multidimensionality of the parameter space, four methods were taken to ensure that the function was successful in finding matching curves. First, the population graphs were normalized to the maximum value of the graph. This is valuable because the curve fit function searches in depth for any parameter values bet…

Figure 6: Dendrogram for all datasets. As numbered (1) superset, (2) GATECH, (3) NGTC, (4) SDLs.

Figure 7: A comparison between 7 and 8 clusters.
  Adjacent table:
    User Set              GATECH   NGTC    SDLs    ALL
    Method Accuracy (%)   88.43    71.65   70.65   89.38
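
The figure excerpts mention two mechanics that a short sketch can make concrete: normalizing each population graph to its maximum, and assigning a new curve to the cluster whose medoid is closest. The version below is a minimal illustration under stated assumptions: Euclidean distance on equal-length series is assumed (the source text does not say which distance the paper uses), and all function names are hypothetical.

```python
import math

def normalize_to_max(series):
    # Scale a population series so its largest value becomes 1.
    peak = max(series)
    return [v / peak for v in series] if peak > 0 else list(series)

def distance(a, b):
    # Euclidean distance between two equal-length series (an assumption).
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def medoid(cluster):
    # The member with the smallest total distance to all other members.
    return min(cluster, key=lambda c: sum(distance(c, other) for other in cluster))

def nearest_cluster(curve, medoids):
    # Name of the cluster whose medoid lies closest to the given curve.
    return min(medoids, key=lambda name: distance(curve, medoids[name]))

# Toy clusters of already-normalized curves.
clusters = {
    "dying": [[1.0, 0.5, 0.1], [1.0, 0.4, 0.0], [1.0, 0.6, 0.2]],
    "capped growth": [[0.1, 0.5, 1.0], [0.0, 0.4, 1.0], [0.2, 0.6, 1.0]],
}
medoids = {name: medoid(members) for name, members in clusters.items()}
assigned = nearest_cluster(normalize_to_max([40, 22, 3]), medoids)
```

Comparing a new curve against one medoid per cluster, rather than against every stored curve, is what makes the lookup cheap.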
Original abstract

Conceptual and simulation models can function as useful pedagogical tools; however, it is important to categorize different outcomes when evaluating them in order to interpret results more meaningfully. VERA is an ecology-based conceptual modeling software that enables users to simulate interactions between biotics and abiotics in an ecosystem, allowing users to form and then verify hypotheses by observing a time series of the species populations. In this paper, we classify these time series into common patterns found in the domain of ecological modeling through two methods, hierarchical clustering and curve fitting, illustrating a general methodology for showing content validity when combining different pedagogical tools. When applied to a diverse sample of 263 models containing 971 time series collected from three different VERA user categories, Georgia Tech (GATECH), North Georgia Technical College (NGTC), and "Self Directed Learners", results showed agreement between both classification methods on 89.38% of the sample curves in the test set. This serves as a good indication that our methodology for determining content validity was successful.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes a methodology for content-validating the VERA ecology modeling tool by classifying student-generated population time series into ecological patterns using two independent procedures—hierarchical clustering and curve fitting—and reports 89.38% agreement between them on a test set of 971 curves drawn from 263 models produced by GATECH, NGTC, and self-directed users. The authors interpret this agreement as evidence that the methodology successfully establishes content validity.

Significance. If the inference from internal agreement to content validity were justified, the work would supply a reproducible, data-driven template for validating simulation-based pedagogical tools without requiring per-student expert grading. The approach is attractive because it operates directly on learner artifacts and could generalize to other modeling environments; however, the current manuscript supplies no external anchor to domain-standard patterns, so the claimed significance does not materialize.

major comments (3)
  1. [Abstract] Abstract (results paragraph): the claim that 89.38% agreement between hierarchical clustering and curve fitting 'serves as a good indication that our methodology for determining content validity was successful' is unsupported. Agreement between two post-hoc classifiers applied to the identical student-generated series demonstrates only pipeline consistency; it supplies no mapping of discovered clusters or fitted templates to textbook ecological archetypes (logistic growth, Lotka-Volterra oscillations, etc.) or to any expert-labeled ground truth.
  2. [Abstract] Abstract (methods description): neither the ecological patterns used as targets for curve fitting nor the procedure for selecting or validating the curve-fitting templates are stated. Without this information it is impossible to determine whether the templates were derived independently of the 971-curve test set or whether they correspond to domain-standard functional forms.
  3. [Abstract] Abstract (results paragraph): no per-category agreement rates, confusion matrix, or error analysis is provided. A single aggregate figure of 89.38% on an unspecified test-set partition cannot establish that the classification pipeline reliably recovers the intended conceptual content of VERA.
minor comments (2)
  1. [Abstract] The abstract should explicitly name the ecological patterns (e.g., exponential, logistic, oscillatory) that the two methods are intended to recover.
  2. [Abstract] The three user cohorts (GATECH, NGTC, Self Directed Learners) are mentioned but not characterized with respect to prior ecology knowledge or task instructions; a brief description would aid reproducibility.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive critique of our manuscript. We address each major comment below, indicating revisions where the concerns are valid and providing clarification on points of disagreement.

Point-by-point responses
  1. Referee: [Abstract] Abstract (results paragraph): the claim that 89.38% agreement between hierarchical clustering and curve fitting 'serves as a good indication that our methodology for determining content validity was successful' is unsupported. Agreement between two post-hoc classifiers applied to the identical student-generated series demonstrates only pipeline consistency; it supplies no mapping of discovered clusters or fitted templates to textbook ecological archetypes (logistic growth, Lotka-Volterra oscillations, etc.) or to any expert-labeled ground truth.

    Authors: We agree the abstract phrasing is too strong. The reported agreement measures consistency between two independent classification procedures (data-driven clustering and template-based curve fitting to standard ecological forms), which is a prerequisite for but not equivalent to full content validity. We will revise the abstract to state that the agreement supports the reliability of the dual-method pipeline as an initial step in the proposed validation approach, while noting that direct mapping to expert-labeled archetypes is planned future work. revision: yes

  2. Referee: [Abstract] Abstract (methods description): neither the ecological patterns used as targets for curve fitting nor the procedure for selecting or validating the curve-fitting templates are stated. Without this information it is impossible to determine whether the templates were derived independently of the 971-curve test set or whether they correspond to domain-standard functional forms.

    Authors: The full paper specifies the target patterns (exponential growth, logistic growth, damped oscillations, and Lotka-Volterra-style predator-prey cycles) drawn from standard ecology textbooks and selected a priori from domain literature before analyzing the 971 series. Templates were not fitted to or derived from the test data. Because the abstract is space-constrained, we will add a brief clause listing the patterns and confirming their independence from the test set. revision: partial

  3. Referee: [Abstract] Abstract (results paragraph): no per-category agreement rates, confusion matrix, or error analysis is provided. A single aggregate figure of 89.38% on an unspecified test-set partition cannot establish that the classification pipeline reliably recovers the intended conceptual content of VERA.

    Authors: We accept that aggregate agreement alone is insufficient. The manuscript already stratifies results by the three user groups (GATECH, NGTC, self-directed), but we will add per-category agreement percentages and a high-level error summary (e.g., most common mismatch types) to the results section. The abstract will be updated to note that agreement was consistent across user categories. revision: yes
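
The per-category breakdown and confusion counts discussed in the third exchange could be produced along the following lines. This is a sketch under assumptions: both methods are taken to emit pattern names directly, the curve-fitting label is used as the reference category, and the function names and toy labels are hypothetical.

```python
from collections import Counter

def confusion_counts(clustering_labels, fitting_labels):
    # Count every (clustering label, curve-fitting label) pair.
    return Counter(zip(clustering_labels, fitting_labels))

def per_category_agreement(clustering_labels, fitting_labels):
    # Agreement percentage within each curve-fitting category.
    totals = Counter(fitting_labels)
    hits = Counter(
        fit for clu, fit in zip(clustering_labels, fitting_labels) if clu == fit
    )
    return {cat: 100.0 * hits[cat] / totals[cat] for cat in totals}

# Toy labels for five curves.
clustering = ["dying", "dying", "oscillation", "exponential", "oscillation"]
fitting = ["dying", "oscillation", "oscillation", "exponential", "oscillation"]

pairs = confusion_counts(clustering, fitting)
rates = per_category_agreement(clustering, fitting)
```

Off-diagonal entries in `pairs`, such as `("dying", "oscillation")`, point at the mismatch types an error summary would need to explain.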

Circularity Check

0 steps flagged

No circularity; empirical agreement reported directly without reduction to inputs by construction

Full rationale

The paper reports 89.38% agreement between hierarchical clustering and curve fitting applied to the same 971 student-generated time series and presents this agreement as indicating success of the content-validity methodology. This is an interpretive claim about what the observed consistency implies, not a derivation in which a result is defined in terms of itself, a fitted parameter is relabeled as a prediction, or a load-bearing premise reduces to a self-citation. No equations, fitted parameters, or uniqueness theorems appear in the supplied text that would create a self-referential loop. The central step is therefore self-contained as a straightforward empirical measurement, even if the substantive leap from consistency to domain validity remains open to external critique.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Based on abstract only; the claim rests on the unstated premise that method agreement equals pedagogical content validity without external ecological benchmarks or expert labels.

axioms (1)
  • domain assumption Agreement between hierarchical clustering and curve fitting on student time series demonstrates content validity of the modeling tool
    This premise is invoked to interpret the 89.38% agreement as success.

pith-pipeline@v0.9.0 · 5721 in / 1097 out tokens · 20944 ms · 2026-05-24T05:17:50.260617+00:00 · methodology

