Predictive Modeling for High Impact Active Learning Classrooms

Meagan Sundstrom; N.G. Holmes; Olive Ross

arxiv: 2603.14335 · v3 · pith:3Y7YN4NMnew · submitted 2026-03-15 · ⚛️ physics.ed-ph

Olive Ross, Meagan Sundstrom, N.G. Holmes

Pith reviewed 2026-05-15 11:01 UTC · model grok-4.3

classification ⚛️ physics.ed-ph

keywords active learning · predictive model · group worksheets · clicker questions · student questions · learning gains · undergraduate science

The pith

A specific combination of group worksheets, clicker questions, and student questions produces exceptional learning gains with effect sizes greater than 2.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Using data from 69 undergraduate science classes across multiple fields and institutions, the authors create a predictive model linking time spent on classroom activities to student conceptual learning gains. They identify a particular mix—10 to 20 percent of class time on group worksheets, 20 to 40 percent on group clicker questions, plus at least two student questions per hour—that yields effect sizes over 2, much larger than typical active learning. Classes lacking group worksheets perform no better than traditional lectures. These findings translate observational patterns into concrete, testable guidance for improving active learning effectiveness in science courses.

Core claim

We identify one type of class that produces exceptional learning gains (effect sizes > 2): 10-20% of time on group worksheets, 20-40% on group clicker questions, and two or more student questions per hour. We also find that classes without group worksheets show learning gains comparable to lecture-only courses.

What carries the argument

A predictive model that maps the percentages of time spent on different active learning activities and the frequency of student questions to measured student conceptual learning gains.
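A minimal sketch of this kind of model, assuming a plain ordinary-least-squares fit (the paper passage quoted further down this page mentions multiple linear regression). The data are synthetic and generated on the spot. The three predictors, the coefficients used to generate the toy outcomes, and the helper `predict_effect_size` are hypothetical illustrations, not the paper's variables or results.

```python
import numpy as np

# Synthetic, made-up data for illustration only (NOT the paper's dataset).
rng = np.random.default_rng(0)
n_classes = 69  # same number of classes as described on this page

# Hypothetical predictors: fraction of class time per activity and
# student questions per hour.
worksheets = rng.uniform(0.0, 0.3, n_classes)
clickers = rng.uniform(0.0, 0.5, n_classes)
questions_per_hour = rng.uniform(0.0, 5.0, n_classes)

# Arbitrary coefficients used only to generate toy outcomes:
# intercept, worksheets, clickers, questions per hour.
true_coefs = np.array([0.5, 3.0, 1.0, 0.1])
X = np.column_stack([np.ones(n_classes), worksheets, clickers, questions_per_hour])
effect_size = X @ true_coefs + rng.normal(0.0, 0.1, n_classes)

# Ordinary least squares fit of effect size on the activity variables.
coefs, residuals, rank, _ = np.linalg.lstsq(X, effect_size, rcond=None)


def predict_effect_size(worksheet_frac, clicker_frac, questions):
    """Predict the effect size for one class from the fitted coefficients."""
    features = np.array([1.0, worksheet_frac, clicker_frac, questions])
    return float(features @ coefs)
```

With this toy setup the fitted coefficients land close to the generating ones, which is all the sketch is meant to show; it says nothing about the coefficients in the paper.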

If this is right

Allocating 10-20% of class time to group worksheets is associated with substantially higher learning gains.
Combining that with 20-40% group clicker questions and at least two student questions per hour produces effect sizes exceeding 2.
Classes that omit group worksheets achieve learning gains only comparable to those in traditional lecture courses.
The model provides specific targets that instructors can use to design more effective active learning sessions.
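The targets listed above can be written as a simple check, sketched here under stated assumptions. The function names are hypothetical, fractions are expressed on a 0 to 1 scale, and treating the range endpoints as inclusive is an assumption, since this page does not say how boundaries are handled.

```python
def is_high_impact_profile(worksheet_frac, clicker_frac, questions_per_hour):
    """Return True if a class falls inside the reported activity ranges.

    Inclusive endpoints are an assumption, not something the summary states.
    """
    return (
        0.10 <= worksheet_frac <= 0.20
        and 0.20 <= clicker_frac <= 0.40
        and questions_per_hour >= 2
    )


def lacks_group_worksheets(worksheet_frac):
    """True for classes with no worksheet time, the group reported to
    resemble lecture-only courses."""
    return worksheet_frac == 0.0
```

For example, `is_high_impact_profile(0.15, 0.30, 2.5)` returns `True`, while a class with 5% worksheet time or fewer than two student questions per hour does not qualify.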

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Controlled experiments could test whether adopting this exact activity balance causes the high gains or whether other factors are at play.
This pattern might apply to active learning in non-science disciplines if similar mechanisms hold.
Instructors could monitor and adjust activity times in real time to approach the identified optimal ranges.
The emphasis on student questions suggests that fostering student voice is key to maximizing gains.

Load-bearing premise

That the associations between specific activity combinations and learning gains observed across the 69 classes are due to the activities themselves rather than other differences like instructor skill or student preparation.

What would settle it

A randomized trial that assigns classes either to the identified activity mix or to other combinations, and finds that the identified mix yields no significantly larger learning gains and no effect sizes above 2, would falsify the predictive association.

Original abstract

Over the past several decades, a large body of research has shown that undergraduate science students learn more and more equitably in active learning classrooms; however, the term "active learning" lacks definition and little research has examined which types and combinations of active learning strategies are most effective. In this study, we use a dataset representing over 10,000 students and 24 institutions to create a predictive model that maps classroom time spent on different activities to student conceptual learning. We find that four variables -- classroom time spent on lecture, group worksheets, clicker questions, and student questions -- are sufficient to reliably predict student learning, as measured by concept inventory scores. We identify one type of class that consistently demonstrates exceptional student learning gains (effect sizes greater than 2): those that spend 10-20% of class time on group worksheets, 20-40% of class time on group clicker questions, and average two or more student questions per hour of class time. We also find that classes which do not utilize group worksheets consistently have learning outcomes comparable to fully lecture classes. These results provide testable recommendations for future controlled studies to investigate effective active learning implementation in undergraduate physics courses.
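The abstract reports effect sizes, but this page does not state how they are computed. One common convention for pre/post concept inventory scores is Cohen's d with a pooled standard deviation. The sketch below assumes that convention, which may differ from the paper's definition; the function name and the sample scores are made up.

```python
import math
from statistics import mean, variance


def cohens_d(pre_scores, post_scores):
    """Cohen's d for pre/post scores using the pooled sample standard deviation.

    Assumed convention: (mean_post - mean_pre) / pooled_sd.
    """
    n_pre, n_post = len(pre_scores), len(post_scores)
    pooled_var = (
        (n_pre - 1) * variance(pre_scores) + (n_post - 1) * variance(post_scores)
    ) / (n_pre + n_post - 2)
    return (mean(post_scores) - mean(pre_scores)) / math.sqrt(pooled_var)
```

As a toy check, `cohens_d([1.0, 2.0, 3.0], [3.0, 4.0, 5.0])` gives d = 2.0: the means differ by 2 and the pooled standard deviation is 1. An effect size above 2 therefore means the post-test mean sits more than two pooled standard deviations above the pre-test mean.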

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper fits a model to 69 classes and gives specific activity time ranges tied to effect sizes above 2, but the observational data cannot support the causal claim that those activities produce the gains.

The one thing to know is that this paper fits a predictive model to 69 classes and flags one activity mix—10-20% group worksheets, 20-40% group clickers, and at least two student questions per hour—as linked to effect sizes over 2. What is new is the specific ranges rather than just active learning in general. The multi-field, multi-institution sample gives it some breadth, and the note that classes without worksheets look like straight lecture is a useful contrast to prior broad comparisons.

Referee Report

3 major / 1 minor

Summary. The manuscript develops a predictive model from an observational dataset of 69 multi-field, multi-institutional undergraduate science classes that maps time allocations across classroom activities (group worksheets, group clicker questions, student questions per hour) to student conceptual learning gains. It reports identifying one specific activity combination—10-20% group worksheets, 20-40% group clicker questions, and at least two student questions per hour—that yields exceptional gains (effect sizes >2), while classes without group worksheets show gains comparable to lecture-only courses, and offers these as testable recommendations for future studies.

Significance. If the reported associations prove robust after proper validation and confounder control, the work would supply concrete, actionable guidance for optimizing active-learning time allocations in science classrooms and could stimulate targeted experimental tests of the identified activity mix.

major comments (3)

[Abstract] No details are supplied on the predictive model form, fitting procedure, cross-validation, error estimation, or out-of-sample performance, so the reliability of the reported thresholds and effect-size claim cannot be evaluated.
[Abstract] The high-impact class type is defined by thresholds extracted from the same fitted model on the 69-class dataset, creating a circularity risk in which the reported combination may simply recover the parameters that best fit the observed data rather than an independently validated pattern.
[Abstract] The observational design is used to claim that the activity combination 'produces' exceptional gains, yet no controls, fixed effects, matching, or other identification strategy for instructor skill, student preparation, or institutional differences are described, leaving the causal interpretation unsupported.

minor comments (1)

[Abstract] The 69-class sample size is stated without a breakdown by discipline or institution, which would help readers assess the scope of generalizability.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive comments, which highlight important issues of transparency, validation, and interpretation in our observational study. We have revised the abstract and expanded relevant sections of the manuscript to address these points directly while preserving the core findings from the 69-class dataset.

Referee: [Abstract] No details are supplied on the predictive model form, fitting procedure, cross-validation, error estimation, or out-of-sample performance, so the reliability of the reported thresholds and effect-size claim cannot be evaluated.

Authors: We agree that the abstract omitted key methodological details. The full manuscript specifies a multiple linear regression model fitted by ordinary least squares, with 5-fold cross-validation used to evaluate out-of-sample performance and estimate prediction error. We will revise the abstract to include a concise statement of the model form, fitting procedure, and performance metrics (including cross-validated R-squared) so readers can immediately assess reliability.
revision: yes

Referee: [Abstract] The high-impact class type is defined by thresholds extracted from the same fitted model on the 69-class dataset, creating a circularity risk in which the reported combination may simply recover the parameters that best fit the observed data rather than an independently validated pattern.

Authors: This concern is well-founded. The reported activity thresholds (10-20% group worksheets, 20-40% group clicker questions, and at least two student questions per hour) are indeed derived from the fitted model on the same 69-class sample and represent the profile that maximizes predicted gains within our data. We already frame the results as generating testable hypotheses for future controlled studies rather than as independently validated patterns. We will add explicit language in the abstract and discussion to emphasize the data-driven, exploratory nature of these thresholds and the need for out-of-sample confirmation.
revision: yes

Referee: [Abstract] The observational design is used to claim that the activity combination 'produces' exceptional gains, yet no controls, fixed effects, matching, or other identification strategy for instructor skill, student preparation, or institutional differences are described, leaving the causal interpretation unsupported.

Authors: We agree that the observational design does not support causal claims and that the word 'produces' overstates the evidence. We will replace 'produces' with 'is associated with' in the abstract. The manuscript already includes basic controls for course level and broad institutional type; we will expand the methods and limitations sections to describe these controls explicitly and to acknowledge the absence of direct measures or fixed effects for instructor skill and student preparation, which remain potential confounders. This revision will make the correlational character of the findings clear.
revision: yes
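The validation procedure named in the first response (ordinary least squares evaluated with 5-fold cross-validation and a cross-validated R-squared) could look roughly like the sketch below. This is an illustration under assumptions, not the authors' code: the function name, the shuffling seed, and the choice to compute R-squared from pooled held-out predictions against the overall mean are all hypothetical.

```python
import numpy as np


def cross_validated_r2(X, y, n_folds=5, seed=0):
    """Estimate out-of-sample R^2 for an OLS model with k-fold cross-validation.

    X: (n_samples, n_features) array without an intercept column.
    y: (n_samples,) array of outcomes (for example, class effect sizes).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)

    predictions = np.empty(n)
    for test_idx in folds:
        train_mask = np.ones(n, dtype=bool)
        train_mask[test_idx] = False
        # Add an intercept column and fit OLS on the training folds only.
        X_train = np.column_stack([np.ones(train_mask.sum()), X[train_mask]])
        coefs, *_ = np.linalg.lstsq(X_train, y[train_mask], rcond=None)
        # Predict the held-out fold with coefficients it did not influence.
        X_test = np.column_stack([np.ones(len(test_idx)), X[test_idx]])
        predictions[test_idx] = X_test @ coefs

    # R^2 from pooled held-out predictions, relative to the overall mean.
    ss_res = np.sum((y - predictions) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot
```

A cross-validated R-squared close to the in-sample value suggests the fit is not driven by a few classes. It does not address the second referee point, because thresholds chosen after looking at all 69 classes are still selected on the same data.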

Circularity Check

1 step flagged

Fitted model on observational data identifies high-impact activity mix post-hoc

specific steps

fitted input called prediction [Abstract]
"we use a multi-field, multi-institutional dataset of 69 undergraduate science classes to create a predictive model that maps time spent on different classroom activities to student conceptual learning. We identify one type of class that produces exceptional learning gains (effect sizes > 2): 10-20% of time on group worksheets, 20-40% on group clicker questions, and two or more student questions per hour."

The identification of the precise activity combination and its claimed exceptional gains is performed by fitting the model to the full dataset and then highlighting the subset of activity proportions that exhibit effect sizes >2 within that fit; the reported 'predictive' result is therefore a post-hoc description of the fitted parameters rather than an independent out-of-sample prediction.

full rationale

The paper constructs a predictive model by fitting to the same 69-class observational dataset used to identify the specific activity thresholds (10-20% worksheets, 20-40% clickers, >=2 questions/hour) that yield effect sizes >2. This matches the 'fitted input called prediction' pattern at a minor level because the reported exceptional class type is extracted from the fitted associations rather than tested on held-out data or external benchmarks. No self-citation chain, self-definition, or ansatz smuggling reduces the central claim to its inputs by construction; the derivation is a standard regression-style mapping of observed activity proportions to measured gains and remains self-contained against external benchmarks. The causal language ('produces') raises a separate validity concern but does not create circularity.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the assumption that observational data from 69 classes can be used to identify causal combinations of classroom activities; the time-percentage ranges are outputs of a fitted predictive model.

free parameters (1)

activity time percentages = 10-20% group worksheets, 20-40% group clicker questions
The 10-20% and 20-40% ranges are identified by the predictive model fitted to the 69-class dataset.

axioms (1)

domain assumption: The multi-institutional dataset of 69 classes is representative and free of major selection bias for building a predictive model of learning gains.
The abstract treats the collected classes as sufficient to map activity times to learning outcomes without further qualification.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "We use multiple linear regression to train a model that maps the fraction of two-minute class intervals... to the concept inventory effect size"

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "Classes that spend 10-20% of class time on group worksheets, 20-40% on group clicker questions..."

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.