Relative benefits of different active learning methods to conceptual physics learning

Adrienne L. Traxler; Colin Green; Eric Brewe; Justin Gambrell; Meagan Sundstrom

arXiv: 2505.04577 · v2 · pith:QDA42GAH · submitted 2025-05-07 · ⚛️ physics.ed-ph

Pith reviewed 2026-05-22 16:51 UTC · model grok-4.3

classification ⚛️ physics.ed-ph

keywords: active learning, conceptual learning, physics education, SCALE-UP, Peer Instruction, ISLE, Tutorials, concept inventory

The pith

Active learning improves conceptual physics understanding across four methods, with SCALE-UP producing larger gains than ISLE and Peer Instruction.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper compares four established active learning approaches across many institutions to measure their effects on student conceptual understanding in introductory physics and astronomy. It finds clear learning gains in every method, as measured by concept inventories, with SCALE-UP showing the strongest results. The authors rule out differences in peer network formation as the explanation and instead tie the outcomes to how much class time is spent on student-centered work versus lecturing. These patterns hold across 31 courses and nearly three thousand students.

Core claim

In a study of 31 courses at 28 institutions involving 2,855 students, all four active learning methods produced measurable conceptual learning gains on concept inventories, ranging from 2.09-sigma to 6.22-sigma above a null effect. SCALE-UP produced significantly larger gains than ISLE (2.25-sigma difference) and Peer Instruction (2.54-sigma difference), while Tutorials showed no significant difference from the other three. Peer network development was similar across methods, but classroom videos showed that SCALE-UP and Tutorials devoted most time to student activities such as worksheets and labs, whereas many ISLE and Peer Instruction courses included substantial lecturing.
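The "n-sigma" figures above can be read as a test statistic: an estimate divided by its standard error. The sketch below shows that arithmetic under two assumptions not stated here: that each gain estimate is approximately normally distributed, and that two methods' estimates are independent. The paper's actual models (for example, any clustering by course or institution) may compute these quantities differently, and the numbers in the usage lines are hypothetical.

```python
import math


def sigma_from_null(estimate: float, se: float) -> float:
    """How many standard errors an estimate lies from zero (a z-score)."""
    return estimate / se


def sigma_difference(est_a: float, se_a: float, est_b: float, se_b: float) -> float:
    """z-score for the difference of two estimates, assuming they are independent."""
    return (est_a - est_b) / math.sqrt(se_a**2 + se_b**2)


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a z-score under a standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    # Hypothetical gains and standard errors for two methods, not values from the paper.
    z = sigma_difference(0.50, 0.03, 0.40, 0.04)
    print(f"{z:.2f}")  # → 2.00
    print(f"{two_sided_p(z):.3f}")
```

If the estimates were correlated (for example, courses sharing an institution), the variance of the difference would need a covariance term, so the independence assumption matters.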

What carries the argument

Direct comparison of conceptual learning gains from ISLE, Peer Instruction, Tutorials, and SCALE-UP, using pre/post concept inventory scores, peer network surveys, and classroom video recordings to distinguish the effects of activity time allocation from peer interactions.
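Pre/post concept inventory data are often summarized in physics education research with Hake's normalized gain, the fraction of the possible improvement that a class achieved. This summary does not specify the paper's own gain measure, which may well differ (for instance, an effect size or a multilevel model of post-test scores), so the sketch below only illustrates one common convention. It assumes scores are percentages and that only students with both a pre-test and a post-test are included.

```python
from statistics import mean


def normalized_gain(pre_pct: float, post_pct: float) -> float:
    """Hake-style normalized gain <g> = (post - pre) / (100 - pre), scores in percent."""
    if not 0 <= pre_pct < 100:
        raise ValueError("pre-test score must be in [0, 100) for the gain to be defined")
    return (post_pct - pre_pct) / (100.0 - pre_pct)


def course_gain(pre_scores: list[float], post_scores: list[float]) -> float:
    """Course-level gain from class-average percentages of matched students."""
    if len(pre_scores) != len(post_scores) or not pre_scores:
        raise ValueError("need matched, non-empty pre and post score lists")
    return normalized_gain(mean(pre_scores), mean(post_scores))


if __name__ == "__main__":
    # Hypothetical matched scores for three students.
    print(course_gain([30, 40, 50], [60, 70, 80]))  # → 0.5
```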

If this is right

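Courses that allocate most class time to student-centered activities, as the observed SCALE-UP and Tutorials courses did, should show larger conceptual gains, although only SCALE-UP's advantage over ISLE and Peer Instruction was statistically significant.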
Peer Instruction and ISLE may achieve comparable gains if lecturing time is reduced in favor of active tasks.
Peer network formation occurs at similar rates across methods and does not explain differences in learning outcomes.
The benefits of active learning appear across a wide range of institutions and student populations.
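One simple way to quantify peer network formation is the density of the network built from survey nominations: the share of all possible student pairs that are connected. This summary does not say which network measures the authors used, so the sketch below is illustrative only; it assumes undirected ties (a nomination in either direction links two students) and ignores nominations of people outside the course roster.

```python
from typing import Hashable, Iterable, Tuple


def network_density(
    students: Iterable[Hashable],
    nominations: Iterable[Tuple[Hashable, Hashable]],
) -> float:
    """Fraction of possible undirected student pairs that share at least one nomination."""
    roster = set(students)
    # frozenset makes (a, b) and (b, a) the same undirected tie; self-ties are dropped.
    ties = {
        frozenset((a, b))
        for a, b in nominations
        if a != b and a in roster and b in roster
    }
    n = len(roster)
    possible_pairs = n * (n - 1) // 2
    return len(ties) / possible_pairs if possible_pairs else 0.0


if __name__ == "__main__":
    roster = ["A", "B", "C", "D"]
    early = [("A", "B")]
    late = [("A", "B"), ("B", "A"), ("C", "D"), ("A", "C"), ("A", "X")]  # "X" is not enrolled
    change = network_density(roster, late) - network_density(roster, early)
    print(f"{change:.3f}")  # → 0.333
```

Comparing an early-semester and a late-semester survey, as in the usage lines, is one hypothetical way to operationalize network "development"; degree or centrality measures would be alternatives.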

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Departments could improve outcomes by auditing the fraction of class time spent on student work rather than selecting a named method by label alone.
The pattern of activity-driven gains could be tested in other STEM disciplines to check whether the same time-allocation principle applies outside physics.
Longitudinal follow-up on the same students could reveal whether the larger gains in SCALE-UP translate into better performance in later courses.
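An audit of class time could start from interval-coded observations, in which an observer records which activities occur in each short time window. The sketch below computes the share of intervals containing at least one student-centered activity. The activity labels, and the choice of which ones count as student-centered, are hypothetical placeholders rather than the coding scheme used in the paper.

```python
from typing import Iterable, Set

# Hypothetical activity codes treated as student-centered; a real audit would use
# the categories of an established observation protocol.
STUDENT_CENTERED = frozenset({"worksheet", "lab", "group_work", "clicker_discussion"})


def student_centered_fraction(
    intervals: Iterable[Set[str]],
    student_codes: frozenset = STUDENT_CENTERED,
) -> float:
    """Share of observed intervals in which any student-centered code was recorded."""
    intervals = list(intervals)
    if not intervals:
        raise ValueError("no observed intervals")
    hits = sum(1 for codes in intervals if set(codes) & student_codes)
    return hits / len(intervals)


if __name__ == "__main__":
    session = [{"lecture"}, {"lecture", "clicker_discussion"}, {"worksheet"}, {"lab"}]
    print(student_centered_fraction(session))  # → 0.75
```

In this sketch an interval that mixes lecturing with a student activity counts as student-centered; a stricter audit might require the student activity to be the only code, which would lower the fraction.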

Load-bearing premise

The assumption that differences in observed conceptual gains are caused primarily by the active learning method category rather than by instructor experience, student population differences, or variable implementation fidelity.

What would settle it

A controlled trial in which the same instructors, trained to matched fidelity, teach matched student groups using each method with identical lecture time and then compare concept inventory gains.

Figures

Figures reproduced from arXiv: 2505.04577 by Adrienne L. Traxler, Colin Green, Eric Brewe, Justin Gambrell, Meagan Sundstrom.

**Figure 3.** (a) Four-profile solution for the Latent Profile Analysis of classroom observations, including the percent of the 223 total …
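Latent Profile Analysis, named in the Figure 3 caption, is commonly fit as a Gaussian mixture model with diagonal covariances: each profile has its own mean and variance for every observed indicator, and each observation receives a probability of belonging to each profile. The NumPy sketch below fits such a model by expectation-maximization. It is not the authors' software or specification; the indicators (for example, per-observation fractions of time in each coded activity) are assumptions, and choosing the number of profiles (four in the figure) would normally rely on fit criteria such as BIC, which are not shown.

```python
import numpy as np


def fit_lpa(X, k, n_iter=200, seed=0, eps=1e-6):
    """Fit a k-profile diagonal Gaussian mixture by EM.

    X: (n, d) array, one row per observation and one column per indicator.
    Returns (weights, means, variances, resp), where resp[i, j] is the
    probability that observation i belongs to profile j.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    rng = np.random.default_rng(seed)

    # Farthest-point initialization spreads the starting means apart.
    idx = [int(rng.integers(n))]
    for _ in range(1, k):
        sq_dist = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(sq_dist.argmax()))
    means = X[idx].copy()
    variances = np.tile(X.var(axis=0) + eps, (k, 1))
    weights = np.full(k, 1.0 / k)

    for _ in range(n_iter):
        # E-step: log weight plus log diagonal-Gaussian density, per observation and profile.
        diff = X[:, None, :] - means[None, :, :]  # shape (n, k, d)
        log_p = np.log(weights) - 0.5 * np.sum(
            diff**2 / variances + np.log(2 * np.pi * variances), axis=2
        )
        log_p -= log_p.max(axis=1, keepdims=True)  # stabilize before exponentiating
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: update weights, means, and per-profile variances.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        diff = X[:, None, :] - means[None, :, :]
        variances = np.einsum("nk,nkd->kd", resp, diff**2) / nk[:, None] + eps

    return weights, means, variances, resp


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic indicators: two clearly separated groups of observations.
    X = np.vstack([rng.normal(0.2, 0.05, (30, 2)), rng.normal(0.8, 0.05, (30, 2))])
    weights, means, variances, resp = fit_lpa(X, k=2)
    print(np.round(weights, 2))
```

The diagonal covariance encodes the usual LPA assumption that indicators are independent within a profile; dedicated mixture-modeling packages also offer richer covariance structures.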

Abstract

It has been shown that active learning methods are more effective than traditional lecturing at improving student conceptual understanding and reducing failure rates in undergraduate physics courses. Researchers have developed distinct active learning methods that are now widely implemented in introductory physics. However, the relative benefits of these methods remain unknown. Here we present a multi-institutional comparison of the impacts of four well-established active learning methods -- Peer Instruction, Investigative Science Learning Environment (ISLE), Tutorials and Student-Centered Active Learning Environment with Upside-Down Pedagogies (SCALE-UP) -- on conceptual learning. We find measurable increases in student conceptual learning in all four active learning methods, and significantly larger gains in SCALE-UP than in either Peer Instruction or ISLE. Student development of peer networks is similar across the four methods, but classroom activities differ. In many of the observed Peer Instruction and ISLE courses, instructors lecture for a large fraction of class time. In Tutorials and SCALE-UP courses, instructors dedicate most in-class time to student-centered activities such as worksheets and laboratory work. These results prompt future work to identify causal mechanisms between specific classroom activities and conceptual learning and to examine additional factors related to variation in student learning across different methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

This multi-institutional comparison finds larger conceptual gains for SCALE-UP than for ISLE or Peer Instruction, but the video data tie those gains more to time spent on student-centered tasks than to the method labels themselves.

The headline result is a large-scale comparison of four active learning methods in introductory physics, with SCALE-UP showing bigger concept inventory gains than ISLE (2.25-sigma difference) and Peer Instruction (2.54-sigma difference). Data cover 2,855 students across 31 courses at 28 institutions, plus peer network surveys and classroom videos. That scale is the main new element compared with earlier single-site or pairwise work. The authors also report that all four methods produced gains above a null effect, ranging from 2.09 to 6.22 sigma, and that peer network development looked similar across them.

The video coding adds a concrete layer by documenting what actually happened in class. Many observed ISLE and Peer Instruction sections still spent a large fraction of time on lecturing, while Tutorials and SCALE-UP stayed focused on worksheets and labs. That observation is useful and directly relevant to interpreting the test score differences.

The soft spot is the entanglement between method labels and implementation. The abstract itself flags the variable lecturing in some ISLE and Peer Instruction courses, which means the comparisons are not cleanly matched on the fraction of active time. Instructor experience and student population differences are not addressed in the summary either. The sigma-level differences in gains are reported from the inventory data, but attributing them primarily to the four named methods, rather than to how faithfully each class ran student-centered activities, puts the central claim on softer ground.

This paper is for physics instructors and education researchers who need comparative evidence at scale when choosing or studying active learning approaches. Readers who already know the individual methods will get the most from the activity breakdowns and the multi-site numbers. The empirical measurements and the attempt to explain differences with video data are solid enough to deserve a serious referee, even if the controls section will need close attention.
I would send it for peer review.

Referee Report

2 major / 2 minor

Summary. The manuscript reports results from a multi-institutional study involving 31 introductory physics and astronomy courses at 28 institutions with 2,855 students. It compares conceptual learning gains, measured via concept inventories, across four active learning methods: ISLE, Peer Instruction, Tutorials, and SCALE-UP. The study also examines peer network development via surveys and classroom activities via video recordings. Key findings include statistically significant conceptual gains in all methods (2.09 to 6.22 sigma from null), with SCALE-UP showing larger gains than ISLE (2.25 sigma) and Peer Instruction (2.54 sigma), while Tutorials are comparable. Differences are linked to classroom activity profiles rather than peer networks, noting more lecturing in some ISLE and PI courses.

Significance. If the observed differences in conceptual gains can be robustly attributed to the active learning methods after accounting for implementation variations, this work would offer important guidance for physics educators selecting among established active learning approaches. The large sample size and multi-institutional nature strengthen the potential impact. The inclusion of video analysis to explain differences is a strength, providing mechanistic insight beyond outcome measures alone. The finding that peer network development is similar across methods also challenges assumptions about interaction mechanisms in active learning.

major comments (2)

The central claim that SCALE-UP produces significantly larger conceptual learning gains than ISLE (2.25-sigma difference) and Peer Instruction (2.54-sigma difference) is load-bearing for the paper's primary contribution. The abstract states that many observed ISLE and Peer Instruction courses devoted substantial class time to lecturing while Tutorials and SCALE-UP emphasized student-centered activities such as worksheets and labs. Without explicit controls, stratification, or regression including the fraction of class time on active tasks (from the video recordings), the method labels are entangled with implementation fidelity. This requires additional analysis to determine whether the headline differences would persist under matched activity profiles.
Methods and Results sections: The manuscript provides insufficient detail on per-method sample sizes (courses and students), the exact statistical procedures used to calculate the reported sigma-level differences (including any clustering by institution or multiple-comparison corrections), and controls for confounders such as instructor experience, student population differences, or prior knowledge. These omissions limit evaluation of the robustness of the cross-method comparisons.

minor comments (2)

Abstract: The total number of courses and students per method should be stated explicitly to contextualize the statistical comparisons and generalizability.
Throughout the manuscript: Ensure consistent use of terminology for each method and clear operational definitions of 'conceptual learning gains' and 'student-centered activities' when referencing the video data.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful and constructive comments on our manuscript. We address each major comment below and describe the revisions we will make to improve the clarity, statistical transparency, and robustness of our findings.

Referee: The central claim that SCALE-UP produces significantly larger conceptual learning gains than ISLE (2.25-sigma difference) and Peer Instruction (2.54-sigma difference) is load-bearing for the paper's primary contribution. The abstract states that many observed ISLE and Peer Instruction courses devoted substantial class time to lecturing while Tutorials and SCALE-UP emphasized student-centered activities such as worksheets and labs. Without explicit controls, stratification, or regression including the fraction of class time on active tasks (from the video recordings), the method labels are entangled with implementation fidelity. This requires additional analysis to determine whether the headline differences would persist under matched activity profiles.

Authors: We appreciate the referee's emphasis on disentangling method labels from implementation details. The classroom video data already show a clear pattern: courses with higher fractions of lecturing time were predominantly those labeled ISLE or Peer Instruction and exhibited smaller gains, while Tutorials and SCALE-UP courses allocated most time to student-centered activities. To directly test whether the reported differences persist after accounting for activity profiles, we will add a regression analysis in the revised manuscript that includes the measured fraction of class time on active tasks (derived from the video recordings) as a covariate. This will quantify the extent to which activity profiles explain the observed differences in conceptual gains.
Revision: yes.
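The regression proposed in this response can be sketched as ordinary least squares on course-level gains, with indicator variables for each method (relative to a reference method) and, optionally, the video-derived fraction of active class time as a covariate. Comparing the method coefficients with and without that covariate shows how much of the gap it absorbs. This is a simplified, hypothetical sketch rather than the authors' model: it ignores students nested in courses and institutions, which a real analysis would handle with multilevel models or cluster-robust standard errors, and the example data are synthetic.

```python
import numpy as np


def method_effects(gains, methods, active_frac=None, reference="SCALE-UP"):
    """OLS differences in mean gain between each method and a reference method.

    With active_frac supplied, the fraction of class time on active tasks enters as a
    covariate, so each coefficient compares methods at a fixed active-time fraction.
    """
    gains = np.asarray(gains, dtype=float)
    labels = sorted(set(methods) - {reference})
    columns = [np.ones_like(gains)]  # intercept = reference method's level
    columns += [np.array([m == lab for m in methods], dtype=float) for lab in labels]
    if active_frac is not None:
        columns.append(np.asarray(active_frac, dtype=float))
    design = np.column_stack(columns)
    beta, *_ = np.linalg.lstsq(design, gains, rcond=None)
    return {lab: float(b) for lab, b in zip(labels, beta[1 : 1 + len(labels)])}


if __name__ == "__main__":
    # Synthetic courses in which gain depends only on the active-time fraction.
    methods = ["SCALE-UP", "SCALE-UP", "ISLE", "ISLE", "Peer Instruction", "Peer Instruction"]
    active = [0.9, 0.8, 0.4, 0.6, 0.3, 0.5]
    gains = [0.1 + 0.4 * a for a in active]
    print(method_effects(gains, methods))          # method labels look influential
    print(method_effects(gains, methods, active))  # gaps vanish once time is controlled
```

The covariate can only be separated from the method labels if active-time fractions vary within methods; if every ISLE course lectured more than every SCALE-UP course, the two explanations would be nearly collinear.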
Referee: Methods and Results sections: The manuscript provides insufficient detail on per-method sample sizes (courses and students), the exact statistical procedures used to calculate the reported sigma-level differences (including any clustering by institution or multiple-comparison corrections), and controls for confounders such as instructor experience, student population differences, or prior knowledge. These omissions limit evaluation of the robustness of the cross-method comparisons.

Authors: We agree that greater statistical transparency is required. In the revised manuscript we will add a table reporting the number of courses and students per method. We will expand the Methods section to specify the exact procedures used to compute the sigma-level differences, including any clustering by institution and corrections for multiple comparisons. Regarding confounders, pre-post concept-inventory scores already incorporate baseline knowledge; however, uniform data on instructor experience and detailed student demographics were not collected across all 28 institutions. We will explicitly discuss these limitations and their implications for interpreting cross-method comparisons.
Revision: partial.
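With four methods there are six pairwise comparisons, so the multiple-comparison correction mentioned in this response matters. The sketch below applies the Holm-Bonferroni step-down adjustment to a list of p-values. This summary does not say which correction, if any, the authors used, and any p-values fed to it here would be made up.

```python
from itertools import combinations


def holm_adjust(pvalues):
    """Holm-Bonferroni adjusted p-values (step-down, kept monotone), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        adj = min(1.0, (m - rank) * pvalues[i])
        running_max = max(running_max, adj)  # adjusted values never decrease with rank
        adjusted[i] = running_max
    return adjusted


METHODS = ["ISLE", "Peer Instruction", "SCALE-UP", "Tutorials"]
PAIRS = list(combinations(METHODS, 2))  # the six pairwise method comparisons

if __name__ == "__main__":
    # Hypothetical p-values, one per pair, in the order of PAIRS.
    raw = [0.40, 0.02, 0.30, 0.01, 0.50, 0.20]
    for pair, p in zip(PAIRS, holm_adjust(raw)):
        print(pair, round(p, 3))
```

Holm controls the family-wise error rate like the plain Bonferroni correction but is never less powerful, which makes it a common default; Tukey's HSD or false-discovery-rate control would be other plausible choices.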

Circularity Check

0 steps flagged

Empirical data comparison exhibits no circularity

The paper reports statistical comparisons of conceptual learning gains drawn directly from pre/post concept inventory scores, peer network surveys, and classroom video observations across 31 courses. No equations, fitted parameters, or derivations are presented that reduce claims to their inputs by construction. Self-citations, if present, are not load-bearing for the central empirical findings, which remain falsifiable through independent replication of the data collection protocol. The conclusions therefore rest on external data rather than on self-referential reasoning.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard domain assumptions about measurement validity in physics education research and observational reliability, with no free parameters or invented entities introduced in the reported results.

axioms (2)

Domain assumption: Concept inventory pre/post scores validly measure changes in student conceptual understanding.
All reported learning gains and statistical comparisons depend on this assumption.
Domain assumption: Classroom video recordings and peer network surveys reliably capture key instructional activities and social dynamics.
These data are used to explain why gains differ across methods.

pith-pipeline@v0.9.0 · 5833 in / 1393 out tokens · 81109 ms · 2026-05-22T16:51:40.806585+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction · unclear

Relation between the paper passage and the cited Recognition theorem.

We find measurable increases in student conceptual learning in all four active learning methods (ranging from 2.09-sigma to 6.22-sigma differences from a null effect), and significantly larger conceptual learning gains in SCALE-UP than in both ISLE (2.25-sigma difference) and Peer Instruction (2.54-sigma difference).

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Relation between the paper passage and the cited Recognition theorem.

Instead, we observe differences in classroom activities; in many of the observed ISLE and Peer Instruction courses, instructors lecture for a large fraction of class time. In Tutorials and SCALE-UP courses, instructors dedicate most in-class time to student-centered activities such as worksheets and laboratory work.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Strategies for Collecting Multi-Institutional Data in Discipline-Based Education Research
physics.ed-ph · 2026-05 · unverdicted · novelty 4.0

The authors outline actionable strategies for multi-institutional DBER data collection and demonstrate them with concept inventory, survey, and observation data from 31 instructors at 28 US institutions.