Relative benefits of different active learning methods to conceptual physics learning
Pith reviewed 2026-05-22 16:51 UTC · model grok-4.3
The pith
Active learning improves conceptual physics understanding across four methods, with SCALE-UP producing larger gains than ISLE and Peer Instruction.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
In a study of 31 courses at 28 institutions involving 2,855 students, all four active learning methods produced measurable conceptual learning gains on concept inventories, ranging from 2.09-sigma to 6.22-sigma above a null effect. SCALE-UP produced significantly larger gains than ISLE (2.25-sigma difference) and Peer Instruction (2.54-sigma difference), while Tutorials showed no significant difference from the other three. Peer network development was similar across methods, but classroom videos showed that SCALE-UP and Tutorials devoted most time to student activities such as worksheets and labs, whereas many ISLE and Peer Instruction courses included substantial lecturing.
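The sigma-level differences quoted above can be read as a difference between two estimates divided by the standard error of that difference. This summary does not state the paper's exact procedure, so the following is only a minimal sketch under the common assumption of two independent estimates with known standard errors. The function name `sigma_difference` and all numbers are hypothetical illustrations, not values from the paper.

```python
import math


def sigma_difference(est_a, se_a, est_b, se_b):
    """Difference between two independent estimates, in units of its standard error.

    Assumes the two estimates are independent, so their variances add.
    """
    return (est_a - est_b) / math.sqrt(se_a ** 2 + se_b ** 2)


# Hypothetical pooled gains and standard errors (NOT taken from the paper).
z = sigma_difference(0.90, 0.12, 0.55, 0.10)
print(f"difference = {z:.2f} sigma")
```

If the estimates share courses or institutions, the independence assumption fails and the denominator would need a covariance term.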
What carries the argument
Direct comparison of conceptual learning gains from ISLE, Peer Instruction, Tutorials, and SCALE-UP, using pre/post concept inventory scores, peer network surveys, and classroom video recordings to distinguish the effects of activity time allocation from peer interactions.
If this is right
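- Courses in which instructors allocate most class time to student-centered activities, as in the observed SCALE-UP and Tutorials courses, tend to produce larger conceptual gains, although only SCALE-UP's advantage over ISLE and Peer Instruction was statistically significant.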
- Peer Instruction and ISLE may achieve comparable gains if lecturing time is reduced in favor of active tasks.
- Peer network formation occurs at similar rates across methods and does not explain differences in learning outcomes.
- The benefits of active learning appear across a wide range of institutions and student populations.
Where Pith is reading between the lines
- Departments could improve outcomes by auditing the fraction of class time spent on student work rather than selecting a named method by label alone.
- The pattern of activity-driven gains could be tested in other STEM disciplines to check whether the same time-allocation principle applies outside physics.
- Longitudinal follow-up on the same students could reveal whether the larger gains in SCALE-UP translate into better performance in later courses.
Load-bearing premise
The assumption that differences in observed conceptual gains are caused primarily by the active learning method category rather than by instructor experience, student population differences, or variable implementation fidelity.
What would settle it
A controlled trial in which the same instructors, trained to matched fidelity, teach matched student groups using each method with identical lecture time and then compare concept inventory gains.
Original abstract
Extensive research has demonstrated that active learning methods are more effective than traditional lecturing at improving student conceptual understanding and reducing failure rates in undergraduate physics courses. Researchers have developed several distinct active learning methods that are now widely implemented in introductory physics; however, the relative benefits of these methods remain unknown. Here we present the first multi-institutional comparison of the impacts of four well-established active learning methods (ISLE, Peer Instruction, Tutorials, and SCALE-UP) on conceptual learning. We also investigate student development of peer networks and the activities that take place during instruction to explain differences in these impacts. Data include student concept inventory scores, peer network surveys, and classroom video recordings from 31 introductory physics and astronomy courses at 28 different institutions in the United States containing a total of 2,855 students. We find measurable increases in student conceptual learning in all four active learning methods (ranging from 2.09-sigma to 6.22-sigma differences from a null effect), and significantly larger conceptual learning gains in SCALE-UP than in both ISLE (2.25-sigma difference) and Peer Instruction (2.54-sigma difference). Conceptual learning gains in Tutorials are not significantly different from those in the other three methods. Despite the hypothesized benefits of student interactions, student development of peer networks is similar across the four methods. Instead, we observe differences in classroom activities; in many of the observed ISLE and Peer Instruction courses, instructors lecture for a large fraction of class time. In Tutorials and SCALE-UP courses, instructors dedicate most in-class time to student-centered activities such as worksheets and laboratory work.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript reports results from a multi-institutional study involving 31 introductory physics and astronomy courses at 28 institutions with 2,855 students. It compares conceptual learning gains, measured via concept inventories, across four active learning methods: ISLE, Peer Instruction, Tutorials, and SCALE-UP. The study also examines peer network development via surveys and classroom activities via video recordings. Key findings include measurable conceptual gains in all four methods (2.09 to 6.22 sigma from a null effect), with SCALE-UP showing larger gains than ISLE (2.25-sigma difference) and Peer Instruction (2.54-sigma difference), while gains in Tutorials do not differ significantly from those in the other three methods. The authors link these differences to classroom activity profiles rather than to peer networks, noting that instructors lectured for a large fraction of class time in many ISLE and Peer Instruction courses.
Significance. If the observed differences in conceptual gains can be robustly attributed to the active learning methods after accounting for implementation variations, this work would offer important guidance for physics educators selecting among established active learning approaches. The large sample size and multi-institutional nature strengthen the potential impact. The inclusion of video analysis to explain differences is a strength, providing mechanistic insight beyond outcome measures alone. The finding that peer network development is similar across methods also challenges assumptions about interaction mechanisms in active learning.
major comments (2)
- The central claim that SCALE-UP produces significantly larger conceptual learning gains than ISLE (2.25-sigma difference) and Peer Instruction (2.54-sigma difference) is load-bearing for the paper's primary contribution. The abstract states that many observed ISLE and Peer Instruction courses devoted substantial class time to lecturing while Tutorials and SCALE-UP emphasized student-centered activities such as worksheets and labs. Without explicit controls, stratification, or regression including the fraction of class time on active tasks (from the video recordings), the method labels are entangled with implementation fidelity. This requires additional analysis to determine whether the headline differences would persist under matched activity profiles.
- Methods and Results sections: The manuscript provides insufficient detail on per-method sample sizes (courses and students), the exact statistical procedures used to calculate the reported sigma-level differences (including any clustering by institution or multiple-comparison corrections), and controls for confounders such as instructor experience, student population differences, or prior knowledge. These omissions limit evaluation of the robustness of the cross-method comparisons.
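To make the multiple-comparison concern concrete, the sketch below converts the two reported sigma values into two-sided p-values and compares them with a Bonferroni-adjusted threshold. It assumes the sigma values behave like standard normal z-scores and that all six pairwise comparisons among the four methods count as one family. Neither assumption is confirmed by the manuscript summary, and the authors may have used a different procedure.

```python
import math
from itertools import combinations


def two_sided_p(z):
    """Two-sided p-value for a standard normal z-score."""
    return math.erfc(abs(z) / math.sqrt(2))


methods = ["ISLE", "Peer Instruction", "Tutorials", "SCALE-UP"]
n_pairs = len(list(combinations(methods, 2)))  # pairwise comparisons among four methods
alpha = 0.05
bonferroni_alpha = alpha / n_pairs  # assumed family: all pairwise comparisons

# Sigma-level differences as quoted in the abstract.
reported = {"SCALE-UP vs ISLE": 2.25, "SCALE-UP vs Peer Instruction": 2.54}

results = {}
for label, z in reported.items():
    p = two_sided_p(z)
    results[label] = (p, p < alpha, p < bonferroni_alpha)
    print(f"{label}: p = {p:.4f}, "
          f"below {alpha}: {p < alpha}, "
          f"below Bonferroni {bonferroni_alpha:.4f}: {p < bonferroni_alpha}")
```

Bonferroni is the most conservative common choice; a step-down method such as Holm would be less strict, which is one reason the exact procedure matters for interpreting the headline comparisons.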
minor comments (2)
- Abstract: The total number of courses and students per method should be stated explicitly to contextualize the statistical comparisons and generalizability.
- Throughout the manuscript: Ensure consistent use of terminology for each method and clear operational definitions of 'conceptual learning gains' and 'student-centered activities' when referencing the video data.
Simulated Author's Rebuttal
We thank the referee for their thoughtful and constructive comments on our manuscript. We address each major comment below and describe the revisions we will make to improve the clarity, statistical transparency, and robustness of our findings.
Point-by-point responses
- Referee: The central claim that SCALE-UP produces significantly larger conceptual learning gains than ISLE (2.25-sigma difference) and Peer Instruction (2.54-sigma difference) is load-bearing for the paper's primary contribution. The abstract states that many observed ISLE and Peer Instruction courses devoted substantial class time to lecturing while Tutorials and SCALE-UP emphasized student-centered activities such as worksheets and labs. Without explicit controls, stratification, or regression including the fraction of class time on active tasks (from the video recordings), the method labels are entangled with implementation fidelity. This requires additional analysis to determine whether the headline differences would persist under matched activity profiles.
Authors: We appreciate the referee's emphasis on disentangling method labels from implementation details. The classroom video data already show a clear pattern: courses with higher fractions of lecturing time were predominantly those labeled ISLE or Peer Instruction and exhibited smaller gains, while Tutorials and SCALE-UP courses allocated most time to student-centered activities. To directly test whether the reported differences persist after accounting for activity profiles, we will add a regression analysis in the revised manuscript that includes the measured fraction of class time on active tasks (derived from the video recordings) as a covariate. This will quantify the extent to which activity profiles explain the observed differences in conceptual gains.
Revision: yes
- Referee: Methods and Results sections: The manuscript provides insufficient detail on per-method sample sizes (courses and students), the exact statistical procedures used to calculate the reported sigma-level differences (including any clustering by institution or multiple-comparison corrections), and controls for confounders such as instructor experience, student population differences, or prior knowledge. These omissions limit evaluation of the robustness of the cross-method comparisons.
Authors: We agree that greater statistical transparency is required. In the revised manuscript we will add a table reporting the number of courses and students per method. We will expand the Methods section to specify the exact procedures used to compute the sigma-level differences, including any clustering by institution and corrections for multiple comparisons. Regarding confounders, pre-post concept-inventory scores already incorporate baseline knowledge; however, uniform data on instructor experience and detailed student demographics were not collected across all 28 institutions. We will explicitly discuss these limitations and their implications for interpreting cross-method comparisons.
Revision: partial
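The covariate regression proposed in the first response can be illustrated with a small course-level sketch. The data below are synthetic and deliberately constructed so that gains depend only on the fraction of class time spent on active tasks, while the method label merely correlates with that fraction. The helper `ols` and the variable names are hypothetical, and none of the numbers come from the paper.

```python
def ols(rows, y):
    """Least-squares coefficients via the normal equations (X^T X) b = X^T y."""
    k = len(rows[0])
    # Build the augmented matrix [X^T X | X^T y].
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda idx: abs(a[idx][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(k):
            if row != col:
                factor = a[row][col] / a[col][col]
                a[row] = [x - factor * p for x, p in zip(a[row], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


# Synthetic courses: (is_scale_up, fraction of class time on active tasks).
courses = [(0, 0.3), (0, 0.4), (0, 0.5), (0, 0.6),
           (1, 0.7), (1, 0.8), (1, 0.9), (1, 0.6)]
# By construction, gain depends ONLY on the active fraction.
gains = [0.2 + 0.8 * frac for _, frac in courses]

# Model 1: gain ~ intercept + method label.
unadjusted = ols([[1.0, m] for m, _ in courses], gains)
# Model 2: gain ~ intercept + method label + active fraction.
adjusted = ols([[1.0, m, frac] for m, frac in courses], gains)

print(f"method effect, unadjusted: {unadjusted[1]:.3f}")
print(f"method effect, adjusted:   {adjusted[1]:.3f}")
print(f"active-fraction slope:     {adjusted[2]:.3f}")
```

In this constructed case the apparent method effect disappears once the active fraction is included. Real data would contain noise, and student-level analyses would also need to handle clustering within courses and institutions, which this sketch ignores.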
Circularity Check
Empirical data comparison exhibits no circularity
The paper reports statistical comparisons of conceptual learning gains drawn directly from pre/post concept inventory scores, peer network surveys, and classroom video observations across 31 courses. No equations, fitted parameters, or derivations are presented that reduce claims to inputs by construction. Self-citations, if present, are not load-bearing for the central empirical findings, which remain falsifiable via independent replication of the data collection protocol. The analysis is self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: Concept inventory pre/post scores validly measure changes in student conceptual understanding.
- Domain assumption: Classroom video recordings and peer network surveys reliably capture key instructional activities and social dynamics.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean: reality_from_one_distinction (relation to the paper passage: unclear)
We find measurable increases in student conceptual learning in all four active learning methods (ranging from 2.09-sigma to 6.22-sigma differences from a null effect), and significantly larger conceptual learning gains in SCALE-UP than in both ISLE (2.25-sigma difference) and Peer Instruction (2.54-sigma difference).
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (relation to the paper passage: unclear)
Instead, we observe differences in classroom activities; in many of the observed ISLE and Peer Instruction courses, instructors lecture for a large fraction of class time. In Tutorials and SCALE-UP courses, instructors dedicate most in-class time to student-centered activities such as worksheets and laboratory work.
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 1 Pith paper
- Strategies for Collecting Multi-Institutional Data in Discipline-Based Education Research
The authors outline actionable strategies for multi-institutional DBER data collection and demonstrate them with concept inventory, survey, and observation data from 31 instructors at 28 US institutions.