arxiv: 2505.04577 · v1 · pith:QDA42GAHnew · submitted 2025-05-07 · ⚛️ physics.ed-ph

Relative benefits of different active learning methods to conceptual physics learning

Pith reviewed 2026-05-22 16:51 UTC · model grok-4.3

classification ⚛️ physics.ed-ph
keywords active learning · conceptual learning · physics education · SCALE-UP · Peer Instruction · ISLE · Tutorials · concept inventory

The pith

Active learning improves conceptual physics understanding across four methods, with SCALE-UP producing larger gains than ISLE and Peer Instruction.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper compares four established active learning approaches across many institutions to measure their effects on student conceptual understanding in introductory physics and astronomy. It finds clear learning gains for every method, as measured by concept inventory scores, with SCALE-UP showing the strongest results. Because peer network formation is similar across the methods, the authors do not attribute the differences to it; they point instead to how much class time is spent on student-centered work versus lecturing. The data cover 31 courses and 2,855 students.

Core claim

In a study of 31 courses at 28 institutions involving 2,855 students, all four active learning methods produced measurable conceptual learning gains on concept inventories, ranging from 2.09-sigma to 6.22-sigma above a null effect. SCALE-UP produced significantly larger gains than ISLE (2.25-sigma difference) and Peer Instruction (2.54-sigma difference), while Tutorials showed no significant difference from the other three. Peer network development was similar across methods, but classroom videos showed that SCALE-UP and Tutorials devoted most time to student activities such as worksheets and labs, whereas many ISLE and Peer Instruction courses included substantial lecturing.
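
One way to read the sigma figures above is as the difference between two estimates divided by its combined standard error. The sketch below assumes that reading; this summary does not state the paper's exact procedure, the function names are hypothetical, and the effect sizes and standard errors are made-up placeholders rather than values from the paper.

```python
import math

def sigma_difference(effect_a, se_a, effect_b, se_b):
    """Difference between two independent estimates, in units of its standard error."""
    combined_se = math.sqrt(se_a ** 2 + se_b ** 2)
    return (effect_a - effect_b) / combined_se

def two_sided_p(z):
    """Two-sided p-value for a standard normal z-score."""
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical pooled effect sizes and standard errors (illustrative only).
z = sigma_difference(0.9, 0.12, 0.5, 0.16)
print(round(z, 2), round(two_sided_p(z), 3))
```

Under this reading, a difference near 2 sigma sits close to the conventional 0.05 threshold for a single two-sided test, which is why the handling of multiple comparisons matters for the cross-method claims.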

What carries the argument

Direct comparison of conceptual learning gains from ISLE, Peer Instruction, Tutorials, and SCALE-UP, using pre/post concept inventory scores, peer network surveys, and classroom video recordings to distinguish the effects of activity time allocation from peer interactions.
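
As a minimal sketch of how a pre/post gain can be expressed as a standardized effect size, the block below computes Cohen's d with a pooled standard deviation. Whether the paper uses exactly this estimator is an assumption, and the scores are invented for illustration.

```python
import math
import statistics

def cohens_d(pre, post):
    """Cohen's d for post vs. pre scores, using the pooled sample standard deviation."""
    n_pre, n_post = len(pre), len(post)
    var_pre = statistics.variance(pre)
    var_post = statistics.variance(post)
    pooled_sd = math.sqrt(
        ((n_pre - 1) * var_pre + (n_post - 1) * var_post) / (n_pre + n_post - 2)
    )
    return (statistics.mean(post) - statistics.mean(pre)) / pooled_sd

# Hypothetical concept inventory scores (percent correct) for one course.
pre_scores = [30, 35, 40, 45, 50]
post_scores = [50, 55, 60, 65, 70]
print(round(cohens_d(pre_scores, post_scores), 2))
```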

If this is right

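  • SCALE-UP, where instructors allocate most class time to student-centered activities, produces larger conceptual gains than ISLE and Peer Instruction; Tutorials share that activity profile but are statistically indistinguishable from the other three methods.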
  • Peer Instruction and ISLE may achieve comparable gains if lecturing time is reduced in favor of active tasks.
  • Peer network formation occurs at similar rates across methods and does not explain differences in learning outcomes.
  • The benefits of active learning appear across a wide range of institutions and student populations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Departments could improve outcomes by auditing the fraction of class time spent on student work rather than selecting a named method by label alone.
  • The pattern of activity-driven gains could be tested in other STEM disciplines to check whether the same time-allocation principle applies outside physics.
  • Longitudinal follow-up on the same students could reveal whether the larger gains in SCALE-UP translate into better performance in later courses.
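
The audit suggested in the first bullet could be as simple as tallying coded observation intervals. The sketch below assumes each class session has been coded into equal-length intervals with one activity label each; the label names and the set counted as student-centered are hypothetical, not the paper's coding scheme.

```python
from collections import Counter

# Hypothetical labels treated as student-centered; adjust to the coding scheme in use.
STUDENT_CENTERED = {"worksheet", "lab", "group_discussion", "clicker_discussion"}

def student_centered_fraction(interval_labels):
    """Return the fraction of coded intervals spent on student-centered activities."""
    if not interval_labels:
        raise ValueError("no coded intervals")
    counts = Counter(interval_labels)
    active = sum(n for label, n in counts.items() if label in STUDENT_CENTERED)
    return active / len(interval_labels)

# One made-up class session coded in equal-length intervals.
session = ["lecture"] * 12 + ["clicker_discussion"] * 4 + ["worksheet"] * 9
print(student_centered_fraction(session))
```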

Load-bearing premise

The assumption that differences in observed conceptual gains are caused primarily by the active learning method category rather than by instructor experience, student population differences, or variable implementation fidelity.

What would settle it

A controlled trial in which the same instructors, trained to matched fidelity, teach matched student groups using each method with identical lecture time and then compare concept inventory gains.

Figures

Figures reproduced from arXiv: 2505.04577 by Adrienne L. Traxler, Colin Green, Eric Brewe, Justin Gambrell, Meagan Sundstrom.

Figure 1. Effect sizes for concept inventory scores by active …
Figure 3. (a) Four-profile solution for the Latent Profile Analysis of classroom observations, including the percent of the 223 total …

Original abstract

Extensive research has demonstrated that active learning methods are more effective than traditional lecturing at improving student conceptual understanding and reducing failure rates in undergraduate physics courses. Researchers have developed several distinct active learning methods that are now widely implemented in introductory physics; however, the relative benefits of these methods remain unknown. Here we present the first multi-institutional comparison of the impacts of four well-established active learning methods (ISLE, Peer Instruction, Tutorials, and SCALE-UP) on conceptual learning. We also investigate student development of peer networks and the activities that take place during instruction to explain differences in these impacts. Data include student concept inventory scores, peer network surveys, and classroom video recordings from 31 introductory physics and astronomy courses at 28 different institutions in the United States containing a total of 2,855 students. We find measurable increases in student conceptual learning in all four active learning methods (ranging from 2.09-sigma to 6.22-sigma differences from a null effect), and significantly larger conceptual learning gains in SCALE-UP than in both ISLE (2.25-sigma difference) and Peer Instruction (2.54-sigma difference). Conceptual learning gains in Tutorials are not significantly different from those in the other three methods. Despite the hypothesized benefits of student interactions, student development of peer networks is similar across the four methods. Instead, we observe differences in classroom activities; in many of the observed ISLE and Peer Instruction courses, instructors lecture for a large fraction of class time. In Tutorials and SCALE-UP courses, instructors dedicate most in-class time to student-centered activities such as worksheets and laboratory work.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript reports results from a multi-institutional study of 31 introductory physics and astronomy courses at 28 institutions with 2,855 students. It compares conceptual learning gains, measured via concept inventories, across four active learning methods: ISLE, Peer Instruction, Tutorials, and SCALE-UP. The study also examines peer network development via surveys and classroom activities via video recordings. Key findings include statistically significant conceptual gains in all methods (2.09 to 6.22 sigma from null), with SCALE-UP showing larger gains than ISLE (2.25 sigma) and Peer Instruction (2.54 sigma), while Tutorials do not differ significantly from the other three methods. The authors link the differences to classroom activity profiles rather than peer networks, noting more lecturing in some ISLE and Peer Instruction courses.

Significance. If the observed differences in conceptual gains can be robustly attributed to the active learning methods after accounting for implementation variations, this work would offer important guidance for physics educators selecting among established active learning approaches. The large sample size and multi-institutional nature strengthen the potential impact. The inclusion of video analysis to explain differences is a strength, providing mechanistic insight beyond outcome measures alone. The finding that peer network development is similar across methods also challenges assumptions about interaction mechanisms in active learning.

major comments (2)
  1. The central claim that SCALE-UP produces significantly larger conceptual learning gains than ISLE (2.25-sigma difference) and Peer Instruction (2.54-sigma difference) is load-bearing for the paper's primary contribution. The abstract states that many observed ISLE and Peer Instruction courses devoted substantial class time to lecturing while Tutorials and SCALE-UP emphasized student-centered activities such as worksheets and labs. Without explicit controls, stratification, or regression including the fraction of class time on active tasks (from the video recordings), the method labels are entangled with implementation fidelity. This requires additional analysis to determine whether the headline differences would persist under matched activity profiles.
  2. Methods and Results sections: The manuscript provides insufficient detail on per-method sample sizes (courses and students), the exact statistical procedures used to calculate the reported sigma-level differences (including any clustering by institution or multiple-comparison corrections), and controls for confounders such as instructor experience, student population differences, or prior knowledge. These omissions limit evaluation of the robustness of the cross-method comparisons.
minor comments (2)
  1. Abstract: The total number of courses and students per method should be stated explicitly to contextualize the statistical comparisons and generalizability.
  2. Throughout the manuscript: Ensure consistent use of terminology for each method and clear operational definitions of 'conceptual learning gains' and 'student-centered activities' when referencing the video data.
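
Major comment 2 asks about multiple-comparison corrections. With four methods there are six pairwise comparisons, and one standard option is the Holm-Bonferroni step-down procedure, sketched below. Nothing in this report indicates which correction, if any, the paper applied; the function name is hypothetical and the p-values are placeholders, not results from the paper.

```python
def holm_reject(p_values, alpha=0.05):
    """Holm-Bonferroni: return a list of booleans, True where the null is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # Compare the k-th smallest p-value with alpha / (m - k + 1), where k = rank + 1.
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one test fails, all larger p-values fail too
    return reject

# Placeholder p-values for six pairwise comparisons (illustrative only).
p_vals = [0.004, 0.012, 0.30, 0.80, 0.45, 0.20]
for i, rejected in enumerate(holm_reject(p_vals), start=1):
    print(f"pair_{i}", rejected)
```

The step-down thresholds tighten as the number of comparisons grows, so a difference that clears 0.05 on its own may not survive once all six pairs are considered together.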

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful and constructive comments on our manuscript. We address each major comment below and describe the revisions we will make to improve the clarity, statistical transparency, and robustness of our findings.

Point-by-point responses
  1. Referee: The central claim that SCALE-UP produces significantly larger conceptual learning gains than ISLE (2.25-sigma difference) and Peer Instruction (2.54-sigma difference) is load-bearing for the paper's primary contribution. The abstract states that many observed ISLE and Peer Instruction courses devoted substantial class time to lecturing while Tutorials and SCALE-UP emphasized student-centered activities such as worksheets and labs. Without explicit controls, stratification, or regression including the fraction of class time on active tasks (from the video recordings), the method labels are entangled with implementation fidelity. This requires additional analysis to determine whether the headline differences would persist under matched activity profiles.

    Authors: We appreciate the referee's emphasis on disentangling method labels from implementation details. The classroom video data already show a clear pattern: courses with higher fractions of lecturing time were predominantly those labeled ISLE or Peer Instruction and exhibited smaller gains, while Tutorials and SCALE-UP courses allocated most time to student-centered activities. To directly test whether the reported differences persist after accounting for activity profiles, we will add a regression analysis in the revised manuscript that includes the measured fraction of class time on active tasks (derived from the video recordings) as a covariate. This will quantify the extent to which activity profiles explain the observed differences in conceptual gains. revision: yes

  2. Referee: Methods and Results sections: The manuscript provides insufficient detail on per-method sample sizes (courses and students), the exact statistical procedures used to calculate the reported sigma-level differences (including any clustering by institution or multiple-comparison corrections), and controls for confounders such as instructor experience, student population differences, or prior knowledge. These omissions limit evaluation of the robustness of the cross-method comparisons.

    Authors: We agree that greater statistical transparency is required. In the revised manuscript we will add a table reporting the number of courses and students per method. We will expand the Methods section to specify the exact procedures used to compute the sigma-level differences, including any clustering by institution and corrections for multiple comparisons. Regarding confounders, pre-post concept-inventory scores already incorporate baseline knowledge; however, uniform data on instructor experience and detailed student demographics were not collected across all 28 institutions. We will explicitly discuss these limitations and their implications for interpreting cross-method comparisons. revision: partial
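
The covariate regression proposed in the first response can be sketched with ordinary least squares. This is not the authors' analysis: the data are synthetic and built so that gain depends only on the active-time fraction, the variable and function names are hypothetical, and the sketch ignores clustering by institution.

```python
def ols(rows, y):
    """Ordinary least squares via the normal equations; an intercept is added automatically."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    # Build X^T X and X^T y.
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    # Solve by Gauss-Jordan elimination with partial pivoting.
    aug = [row + [b] for row, b in zip(xtx, xty)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]

# Synthetic courses: (method indicator, fraction of class time on active tasks).
courses = [(1, 0.9), (1, 0.8), (1, 0.6), (0, 0.7), (0, 0.4), (0, 0.3)]
# Synthetic gains constructed to depend on the active fraction only.
gains = [0.2 + 1.0 * frac for _, frac in courses]

intercept, method_coef, active_coef = ols(courses, gains)
print(round(intercept, 3), round(method_coef, 3), round(active_coef, 3))
```

In this constructed case the method coefficient comes out at essentially zero once the active fraction is included, which is the pattern the referee asks the authors to test for in the real data.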

Circularity Check

0 steps flagged

Empirical data comparison exhibits no circularity

full rationale

The paper reports statistical comparisons of conceptual learning gains drawn directly from pre/post concept inventory scores, peer network surveys, and classroom video observations across 31 courses. No equations, fitted parameters, or derivations are presented that reduce claims to inputs by construction. Self-citations, if present, are not load-bearing for the central empirical findings, which remain falsifiable via independent replication of the data collection protocol. The analysis is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard domain assumptions about measurement validity in physics education research and observational reliability, with no free parameters or invented entities introduced in the reported results.

axioms (2)
  • domain assumption Concept inventory pre/post scores validly measure changes in student conceptual understanding.
    All reported learning gains and statistical comparisons depend on this assumption.
  • domain assumption Classroom video recordings and peer network surveys reliably capture key instructional activities and social dynamics.
    These data are used to explain why gains differ across methods.


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction · unclear

    Relation between the paper passage and the cited Recognition theorem.

    We find measurable increases in student conceptual learning in all four active learning methods (ranging from 2.09-sigma to 6.22-sigma differences from a null effect), and significantly larger conceptual learning gains in SCALE-UP than in both ISLE (2.25-sigma difference) and Peer Instruction (2.54-sigma difference).

  • IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

    Relation between the paper passage and the cited Recognition theorem.

    Instead, we observe differences in classroom activities; in many of the observed ISLE and Peer Instruction courses, instructors lecture for a large fraction of class time. In Tutorials and SCALE-UP courses, instructors dedicate most in-class time to student-centered activities such as worksheets and laboratory work.

What do these tags mean?

  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Strategies for Collecting Multi-Institutional Data in Discipline-Based Education Research

    physics.ed-ph 2026-05 unverdicted novelty 4.0

    The authors outline actionable strategies for multi-institutional DBER data collection and demonstrate them with concept inventory, survey, and observation data from 31 instructors at 28 US institutions.
