pith. machine review for the scientific record.

arxiv: 2605.05056 · v1 · submitted 2026-05-06 · 💰 econ.EM

Recognition: unknown

MSE-Optimal Difference-in-Differences Estimator

Yamato Igarashi

Pith reviewed 2026-05-08 15:56 UTC · model grok-4.3

classification 💰 econ.EM
keywords: difference-in-differences, mean squared error, pre-trends, bias-variance tradeoff, event study, two-way fixed effects, optimal estimator

The pith

A difference-in-differences estimator chooses the pre-trend length that minimizes mean squared error.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Standard DiD regressions can produce high-variance estimates in small samples or invalid results when parallel trends fail. Pre-tests for trend violations often lack power. The paper instead selects the number of pre-periods by minimizing the estimator's mean squared error, which directly balances the bias from longer windows against the variance from shorter ones. A reader would care because the resulting estimator aims for lower overall error without separate pre-testing steps. Simulations and one real-data example illustrate how the approach works in practice.

Core claim

The paper develops a difference-in-differences estimation method that selects the optimal length of pre-trends by minimizing the mean squared error (MSE). By focusing on the bias and variance tradeoff, the proposed method derives the MSE-optimal estimator from the optimal length of pre-trends.

What carries the argument

MSE minimization over the choice of pre-trend length applied to conventional two-way fixed effects or event-study DiD specifications.
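
In generic notation, the selection rule can be written as follows. The symbols here are assumptions made for illustration and need not match the paper's: $k$ is the number of pre-periods used, $\hat\tau_k$ is the DiD estimate computed with that window, $\tau$ is the true effect, and $\mathcal{K}$ is the set of candidate lengths. The first identity is the standard bias-variance decomposition of mean squared error.

```latex
\mathrm{MSE}(k)
  = \mathbb{E}\bigl[(\hat\tau_k - \tau)^2\bigr]
  = \bigl(\mathbb{E}[\hat\tau_k] - \tau\bigr)^2 + \mathrm{Var}(\hat\tau_k),
\qquad
k^{\ast} = \arg\min_{k \in \mathcal{K}} \mathrm{MSE}(k)
```

In practice both terms are unknown and would have to be estimated, which is where the load-bearing premise discussed further down comes in.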

If this is right

  • The estimator can achieve lower MSE than arbitrary fixed pre-trend choices when sample sizes are small.
  • Pre-testing for parallel trends is replaced by direct optimization of estimation error.
  • The same selection procedure applies to both two-way fixed effects and event-study models.
  • Empirical applications can obtain more accurate treatment-effect estimates by using the data-driven pre-trend length.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The method could reduce arbitrary researcher choices in DiD design across many applied settings.
  • Extensions might adapt the MSE criterion to other causal estimators that face similar window-length decisions.
  • Further checks could test performance when pre-trends follow patterns not well captured by linear or fixed-length assumptions.
  • Policy evaluations with limited data might gain more reliable small-sample results from this selection rule.

Load-bearing premise

That the MSE can be reliably minimized from the observed data without introducing new selection bias or requiring extra assumptions about the shape of pre-trends.

What would settle it

Repeated simulations with known true treatment effects in which the proposed estimator produces higher MSE than a fixed pre-trend length choice would falsify the optimality claim.
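
A minimal sketch of such a simulation check follows. Everything in it is an assumption made for illustration: the toy data-generating process (a treated-minus-control gap that drifts linearly with slope `delta`), the hypothetical `plugin_k` selection rule, and the treatment of the noise scale `s` as known. It is not the paper's estimator or simulation design. It only shows how the empirical MSE of a data-driven window choice could be compared with every fixed window when the true effect `tau` is known.

```python
import random
import statistics


def simulate_gaps(K, tau, delta, s, rng):
    """Draw treated-minus-control gaps for t = -K..-1 (pre) and t = 0 (post)."""
    pre = [delta * t + rng.gauss(0.0, s) for t in range(-K, 0)]
    post = tau + rng.gauss(0.0, s)
    return pre, post


def did_estimate(pre, post, k):
    """DiD estimate using only the last k pre-period gaps as the baseline."""
    return post - statistics.fmean(pre[-k:])


def plugin_k(pre, s):
    """Hypothetical rule: fit a linear drift to the pre-period gaps by OLS,
    then pick the window that minimises the plug-in bias^2 + variance.
    Uses pre-period data only; s is assumed known (an assumption)."""
    K = len(pre)
    ts = list(range(-K, 0))
    t_bar, d_bar = statistics.fmean(ts), statistics.fmean(pre)
    sxx = sum((t - t_bar) ** 2 for t in ts)
    slope = sum((t - t_bar) * (d - d_bar) for t, d in zip(ts, pre)) / sxx

    def mse_hat(k):
        return (slope * (k + 1) / 2) ** 2 + s ** 2 * (1 + 1 / k)

    return min(range(1, K + 1), key=mse_hat)


def monte_carlo(K=8, tau=1.0, delta=0.2, s=1.0, reps=5000, seed=0):
    """Empirical MSE of each fixed window and of the data-driven window."""
    rng = random.Random(seed)
    sq_err = {k: 0.0 for k in range(1, K + 1)}
    sq_err["selected"] = 0.0
    for _ in range(reps):
        pre, post = simulate_gaps(K, tau, delta, s, rng)
        for k in range(1, K + 1):
            sq_err[k] += (did_estimate(pre, post, k) - tau) ** 2
        k_hat = plugin_k(pre, s)
        sq_err["selected"] += (did_estimate(pre, post, k_hat) - tau) ** 2
    return {name: total / reps for name, total in sq_err.items()}


if __name__ == "__main__":
    for name, mse in monte_carlo().items():
        print(name, round(mse, 3))
```

If the `selected` row came out above the best fixed-window row across a range of designs, that would be the kind of evidence described above. Because the window is chosen from the same pre-period data used in the estimate, the selected estimator's MSE need not equal that of any fixed window, which is why the comparison has to be made by simulation rather than read off a formula.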

Figures

Figures reproduced from arXiv: 2605.05056 by Yamato Igarashi.

Figure 1: Bias-variance tradeoff. Notes: This figure explains the image of the bias and variance tradeoff generated by changing the length of pre-trends. The variance is larger when the pre-treatment period is shorter. On the other hand, the bias is smaller when the pre-treatment period is shorter. An opposite relationship holds when the pre-treatment period is longer. Then the optimal length of pre-trends would be o…
Figure 2: Different pre-trends length with static treatment effect
Figure 3: Event study plots with different models. Notes: These figures represent the event study plots with three models: (A) the traditional event study model (3), (B) the modified event study model (7) with the full length of pre-trends, (C) the MSE-optimal event study model. The black points are the point estimates and the vertical lines are the 95% confidence intervals. The triangle points are the true values of…
Figure 4: Event Study Estimates of the Effect of the Early
Original abstract

This paper develops a difference-in-differences (DiD) estimation method that selects the optimal length of pre-trends by minimizing the mean squared error (MSE). Conventional DiD regression models, such as the two-way fixed effects model or the event study model, may suffer from accuracy and validity concerns. If the sample size is small, the estimator may have a larger variance. Also, pre-tests often lack power to detect violations of the parallel trends assumption as Roth (2022) highlights. By focusing on the bias and variance tradeoff, the proposed method derives the MSE-optimal estimator from the optimal length of pre-trends. Simulation results and an empirical application demonstrate the practical applicability of the proposed method.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. This paper proposes a difference-in-differences estimator that selects the optimal length of pre-trends by minimizing the mean squared error, explicitly balancing bias from possible parallel-trends violations against variance reduction. It contrasts the approach with conventional two-way fixed-effects and event-study specifications, and supports the proposal with simulation evidence and an empirical application.

Significance. If the derivation is valid, the method supplies a principled, data-driven rule for choosing pre-period length in DiD designs, which could improve finite-sample accuracy relative to arbitrary fixed lengths or low-powered pre-tests. The explicit bias-variance framing is a clear strength when the MSE objective can be minimized without introducing additional selection bias.

major comments (2)
  1. [Methodology section] The central derivation of the MSE-optimal pre-trend length must demonstrate that data-driven minimization of the MSE does not itself induce post-selection bias or circularity in the bias term. This step is load-bearing for the optimality claim, and it is the weakest assumption identified in the reading above.
  2. [Simulations] Simulation results (Section 4 or equivalent) report MSE improvements, but the design should include explicit comparisons against standard DiD estimators that use fixed pre-trend lengths chosen ex ante, to isolate whether the data-driven selection delivers gains beyond what a correctly specified fixed-length estimator would achieve.
minor comments (2)
  1. Notation for the pre-trend length parameter should be standardized and distinguished from event-study lag indices to prevent reader confusion.
  2. [Abstract] The abstract could state more precisely the set of candidate lengths over which the MSE is minimized and whether the minimization is performed in-sample or via cross-validation.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments and the opportunity to clarify and strengthen our manuscript. We address each major comment below.

Point-by-point responses
  1. Referee: [Methodology section] The central derivation of the MSE-optimal pre-trend length must demonstrate that data-driven minimization of the MSE does not itself induce post-selection bias or circularity in the bias term. This step is load-bearing for the optimality claim, and it is the weakest assumption identified in the reading above.

    Authors: We thank the referee for this important observation. In our derivation, the MSE is expressed as an explicit function of the pre-trend length k: bias squared (arising from any linear deviation from parallel trends over the pre-period) plus variance (which declines with larger k under standard DiD assumptions). The optimal k minimizes this expression, and the estimator itself is the conventional DiD estimator computed with the selected k. The bias term used for selection is obtained from the observed pre-period trends and does not depend on the post-period outcome or the final estimator, avoiding direct circularity. Nevertheless, because selection is data-driven, we acknowledge that a formal argument ruling out post-selection bias is valuable. We will add a dedicated subsection (or appendix) that derives the asymptotic properties of the selected estimator and shows that the selection step does not introduce additional bias beyond the intended bias-variance tradeoff under the paper's maintained assumptions.

    Revision: yes

  2. Referee: [Simulations] Simulation results (Section 4 or equivalent) report MSE improvements, but the design should include explicit comparisons against standard DiD estimators that use fixed pre-trend lengths chosen ex ante, to isolate whether the data-driven selection delivers gains beyond what a correctly specified fixed-length estimator would achieve.

    Authors: We agree that direct comparisons to fixed-length estimators chosen ex ante would better isolate the value of the data-driven rule. Our existing simulations compare the MSE-optimal estimator to conventional two-way fixed-effects and event-study specifications (which implicitly use all available pre-periods). We will revise Section 4 to add explicit benchmarks that fix the pre-trend length ex ante at several reasonable values (e.g., half the available pre-periods, or other commonly used fractions). In addition, where the simulation design permits, we will include an oracle benchmark that uses the true MSE-minimizing length known from the DGP. These additions will clarify whether the data-driven selection improves upon well-specified fixed alternatives.

    Revision: yes
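
To make the two responses above concrete, here is a small worked sketch of an explicit MSE(k) curve together with the oracle and fixed-length benchmarks. The toy model is an assumption made for illustration, not the paper's specification: the treated-minus-control gap drifts linearly with slope `delta` per period, each period's gap carries independent noise with variance `s2`, there is a single post-period at t = 0, and the window-k estimate subtracts the mean of the last k pre-period gaps. The function names `toy_mse` and `oracle_k` are hypothetical.

```python
def toy_mse(k: int, delta: float, s2: float) -> float:
    """Closed-form MSE of the window-k DiD estimate in the toy linear-drift model."""
    bias = delta * (k + 1) / 2     # drift picked up by averaging t = -k..-1
    variance = s2 * (1 + 1 / k)    # one post-period gap plus a k-period average
    return bias ** 2 + variance


def oracle_k(K: int, delta: float, s2: float) -> int:
    """Window length that minimises the true toy MSE (known only in simulation)."""
    return min(range(1, K + 1), key=lambda k: toy_mse(k, delta, s2))


if __name__ == "__main__":
    K, delta, s2 = 8, 0.2, 1.0
    print("oracle k:", oracle_k(K, delta, s2))
    print("MSE at half the pre-periods:", toy_mse(K // 2, delta, s2))
    print("MSE at all pre-periods:", toy_mse(K, delta, s2))
```

With these illustrative values the oracle window is k = 3, the half-length benchmark (k = 4) has an MSE of 1.5, and using all eight pre-periods is noticeably worse because the squared bias grows with k while the variance gain flattens out. With no drift (`delta = 0`) the oracle uses every available pre-period, which matches the intuition that longer windows are only costly when parallel trends fail.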

Circularity Check

0 steps flagged

No significant circularity in MSE-optimal pre-trend length selection

The paper proposes selecting pre-trend length to minimize estimated MSE as an explicit bias-variance tradeoff. This is a direct optimization step that does not reduce the final estimator to its inputs by construction, nor does it rely on self-definitional equations, fitted parameters renamed as predictions, or load-bearing self-citations. The provided abstract and context describe a standard data-driven procedure consistent with external DiD assumptions, with no quoted steps showing equivalence to inputs. The central claim remains independent content.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper builds on standard DiD assumptions including parallel trends and focuses on MSE minimization; no new entities are introduced.

pith-pipeline@v0.9.0 · 5403 in / 944 out tokens · 28005 ms · 2026-05-08T15:56:19.255352+00:00 · methodology

Reference graph

Works this paper leans on

20 extracted references

  1. Difference-in-Differences Designs: A Practitioner's Guide. Journal of Economic Literature.
  2. How much should we trust staggered difference-in-differences estimates? Journal of Financial Economics.
  3. Difference-in-Differences Estimation with Spatial Spillovers. 2023, eprint.
  4. Difference-in-differences with multiple time periods. Journal of Econometrics.
  5. Robust Nonparametric Confidence Intervals for Regression-Discontinuity Designs. Econometrica.
  6. Minimum wages and employment: A case study of the fast-food industry in New Jersey and Pennsylvania. American Economic Review.
  7. Two-way fixed effects estimators with heterogeneous treatment effects. American Economic Review.
  8. Two-way fixed effects and differences-in-differences with heterogeneous treatment effects: A survey. The Econometrics Journal.
  9. Using multiple pretreatment periods to improve difference-in-differences and staggered adoption designs. Political Analysis.
  10. A simple approach to staggered difference-in-differences in the presence of spillovers.
  11. Early retirement incentives and student achievement. American Economic Journal: Economic Policy.
  12. Replication data for: Early Retirement Incentives and Student Achievement.
  13. Difference-in-differences with variation in treatment timing. Journal of Econometrics.
  14. Econometrics.
  15. Estimating Treatment Effects in Panel Data Without Parallel Trends. 2026, eprint.
  16. Optimal bandwidth choice for the regression discontinuity estimator. The Review of Economic Studies.
  17. A more credible approach to parallel trends. The Review of Economic Studies.
  18. Pretest with caution: Event-study estimates after testing for parallel trends. American Economic Review: Insights.
  19. What's trending in difference-in-differences? A synthesis of the recent econometrics literature. Journal of Econometrics.
  20. Estimating dynamic treatment effects in event studies with heterogeneous treatment effects. Journal of Econometrics.