arxiv: 2605.05056 · v1 · submitted 2026-05-06 · 💰 econ.EM

Recognition: unknown

MSE-Optimal Difference-in-Differences Estimator

Yamato Igarashi

Pith reviewed 2026-05-08 15:56 UTC · model grok-4.3

classification 💰 econ.EM

keywords difference-in-differences · mean squared error · pre-trends · bias-variance tradeoff · event study · two-way fixed effects · optimal estimator

The pith

A difference-in-differences estimator chooses the pre-trend length that minimizes mean squared error.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Standard DiD regressions can produce high-variance estimates in small samples or invalid results when parallel trends fail. Pre-tests for trend violations often lack power. The paper instead selects the number of pre-periods by minimizing the estimator's mean squared error, which directly balances the bias from longer windows against the variance from shorter ones. A reader would care because the resulting estimator aims for lower overall error without separate pre-testing steps. Simulations and one real-data example illustrate how the approach works in practice.

Core claim

The paper develops a difference-in-differences estimation method that selects the optimal length of pre-trends by minimizing the mean squared error (MSE). By focusing on the bias and variance tradeoff, the proposed method derives the MSE-optimal estimator from the optimal length of pre-trends.

What carries the argument

MSE minimization over the choice of pre-trend length applied to conventional two-way fixed effects or event-study DiD specifications.
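In symbols, the selection step can be sketched as follows. The notation is assumed for illustration rather than taken from the paper: $k$ is the pre-trend length, $\mathcal{K}$ the set of candidate lengths, $\hat{\tau}_k$ the DiD estimate computed with $k$ pre-periods, and $\tau$ the true treatment effect.

```latex
\begin{align*}
\mathrm{MSE}(k) &= \bigl(\mathbb{E}[\hat{\tau}_k] - \tau\bigr)^2 + \mathrm{Var}(\hat{\tau}_k), \\
k^{*} &= \arg\min_{k \in \mathcal{K}} \mathrm{MSE}(k), \qquad
\hat{\tau}^{\mathrm{opt}} = \hat{\tau}_{k^{*}}.
\end{align*}
```

The first line is the standard bias-variance decomposition. In practice both terms have to be estimated from data, so the feasible rule minimizes an estimate of $\mathrm{MSE}(k)$ rather than the quantity itself.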

If this is right

The estimator can achieve lower MSE than arbitrary fixed pre-trend choices when sample sizes are small.
Pre-testing for parallel trends is replaced by direct optimization of estimation error.
The same selection procedure applies to both two-way fixed effects and event-study models.
Empirical applications can obtain more accurate treatment-effect estimates by using the data-driven pre-trend length.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The method could reduce arbitrary researcher choices in DiD design across many applied settings.
Extensions might adapt the MSE criterion to other causal estimators that face similar window-length decisions.
Further checks could test performance when pre-trends follow patterns not well captured by linear or fixed-length assumptions.
Policy evaluations with limited data might gain more reliable small-sample results from this selection rule.

Load-bearing premise

That the MSE can be reliably minimized from the observed data without introducing new selection bias or requiring extra assumptions about the shape of pre-trends.

What would settle it

Repeated simulations with known true treatment effects in which the proposed estimator produces higher MSE than a fixed pre-trend length choice would falsify the optimality claim.
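A check of this kind could be sketched as below. This is a minimal illustration under strong assumptions, not the paper's design: the treated-minus-control gap drifts linearly with slope `delta` per period, noise is i.i.d. normal, there is one post-period, and `select_k` is a hypothetical plug-in rule (fit a line to the pre-period gaps, then minimize estimated bias squared plus variance) standing in for the paper's selection procedure, which the abstract does not spell out. All function and parameter names are invented for this sketch.

```python
import random
import statistics


def simulate_gaps(n_pre, delta, sigma, tau, rng):
    """Treated-minus-control gaps for periods -n_pre..-1 (pre) and 0 (post)."""
    pre = [delta * t + rng.gauss(0.0, sigma) for t in range(-n_pre, 0)]
    post = tau + rng.gauss(0.0, sigma)  # the drift term is delta * 0 at t = 0
    return pre, post


def did_estimate(pre, post, k):
    """Post-period gap minus the mean gap over the last k pre-periods."""
    return post - statistics.fmean(pre[-k:])


def select_k(pre):
    """Hypothetical plug-in rule (an assumption, not the paper's procedure).

    Fits a line to the pre-period gaps, then picks the k that minimizes
    estimated bias^2 + variance. Needs at least 3 pre-periods.
    """
    n = len(pre)
    ts = list(range(-n, 0))
    t_bar, g_bar = statistics.fmean(ts), statistics.fmean(pre)
    sxx = sum((t - t_bar) ** 2 for t in ts)
    slope = sum((t - t_bar) * (g - g_bar) for t, g in zip(ts, pre)) / sxx
    resid = [g - g_bar - slope * (t - t_bar) for t, g in zip(ts, pre)]
    sigma2 = sum(r * r for r in resid) / (n - 2)

    def estimated_mse(k):
        return (slope * (k + 1) / 2) ** 2 + sigma2 * (1 + 1 / k)

    return min(range(1, n + 1), key=estimated_mse)


def monte_carlo(n_pre=8, delta=0.15, sigma=1.0, tau=2.0, reps=5000, seed=0):
    """Simulated MSE for every fixed k and for the data-driven choice."""
    rng = random.Random(seed)
    sq_err = {k: 0.0 for k in range(1, n_pre + 1)}
    sq_err["selected"] = 0.0
    for _ in range(reps):
        pre, post = simulate_gaps(n_pre, delta, sigma, tau, rng)
        for k in range(1, n_pre + 1):
            sq_err[k] += (did_estimate(pre, post, k) - tau) ** 2
        k_hat = select_k(pre)
        sq_err["selected"] += (did_estimate(pre, post, k_hat) - tau) ** 2
    return {name: total / reps for name, total in sq_err.items()}


if __name__ == "__main__":
    for name, mse in monte_carlo().items():
        print(name, round(mse, 3))
```

Comparing the `"selected"` entry with the fixed-`k` entries across a grid of `delta` and `sigma` values shows where, if anywhere, the data-driven rule loses to a fixed length. Because the plug-in slope is itself noisy, the stand-in rule need not win everywhere, which is the kind of outcome the falsification test is looking for.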

Figures

Figures reproduced from arXiv: 2605.05056 by Yamato Igarashi.

**Figure 1.** Bias-variance tradeoff. Notes: This figure illustrates the bias-variance tradeoff generated by changing the length of pre-trends. The variance is larger when the pre-treatment period is shorter. On the other hand, the bias is smaller when the pre-treatment period is shorter. The opposite relationship holds when the pre-treatment period is longer. Then the optimal length of pre-trends would be o…

**Figure 2.** Different pre-trends length with static treatment effect

**Figure 3.** Event study plots with different models. Notes: These figures show event study plots for three models: (A) the traditional event study model (3), (B) the modified event study model (7) with the full length of pre-trends, (C) the MSE-optimal event study model. The black points are the point estimates and the vertical lines are the 95% confidence intervals. The triangle points are the true values of…

**Figure 4.** Event Study Estimates of the Effect of the Early
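The tradeoff pictured in Figure 1 can be made concrete with a worked toy case. Everything here is an assumption chosen for illustration, not the paper's model: $g_t$ is the treated-minus-control gap in period $t$, the gap drifts linearly with slope $\delta$, the noise $\varepsilon_t$ is i.i.d. with variance $\sigma^2$, there is a single post-period at $t = 0$, and $\hat{\tau}_k$ subtracts the average gap over the last $k$ pre-periods.

```latex
\begin{align*}
g_t &= c + \delta t + \tau\,\mathbf{1}\{t = 0\} + \varepsilon_t,
\qquad
\hat{\tau}_k = g_0 - \frac{1}{k}\sum_{t=-k}^{-1} g_t, \\
\mathbb{E}[\hat{\tau}_k] - \tau &= \frac{\delta\,(k+1)}{2},
\qquad
\mathrm{Var}(\hat{\tau}_k) = \sigma^2\Bigl(1 + \frac{1}{k}\Bigr), \\
\mathrm{MSE}(k) &= \frac{\delta^2 (k+1)^2}{4} + \sigma^2\Bigl(1 + \frac{1}{k}\Bigr).
\end{align*}
```

The bias grows with $k$ while the variance shrinks, matching the caption. With $\delta = 0.1$ and $\sigma^2 = 1$, this gives $\mathrm{MSE}(5) = 1.290$, $\mathrm{MSE}(6) \approx 1.289$, and $\mathrm{MSE}(7) \approx 1.303$, so the toy optimum is $k = 6$. With $\delta = 0$ the bias term vanishes and the longest available window is best.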

Original abstract

This paper develops a difference-in-differences (DiD) estimation method that selects the optimal length of pre-trends by minimizing the mean squared error (MSE). Conventional DiD regression models, such as the two-way fixed effects model or the event study model, may suffer from accuracy and validity concerns. If the sample size is small, the estimator may have a larger variance. Also, pre-tests often lack power to detect violations of the parallel trends assumption as Roth (2022) highlights. By focusing on the bias and variance tradeoff, the proposed method derives the MSE-optimal estimator from the optimal length of pre-trends. Simulation results and an empirical application demonstrate the practical applicability of the proposed method.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper gives a data-driven rule for picking pre-trend length in DiD by minimizing MSE, which is a straightforward idea but rests on thin evidence in the abstract.

The main contribution is a method that chooses the number of pre-periods in a DiD estimator to minimize mean squared error. It trades off bias from including periods that violate parallel trends against the variance cost of using fewer periods. This is a direct response to the low power of pre-tests that Roth (2022) documented and to the variance problems that arise in small samples with standard two-way fixed effects or event-study specifications. The abstract says the estimator is derived from this explicit bias-variance calculation and that simulations plus one empirical example support it. That framing is new relative to the usual DiD literature cited, and it gives applied researchers a concrete, automatic rule instead of ad-hoc choices or underpowered tests. The approach stays within standard DiD assumptions once the selection step is treated as part of the estimator.

The soft spots are mostly about execution. Without the derivations it is unclear exactly how the MSE is estimated from the data or whether the minimization step itself adds finite-sample bias that the variance formula does not capture. In small samples the estimated MSE could be noisy, and any post-selection adjustment might affect coverage or bias in ways the simulations need to demonstrate explicitly. The empirical application is mentioned but not described, so it is hard to judge how large the practical gains are.

This is the kind of paper that empirical economists who run DiD on modest datasets would want to see. It is a modest but useful tweak rather than a fundamental fix, and the proposal is clear enough to deserve referee time. I would send it out for review with the expectation that the authors will have to show the math and run more targeted checks on the selection bias.

Referee Report

2 major / 2 minor

Summary. This paper proposes a difference-in-differences estimator that selects the optimal length of pre-trends by minimizing the mean squared error, explicitly balancing bias from possible parallel-trends violations against variance reduction. It contrasts the approach with conventional two-way fixed-effects and event-study specifications, and supports the proposal with simulation evidence and an empirical application.

Significance. If the derivation is valid, the method supplies a principled, data-driven rule for choosing pre-period length in DiD designs, which could improve finite-sample accuracy relative to arbitrary fixed lengths or low-powered pre-tests. The explicit bias-variance framing is a clear strength when the MSE objective can be minimized without introducing additional selection bias.

major comments (2)

[Methodology section] The central derivation of the MSE-optimal pre-trend length (presumably in the methodology section) must demonstrate that the data-driven minimization of the MSE does not itself induce post-selection bias or circularity in the bias term; the load-bearing premise noted above makes this step critical to the optimality claim.
[Simulations] Simulation results (Section 4 or equivalent) report MSE improvements, but the design should include explicit comparisons against standard DiD estimators that use fixed pre-trend lengths chosen ex ante, to isolate whether the data-driven selection delivers gains beyond what a correctly specified fixed-length estimator would achieve.

minor comments (2)

Notation for the pre-trend length parameter should be standardized and distinguished from event-study lag indices to prevent reader confusion.
[Abstract] The abstract could state more precisely the set of candidate lengths over which the MSE is minimized and whether the minimization is performed in-sample or via cross-validation.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments and the opportunity to clarify and strengthen our manuscript. We address each major comment below.

Referee: [Methodology section] The central derivation of the MSE-optimal pre-trend length (presumably in the methodology section) must demonstrate that the data-driven minimization of the MSE does not itself induce post-selection bias or circularity in the bias term; the load-bearing premise noted above makes this step critical to the optimality claim.

Authors: We thank the referee for this important observation. In our derivation, the MSE is expressed as an explicit function of the pre-trend length k: bias squared (arising from any linear deviation from parallel trends over the pre-period) plus variance (which declines with larger k under standard DiD assumptions). The optimal k minimizes this expression, and the estimator itself is the conventional DiD estimator computed with the selected k. The bias term used for selection is obtained from the observed pre-period trends and does not depend on the post-period outcome or the final estimator, avoiding direct circularity. Nevertheless, because selection is data-driven, we acknowledge that a formal argument ruling out post-selection bias is valuable. We will add a dedicated subsection (or appendix) that derives the asymptotic properties of the selected estimator and shows that the selection step does not introduce additional bias beyond the intended bias-variance tradeoff under the paper's maintained assumptions.

revision: yes

Referee: [Simulations] Simulation results (Section 4 or equivalent) report MSE improvements, but the design should include explicit comparisons against standard DiD estimators that use fixed pre-trend lengths chosen ex ante, to isolate whether the data-driven selection delivers gains beyond what a correctly specified fixed-length estimator would achieve.

Authors: We agree that direct comparisons to fixed-length estimators chosen ex ante would better isolate the value of the data-driven rule. Our existing simulations compare the MSE-optimal estimator to conventional two-way fixed-effects and event-study specifications (which implicitly use all available pre-periods). We will revise Section 4 to add explicit benchmarks that fix the pre-trend length ex ante at several reasonable values (e.g., half the available pre-periods, or other commonly used fractions). In addition, where the simulation design permits, we will include an oracle benchmark that uses the true MSE-minimizing length known from the DGP. These additions will clarify whether the data-driven selection improves upon well-specified fixed alternatives.

revision: yes
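The fixed-length and oracle benchmarks described in this response could be tabulated along the following lines. This is a sketch under an assumed data-generating process, not the paper's simulation design: the treated-minus-control gap drifts linearly with slope `delta` per period, per-period noise has variance `sigma2`, there is one post-period, and the estimator subtracts the mean gap over the last `k` pre-periods. Under those assumptions the true MSE has a closed form, so the oracle length can be read off directly. The function names are hypothetical.

```python
def true_mse(k, delta, sigma2):
    """Exact MSE of the k-period DiD estimate under the assumed linear-drift DGP."""
    bias = delta * (k + 1) / 2        # drift picked up by averaging k pre-periods
    variance = sigma2 * (1 + 1 / k)   # one post-period gap plus a k-period average
    return bias ** 2 + variance


def benchmark_table(n_pre, delta, sigma2):
    """(k, MSE) for several ex ante fixed lengths and for the oracle length."""
    lengths = range(1, n_pre + 1)
    oracle_k = min(lengths, key=lambda k: true_mse(k, delta, sigma2))
    chosen = {
        "fixed: last period only": 1,
        "fixed: half of pre-periods": max(1, n_pre // 2),
        "fixed: all pre-periods": n_pre,
        "oracle": oracle_k,
    }
    return {name: (k, true_mse(k, delta, sigma2)) for name, k in chosen.items()}


if __name__ == "__main__":
    table = benchmark_table(n_pre=10, delta=0.1, sigma2=1.0)
    for name, (k, mse) in table.items():
        print(f"{name}: k={k}, MSE={mse:.3f}")
```

The oracle row is a lower bound for any rule that picks a single fixed `k` in advance. A data-driven rule would be judged by how close its simulated MSE comes to that bound, and by whether it beats the fixed rows across a range of `delta` values rather than at one favorable setting.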

Circularity Check

0 steps flagged

No significant circularity in MSE-optimal pre-trend length selection

The paper proposes selecting pre-trend length to minimize estimated MSE as an explicit bias-variance tradeoff. This is a direct optimization step that does not reduce the final estimator to its inputs by construction, nor does it rely on self-definitional equations, fitted parameters renamed as predictions, or load-bearing self-citations. The provided abstract and context describe a standard data-driven procedure consistent with external DiD assumptions, with no quoted steps showing equivalence to inputs. The central claim remains independent content.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper builds on standard DiD assumptions including parallel trends and focuses on MSE minimization; no new entities are introduced.

Reference graph

Works this paper leans on

20 extracted references

[1] Difference-in-Differences Designs: A Practitioner's Guide. Journal of Economic Literature.
[2] How much should we trust staggered difference-in-differences estimates? Journal of Financial Economics.
[3] Difference-in-Differences Estimation with Spatial Spillovers. Preprint, 2023.
[4] Difference-in-differences with multiple time periods. Journal of Econometrics.
[5] Robust Nonparametric Confidence Intervals for Regression-Discontinuity Designs. Econometrica.
[6] Minimum wages and employment: A case study of the fast-food industry in New Jersey and Pennsylvania. American Economic Review.
[7] Two-way fixed effects estimators with heterogeneous treatment effects. American Economic Review.
[8] Two-way fixed effects and differences-in-differences with heterogeneous treatment effects: A survey. The Econometrics Journal.
[9] Using multiple pretreatment periods to improve difference-in-differences and staggered adoption designs. Political Analysis.
[10] A simple approach to staggered difference-in-differences in the presence of spillovers.
[11] Early retirement incentives and student achievement. American Economic Journal: Economic Policy.
[12] Replication data for: Early Retirement Incentives and Student Achievement.
[13] Difference-in-differences with variation in treatment timing. Journal of Econometrics.
[14] Econometrics.
[15] Estimating Treatment Effects in Panel Data Without Parallel Trends. Preprint, 2026.
[16] Optimal bandwidth choice for the regression discontinuity estimator. The Review of Economic Studies.
[17] A more credible approach to parallel trends. The Review of Economic Studies.
[18] Pretest with caution: Event-study estimates after testing for parallel trends. American Economic Review: Insights.
[19] What's trending in difference-in-differences? A synthesis of the recent econometrics literature. Journal of Econometrics.
[20] Estimating dynamic treatment effects in event studies with heterogeneous treatment effects. Journal of Econometrics.