MSE-Optimal Difference-in-Differences Estimator
Pith reviewed 2026-05-08 15:56 UTC · model grok-4.3
The pith
A difference-in-differences estimator chooses the pre-trend length that minimizes mean squared error.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper develops a difference-in-differences estimation method that selects the optimal length of pre-trends by minimizing the mean squared error (MSE). By focusing on the bias and variance tradeoff, the proposed method derives the MSE-optimal estimator from the optimal length of pre-trends.
What carries the argument
MSE minimization over the choice of pre-trend length applied to conventional two-way fixed effects or event-study DiD specifications.
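The Python sketch below is a rough illustration of this mechanism. It scores each candidate window length k by a plug-in MSE and keeps the minimizer. Everything specific in it is an assumption made here, not the paper's procedure: a single post-period, a linear differential pre-trend estimated by OLS on the pre-period treated-minus-control gaps, iid noise, and the helper names `fit_linear_trend`, `did_estimate`, and `select_k`, which are all hypothetical.

```python
# Hypothetical sketch, not the paper's code: pick the number k of pre-periods
# used as the DiD reference window by minimising a plug-in MSE(k).
# Assumed setup: one post-period (t = 0), pre-periods t = -T0, ..., -1,
# gaps d_t = mean(treated_t) - mean(control_t), a linear differential
# pre-trend with slope delta, and iid noise with variance sigma^2.


def fit_linear_trend(ts, ds):
    """OLS of d_t on (1, t); returns (slope, residual variance)."""
    n = len(ts)
    t_bar, d_bar = sum(ts) / n, sum(ds) / n
    sxx = sum((t - t_bar) ** 2 for t in ts)
    sxy = sum((t - t_bar) * (d - d_bar) for t, d in zip(ts, ds))
    slope = sxy / sxx
    intercept = d_bar - slope * t_bar
    rss = sum((d - intercept - slope * t) ** 2 for t, d in zip(ts, ds))
    return slope, rss / (n - 2)


def did_estimate(pre_gaps, post_gap, k):
    """Post-period gap minus the mean gap over the last k pre-periods."""
    return post_gap - sum(pre_gaps[-k:]) / k


def select_k(pre_gaps):
    """Return (k_hat, {k: plug-in MSE}) over candidates k = 1, ..., T0.

    Only pre-period gaps enter the selection; the post-period gap is unused.
    """
    t0 = len(pre_gaps)
    if t0 < 3:
        raise ValueError("need at least 3 pre-periods to fit a trend")
    ts = list(range(-t0, 0))  # pre_gaps[0] is t = -T0, pre_gaps[-1] is t = -1
    slope, sigma2 = fit_linear_trend(ts, pre_gaps)
    mse = {}
    for k in range(1, t0 + 1):
        # The window mean sits at t = -(k + 1) / 2, so a linear trend biases
        # the estimate by slope * (k + 1) / 2. slope**2 overstates delta**2 on
        # average; a refined rule might subtract its sampling variance.
        bias = slope * (k + 1) / 2
        # Noise in the post-period gap plus noise in the k-period mean.
        var = sigma2 * (1 + 1 / k)
        mse[k] = bias ** 2 + var
    k_hat = min(mse, key=mse.get)
    return k_hat, mse


pre = [0.0, 0.1, 0.0, 0.1, 0.0, 0.1]  # made-up gaps for illustration
k_hat, _ = select_k(pre)
print(k_hat, did_estimate(pre, post_gap=1.0, k=k_hat))
```

With a perfectly linear gap series, the estimated noise is zero, so the rule keeps only k = 1. With an estimated slope of exactly zero, it uses every pre-period. Because only pre-period gaps enter `select_k`, the post-period outcome plays no role in the choice. That alone does not rule out post-selection effects on inference.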
If this is right
- The estimator can achieve lower MSE than arbitrary fixed pre-trend choices when sample sizes are small.
- Pre-testing for parallel trends is replaced by direct optimization of estimation error.
- The same selection procedure applies to both two-way fixed effects and event-study models.
- Empirical applications can obtain more accurate treatment-effect estimates by using the data-driven pre-trend length.
Where Pith is reading between the lines
- The method could reduce arbitrary researcher choices in DiD design across many applied settings.
- Extensions might adapt the MSE criterion to other causal estimators that face similar window-length decisions.
- Further checks could test performance when pre-trends follow patterns not well captured by linear or fixed-length assumptions.
- Policy evaluations with limited data might gain more reliable small-sample results from this selection rule.
Load-bearing premise
That the MSE can be reliably minimized from the observed data without introducing new selection bias or requiring extra assumptions about the shape of pre-trends.
What would settle it
Repeated simulations with known true treatment effects in which the proposed estimator produces higher MSE than a fixed pre-trend length choice would falsify the optimality claim.
Original abstract
This paper develops a difference-in-differences (DiD) estimation method that selects the optimal length of pre-trends by minimizing the mean squared error (MSE). Conventional DiD regression models, such as the two-way fixed effects model or the event study model, may suffer from accuracy and validity concerns. If the sample size is small, the estimator may have a larger variance. Also, pre-tests often lack power to detect violations of the parallel trends assumption, as Roth (2022) highlights. By focusing on the bias and variance tradeoff, the proposed method derives the MSE-optimal estimator from the optimal length of pre-trends. Simulation results and an empirical application demonstrate the practical applicability of the proposed method.
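The bias and variance tradeoff in the abstract can be made concrete in a stylized case. The model below is an illustrative assumption, not the paper's stated setup. It has two groups, one post-period t = 0, and pre-periods t = -T_0, ..., -1. The treated-minus-control gap d_t follows a linear differential pre-trend with slope δ plus iid noise with variance σ². Under these assumptions, the DiD estimator that uses the last k pre-periods satisfies:

```latex
\begin{aligned}
d_t &= c + \delta t + \varepsilon_t \quad (t < 0), \qquad
d_0 = \tau + c + \varepsilon_0, \qquad
\operatorname{Var}(\varepsilon_t) = \sigma^2, \\
\hat\tau(k) &= d_0 - \frac{1}{k}\sum_{t=-k}^{-1} d_t, \\
\operatorname{Bias}\{\hat\tau(k)\} &= \delta\,\frac{k+1}{2}, \qquad
\operatorname{Var}\{\hat\tau(k)\} = \sigma^2\Bigl(1 + \frac{1}{k}\Bigr), \\
\operatorname{MSE}(k) &= \frac{\delta^2 (k+1)^2}{4} + \sigma^2\Bigl(1 + \frac{1}{k}\Bigr).
\end{aligned}
```

Treat k as continuous. The first-order condition δ²(k+1)/2 = σ²/k² then gives k²(k+1) = 2σ²/δ². The noisier the data relative to the trend, the longer the MSE-optimal window. With δ = 0, MSE falls in k, and the rule uses every available pre-period.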
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. This paper proposes a difference-in-differences estimator that selects the optimal length of pre-trends by minimizing the mean squared error, explicitly balancing bias from possible parallel-trends violations against variance reduction. It contrasts the approach with conventional two-way fixed-effects and event-study specifications, and supports the proposal with simulation evidence and an empirical application.
Significance. If the derivation is valid, the method supplies a principled, data-driven rule for choosing pre-period length in DiD designs, which could improve finite-sample accuracy relative to arbitrary fixed lengths or low-powered pre-tests. The explicit bias-variance framing is a clear strength when the MSE objective can be minimized without introducing additional selection bias.
major comments (2)
- [Methodology section] The central derivation of the MSE-optimal pre-trend length must show that minimizing an estimated MSE does not itself induce post-selection bias or circularity in the bias term. As the load-bearing premise above indicates, the optimality claim rests on this step.
- [Simulations] Simulation results (Section 4 or equivalent) report MSE improvements, but the design should include explicit comparisons against standard DiD estimators that use fixed pre-trend lengths chosen ex ante, to isolate whether the data-driven selection delivers gains beyond what a correctly specified fixed-length estimator would achieve.
minor comments (2)
- Notation for the pre-trend length parameter should be standardized and distinguished from event-study lag indices to prevent reader confusion.
- [Abstract] The abstract could state more precisely the set of candidate lengths over which the MSE is minimized and whether the minimization is performed in-sample or via cross-validation.
Simulated Author's Rebuttal
We thank the referee for the constructive comments and the opportunity to clarify and strengthen our manuscript. We address each major comment below.
Referee: [Methodology section] The central derivation of the MSE-optimal pre-trend length must show that minimizing an estimated MSE does not itself induce post-selection bias or circularity in the bias term. As the load-bearing premise above indicates, the optimality claim rests on this step.
Authors: We thank the referee for this important observation. In our derivation, the MSE is expressed as an explicit function of the pre-trend length k: bias squared (arising from any linear deviation from parallel trends over the pre-period) plus variance (which declines with larger k under standard DiD assumptions). The optimal k minimizes this expression, and the estimator itself is the conventional DiD estimator computed with the selected k. The bias term used for selection is obtained from the observed pre-period trends and does not depend on the post-period outcome or the final estimator, avoiding direct circularity. Nevertheless, because selection is data-driven, we acknowledge that a formal argument ruling out post-selection bias is valuable. We will add a dedicated subsection (or appendix) that derives the asymptotic properties of the selected estimator and shows that the selection step does not introduce additional bias beyond the intended bias-variance tradeoff under the paper's maintained assumptions. revision: yes
Referee: [Simulations] Simulation results (Section 4 or equivalent) report MSE improvements, but the design should include explicit comparisons against standard DiD estimators that use fixed pre-trend lengths chosen ex ante, to isolate whether the data-driven selection delivers gains beyond what a correctly specified fixed-length estimator would achieve.
Authors: We agree that direct comparisons to fixed-length estimators chosen ex ante would better isolate the value of the data-driven rule. Our existing simulations compare the MSE-optimal estimator to conventional two-way fixed-effects and event-study specifications (which implicitly use all available pre-periods). We will revise Section 4 to add explicit benchmarks that fix the pre-trend length ex ante at several reasonable values (e.g., half the available pre-periods, or other commonly used fractions). In addition, where the simulation design permits, we will include an oracle benchmark that uses the true MSE-minimizing length known from the DGP. These additions will clarify whether the data-driven selection improves upon well-specified fixed alternatives. revision: yes
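The following is a minimal sketch of the benchmark design described in this response. It uses a stylized DGP that is an assumption made here, not the paper's simulation design. The sketch reports the empirical MSE of four rules:

- a plug-in, data-driven k;
- a fixed, ex-ante k equal to all pre-periods, which loosely mimics a conventional specification;
- a fixed, ex-ante k equal to half of the pre-periods;
- an infeasible oracle k computed from the true DGP parameters.

Function names and default parameter values are hypothetical.

```python
import random

# Hypothetical Monte Carlo, not the paper's simulation design: compare the
# empirical MSE of a data-driven k with fixed, ex-ante k and an oracle k.
# Assumed DGP: pre-period gaps d_t = delta * t + e_t for t = -T0, ..., -1,
# post-period gap d_0 = tau + e_0, with e_t iid N(0, sigma^2).


def did(pre, post, k):
    """DiD using the last k pre-periods as the reference window."""
    return post - sum(pre[-k:]) / k


def mse_formula(k, delta, sigma2):
    """MSE of did(..., k) under the assumed linear-trend DGP."""
    return (delta * (k + 1) / 2) ** 2 + sigma2 * (1 + 1 / k)


def plug_in_k(pre):
    """Data-driven k: plug the OLS slope and residual variance into mse_formula."""
    t0 = len(pre)
    ts = list(range(-t0, 0))
    t_bar, d_bar = sum(ts) / t0, sum(pre) / t0
    sxx = sum((t - t_bar) ** 2 for t in ts)
    slope = sum((t - t_bar) * (d - d_bar) for t, d in zip(ts, pre)) / sxx
    icpt = d_bar - slope * t_bar
    sigma2 = sum((d - icpt - slope * t) ** 2 for t, d in zip(ts, pre)) / (t0 - 2)
    return min(range(1, t0 + 1), key=lambda k: mse_formula(k, slope, sigma2))


def oracle_k(t0, delta, sigma):
    """Infeasible benchmark: minimise the true MSE using the DGP parameters."""
    return min(range(1, t0 + 1), key=lambda k: mse_formula(k, delta, sigma ** 2))


def simulate(t0=8, delta=0.1, sigma=1.0, tau=1.0, reps=2000, seed=0):
    """Return the empirical MSE of each selection rule over `reps` draws."""
    rng = random.Random(seed)
    fixed = {
        "all pre-periods": t0,  # loosely mimics a 2x2 TWFE estimate
        "half of pre-periods": max(1, t0 // 2),
        "oracle": oracle_k(t0, delta, sigma),
    }
    sq_err = {"data-driven": 0.0, **{name: 0.0 for name in fixed}}
    for _ in range(reps):
        pre = [delta * t + rng.gauss(0.0, sigma) for t in range(-t0, 0)]
        post = tau + rng.gauss(0.0, sigma)
        # Every rule is scored on the same draw (common random numbers).
        sq_err["data-driven"] += (did(pre, post, plug_in_k(pre)) - tau) ** 2
        for name, k in fixed.items():
            sq_err[name] += (did(pre, post, k) - tau) ** 2
    return {name: total / reps for name, total in sq_err.items()}


for name, mse in simulate().items():
    print(f"{name:>20}: {mse:.3f}")
```

All rules are evaluated on the same simulated draws, so differences in MSE do not come from separate random samples. Varying `delta` and `sigma` shows where the plug-in rule gains or loses relative to the fixed choices. The sketch itself makes no claim about which rule wins.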
Circularity Check
No significant circularity in MSE-optimal pre-trend length selection
The paper proposes selecting pre-trend length to minimize estimated MSE as an explicit bias-variance tradeoff. This is a direct optimization step that does not reduce the final estimator to its inputs by construction, nor does it rely on self-definitional equations, fitted parameters renamed as predictions, or load-bearing self-citations. The provided abstract and context describe a standard data-driven procedure consistent with external DiD assumptions, with no quoted steps showing equivalence to inputs. The central claim remains independent content.
Reference graph
Works this paper leans on
[1] Difference-in-Differences Designs: A Practitioner's Guide. Journal of Economic Literature.
[2] How much should we trust staggered difference-in-differences estimates? Journal of Financial Economics.
[3] Difference-in-Differences Estimation with Spatial Spillovers. Preprint, 2023.
[4] Difference-in-differences with multiple time periods. Journal of Econometrics.
[5] Robust Nonparametric Confidence Intervals for Regression-Discontinuity Designs. Econometrica.
[6] Minimum wages and employment: A case study of the fast-food industry in New Jersey and Pennsylvania. American Economic Review.
[7] Two-way fixed effects estimators with heterogeneous treatment effects. American Economic Review.
[8] Two-way fixed effects and differences-in-differences with heterogeneous treatment effects: A survey. The Econometrics Journal.
[9] Using multiple pretreatment periods to improve difference-in-differences and staggered adoption designs. Political Analysis.
[10] A simple approach to staggered difference-in-differences in the presence of spillovers.
[11] Early retirement incentives and student achievement. American Economic Journal: Economic Policy.
[12] Replication data for: Early Retirement Incentives and Student Achievement.
[13] Difference-in-differences with variation in treatment timing. Journal of Econometrics.
[14] Econometrics.
[15] Estimating Treatment Effects in Panel Data Without Parallel Trends. Preprint, 2026.
[16] Optimal bandwidth choice for the regression discontinuity estimator. The Review of Economic Studies.
[17] A more credible approach to parallel trends. The Review of Economic Studies.
[18] Pretest with caution: Event-study estimates after testing for parallel trends. American Economic Review: Insights.
[19] What's trending in difference-in-differences? A synthesis of the recent econometrics literature. Journal of Econometrics.
[20] Estimating dynamic treatment effects in event studies with heterogeneous treatment effects. Journal of Econometrics.