pith. machine review for the scientific record.

arxiv: 2604.12611 · v4 · submitted 2026-04-14 · 💰 econ.EM

Recognition: unknown

Distributional Change in Ordinal Data with Missing Observations: Minimal Mobility and Partial Identification


Pith reviewed 2026-05-10 13:55 UTC · model grok-4.3

classification 💰 econ.EM
keywords ordinal data · distributional change · partial identification · missing observations · optimal transport · L1 distance · minimal mobility · repeated cross-sections

The pith

The L1 distance between cumulative distribution functions represents the minimal reallocation of probability mass across ordered categories, yielding a scalar measure and minimal-mobility configurations of distributional change that can be,

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper develops a framework to measure and interpret changes in distributions of ordinal outcomes drawn from repeated cross-sectional surveys. The joint distribution of outcomes across periods remains unobserved, so sources of change cannot be identified directly. It shows that the L1 distance between cumulative distribution functions admits an optimal transport representation as the smallest possible movement of probability mass between ordered categories. This representation supplies both a scalar discrepancy measure and a precise description of the change patterns that require the least such movement, called minimal-mobility configurations. Partial identification then produces sharp bounds on the marginal distributions and therefore on the measure and configurations, using only the observed marginal information and supporting standard inference plus sensitivity checks for nonresponse.

Core claim

The L1 distance between cumulative distribution functions admits an optimal transport representation as the minimal reallocation of probability mass across ordered categories. This yields both a scalar measure of discrepancy and a structured characterization of how distributional change must occur, which the paper terms minimal-mobility configurations. To address missing data, a partial identification approach delivers sharp bounds on the marginal distributions and, in turn, on both the discrepancy measure and its associated configurations.
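
In symbols, and assuming notation that is not taken from the paper ($K$ ordered categories with unit spacing, probability mass functions $p$ and $q$ with cumulative distribution functions $F$ and $G$), the representation can be sketched as:

$$
\sum_{k=1}^{K-1} \lvert F(k) - G(k) \rvert
\;=\;
\min_{\pi \in \Pi(p,q)} \sum_{i=1}^{K} \sum_{j=1}^{K} \lvert i - j \rvert \, \pi_{ij},
\qquad
\Pi(p,q) = \Bigl\{ \pi \ge 0 : \sum_{j} \pi_{ij} = p_i,\ \sum_{i} \pi_{ij} = q_j \Bigr\}.
$$

Here $\pi_{ij}$ is the mass moved from category $i$ to category $j$. Any minimizer is a minimal-mobility configuration in the sense above, and the minimizer need not be unique.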

What carries the argument

The optimal transport representation of the L1 distance between cumulative distribution functions, which characterizes minimal-mobility configurations of distributional change.
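
A small numerical sketch may help fix ideas. It assumes three ordered categories with unit spacing and two hypothetical distributions `p` and `q`; the function names are illustrative and do not come from the paper. The greedy monotone coupling below is one optimal plan for this cost, not necessarily the only one.

```python
from itertools import accumulate

def cdf(pmf):
    """Cumulative sums of a probability mass function over ordered categories."""
    return list(accumulate(pmf))

def l1_cdf_distance(p, q):
    """Sum over categories of |F(k) - G(k)|; the last term is zero when both sum to 1."""
    return sum(abs(f - g) for f, g in zip(cdf(p), cdf(q)))

def monotone_coupling(p, q, tol=1e-12):
    """Greedy (north-west corner) transport plan with row sums p and column sums q."""
    p, q = list(p), list(q)  # work on copies so the inputs are not modified
    plan = [[0.0] * len(q) for _ in p]
    i = j = 0
    while i < len(p) and j < len(q):
        mass = min(p[i], q[j])
        plan[i][j] += mass
        p[i] -= mass
        q[j] -= mass
        # Advance past whichever category has been exhausted (possibly both).
        advance_i = p[i] <= tol
        advance_j = q[j] <= tol
        if advance_i:
            i += 1
        if advance_j:
            j += 1
    return plan

def transport_cost(plan):
    """Total mass moved, weighted by the number of categories crossed."""
    return sum(abs(i - j) * m for i, row in enumerate(plan) for j, m in enumerate(row))

# Hypothetical example distributions over three ordered categories.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]

plan = monotone_coupling(p, q)
distance = l1_cdf_distance(p, q)
cost = transport_cost(plan)
```

In this example the CDFs differ by 0.3 at each of the first two categories, so the distance is 0.6, and the plan moves 0.3 from the first category to the second and 0.3 from the second to the third at the same total cost.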

If this is right

  • A scalar measure of distributional discrepancy is obtained for ordinal data under limited information.
  • Minimal-mobility configurations describe the structure any observed change must satisfy.
  • Sharp bounds on both the measure and configurations are available despite missing observations.
  • Inference on the bounds proceeds with standard resampling methods.
  • Sensitivity of the results to nonresponse can be assessed directly from the data.
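
As an illustration of what resampling-based inference could look like, the sketch below runs a percentile bootstrap for the plug-in L1 distance between two independently drawn waves. This is an assumption for illustration only: the page does not say which resampling scheme the paper uses, the sample data are hypothetical, nonresponse is ignored, and a plain bootstrap may need adjustment when the two distributions are close to equal.

```python
import random
from itertools import accumulate

def pmf_from_sample(sample, n_categories):
    """Empirical probability mass function for category codes 0..n_categories-1."""
    counts = [0] * n_categories
    for x in sample:
        counts[x] += 1
    return [c / len(sample) for c in counts]

def l1_cdf_distance(p, q):
    """Sum over categories of |F(k) - G(k)|."""
    return sum(abs(f - g) for f, g in zip(accumulate(p), accumulate(q)))

def bootstrap_l1_interval(sample_a, sample_b, n_categories, n_boot=500, seed=0):
    """Percentile interval from resampling each wave independently with replacement."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_boot):
        a = rng.choices(sample_a, k=len(sample_a))
        b = rng.choices(sample_b, k=len(sample_b))
        draws.append(
            l1_cdf_distance(pmf_from_sample(a, n_categories), pmf_from_sample(b, n_categories))
        )
    draws.sort()
    return draws[int(0.025 * n_boot)], draws[int(0.975 * n_boot) - 1]

# Hypothetical waves of 100 respondents each, coded 0, 1, 2.
wave_a = [0] * 50 + [1] * 30 + [2] * 20
wave_b = [0] * 20 + [1] * 30 + [2] * 50

low, high = bootstrap_l1_interval(wave_a, wave_b, n_categories=3)
```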

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Comparing observed changes to the minimal-mobility configurations would indicate how much additional reallocation beyond the minimum is present.
  • The same representation could be used to bound mobility measures when only repeated cross-sections rather than true panels are available.
  • Additional identifying assumptions on response behavior could be layered on top to narrow the bounds further in specific applications.

Load-bearing premise

The partial identification approach based on observed marginal information from repeated cross-sections produces sharp bounds on the marginal distributions and thereby on the discrepancy measure and minimal-mobility configurations.
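
For intuition, the textbook worst-case bounds on a CDF under nonresponse can be sketched as follows. The sketch assumes nothing about nonrespondents, so their mass may sit entirely in the lowest category (upper bound) or entirely in the highest (lower bound). The function name, the respondent distribution, and the response rate are hypothetical, and the paper's own bounds may rest on different maintained assumptions.

```python
from itertools import accumulate

def worst_case_cdf_bounds(respondent_pmf, response_rate):
    """Pointwise lower and upper CDF bounds when nonrespondents are unrestricted."""
    f_obs = list(accumulate(respondent_pmf))
    # Lower bound: every nonrespondent is in the top category.
    lower = [response_rate * f for f in f_obs]
    # Upper bound: every nonrespondent is in the bottom category.
    upper = [response_rate * f + (1.0 - response_rate) for f in f_obs]
    # Any CDF equals 1 at the top category, whatever the nonrespondents would have said.
    lower[-1] = upper[-1] = 1.0
    return lower, upper

# Hypothetical respondent distribution with a 90% response rate.
lower, upper = worst_case_cdf_bounds([0.5, 0.3, 0.2], response_rate=0.9)
```

Below the top category the band has width equal to the nonresponse share (0.1 in this example), which is why low nonresponse rates translate into tight bounds.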

What would settle it

Panel data that reveals joint distributions lying outside the partial identification bounds computed from the corresponding repeated cross-sectional marginals would show that the bounds are not sharp.

Figures

Figures reproduced from arXiv: 2604.12611 by Rami V. Tabri.

Figure 1: Illustration of two distinct optimal couplings wi…
Figure 2: Illustration of representative endpoint-condit…
Figure 3: Representative maximal-mobility coupling for th…
Figure 4: Worst-case CDF bounds. […] 8, respectively, and for Morocco they are 1227 and 1152. Reported response rates (AAPOR Response Rate 1) vary across countries and waves: they are 77% and 58% for Iraq and 38% and 69% for Morocco in Waves 7 and 8, respectively. Item nonresponse rates for this question are low: for Iraq they are 0.85% and 0.25%, and for Morocco 1.39% and 0.26% across the two waves. The framework devel…
Figure 5: Confidence sets of endpoint-conditioned coupling…
Original abstract

Empirical analyses of ordinal outcomes using repeated cross-sectional data rely on marginal distributions, leaving the joint distribution unobserved and the sources of distributional change unidentified. This paper develops a framework to measure and interpret such changes under limited information. The $L_1$ distance between cumulative distribution functions admits an optimal transport representation as the minimal reallocation of probability mass across ordered categories, which provides a foundation for the analysis. This yields both a scalar measure of discrepancy and a structured characterization of how distributional change must occur, which I term minimal-mobility configurations. To address missing data, I adopt a partial identification approach that delivers sharp bounds on the marginal distributions and, in turn, on both the discrepancy measure and its associated configurations. The resulting framework supports inference using standard resampling methods and provides a transparent basis for assessing sensitivity to nonresponse. An application to Arab Barometer data illustrates the approach.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper develops a framework for measuring and interpreting distributional change in ordinal outcomes from repeated cross-sections with missing observations. It shows that the L1 distance between CDFs equals the minimal reallocation of probability mass under the natural ordering (via optimal transport), yielding both a scalar discrepancy measure and a characterization of 'minimal-mobility configurations'. A partial-identification strategy then supplies sharp bounds on the marginal distributions and, in turn, on the measure and configurations; the framework supports resampling-based inference and sensitivity analysis to nonresponse, illustrated with Arab Barometer data.

Significance. If the sharpness claims hold, the work offers a transparent, interpretable way to bound and decompose ordinal distributional change when only marginals are partially identified. It usefully combines a standard OT representation (W1 distance on ordered support) with partial-ID techniques to address a common data limitation in survey research. The application demonstrates feasibility, and the emphasis on configurations provides more than a scalar summary. Credit is due for the clean link between the transport representation and the mobility interpretation.

major comments (1)
  1. [Abstract] The statement that partial identification 'delivers sharp bounds on the marginal distributions and, in turn, on both the discrepancy measure and its associated configurations' requires explicit justification that the joint identification region for the pair of marginals (F,G) is rectangular or that the extremal minimal-mobility plans are attained at the vertices of the separate marginal regions. If the missingness mechanism creates dependence across periods, the feasible set of transport plans inside the joint region may be strictly smaller than the product of the marginal bounds; the manuscript must characterize this joint region and recompute the configurations within it to confirm sharpness.
minor comments (2)
  1. The notation for minimal-mobility configurations and the associated transport plans should be introduced with a small numerical example early in the text to aid readability.
  2. Clarify whether the resampling procedure for inference accounts for the partial-identification step or treats the bounds as fixed.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive and positive report. The single major comment raises an important technical point about the joint identification region, which we address directly below. We will revise the manuscript to incorporate the requested clarification.

Point-by-point responses
  1. Referee: [Abstract] The statement that partial identification 'delivers sharp bounds on the marginal distributions and, in turn, on both the discrepancy measure and its associated configurations' requires explicit justification that the joint identification region for the pair of marginals (F,G) is rectangular or that the extremal minimal-mobility plans are attained at the vertices of the separate marginal regions. If the missingness mechanism creates dependence across periods, the feasible set of transport plans inside the joint region may be strictly smaller than the product of the marginal bounds; the manuscript must characterize this joint region and recompute the configurations within it to confirm sharpness.

    Authors: We appreciate the referee's careful attention to the sharpness claim. The framework is developed for repeated cross-sectional surveys, where the two periods are sampled independently. The partial-identification analysis therefore treats the missingness process in each period separately, with no cross-period linkage in the sampling design or in the maintained assumptions on the nonresponse mechanism. As a result, the identification regions for the two marginal distributions F and G are independent, so that the joint identification region for the pair (F, G) is exactly the Cartesian product of the two marginal regions. Because the L1 discrepancy is a continuous functional of the pair of CDFs alone, its sharp bounds are obtained by optimizing over this product set. The associated minimal-mobility configurations are the optimal transport plans for the extremal pairs (F, G) lying on the boundary of the product region; these extrema are therefore attained at combinations of the vertices of the separate marginal identification sets. We will add an explicit paragraph to the abstract and to Section 3 (Partial Identification) stating the independence of the two cross-sections and confirming that the joint region is rectangular under the maintained sampling assumptions. A short remark will also be added noting that the rectangularity would not hold in a genuine panel with attrition, but that case lies outside the repeated-cross-section setting of the paper.
    revision: yes
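
The vertex argument in this response can be illustrated for the upper bound. The sketch assumes worst-case (unrestricted) nonresponse in each wave, so each marginal region is a polytope whose vertices place all nonrespondent mass on a single category. Because the L1 distance is convex in (F, G), its maximum over the product of two such polytopes is attained at a pair of vertices. The lower bound is a convex minimization and is not covered here. All names and numbers are hypothetical.

```python
from itertools import accumulate, product

def vertex_cdfs(respondent_pmf, response_rate):
    """CDFs obtained by placing all nonrespondent mass on one category at a time."""
    f_obs = list(accumulate(respondent_pmf))
    n = len(f_obs)
    return [
        [
            response_rate * f_obs[c] + (1.0 - response_rate) * (1.0 if c >= j else 0.0)
            for c in range(n)
        ]
        for j in range(n)
    ]

def l1(f, g):
    """Sum over categories of |F(k) - G(k)| for two CDFs given as lists."""
    return sum(abs(a - b) for a, b in zip(f, g))

def max_l1_over_product(p_obs, rate_p, q_obs, rate_q):
    """Largest L1 distance over all vertex pairs of the two marginal regions."""
    return max(
        l1(f, g)
        for f, g in product(vertex_cdfs(p_obs, rate_p), vertex_cdfs(q_obs, rate_q))
    )

# Hypothetical respondent distributions and response rates for two waves.
p_obs, rate_p = [0.5, 0.3, 0.2], 0.9
q_obs, rate_q = [0.2, 0.3, 0.5], 0.8

upper_bound = max_l1_over_product(p_obs, rate_p, q_obs, rate_q)
complete_case = l1(list(accumulate(p_obs)), list(accumulate(q_obs)))
```

With K categories this enumerates K × K vertex pairs, which is cheap for typical survey scales. In the example the complete-case distance is 0.6 and the worst-case upper bound is 0.81.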

Circularity Check

0 steps flagged

No circularity; derivation builds on external OT representation and partial-ID bounds without self-referential reduction

full rationale

The paper's core step equates the L1 distance on CDFs to the minimal reallocation distance via the standard optimal-transport representation for ordered support (a known result, not derived internally). Minimal-mobility configurations are then defined directly from this representation. Partial identification is invoked to bound the marginal distributions, with the discrepancy measure and configurations bounded 'in turn' as a consequence. No equation reduces the target quantities to fitted parameters by construction, no load-bearing premise rests on a self-citation chain, and no ansatz is smuggled via prior work by the same author. The framework remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Review based solely on the abstract, which provides no explicit free parameters, invented entities, or detailed axioms beyond standard domain assumptions in the field.

axioms (1)
  • domain assumption: Categories are ordered and the ordering is known and fixed.
    Required for the cumulative distribution function and L1 distance to admit an optimal transport interpretation as minimal reallocation.

pith-pipeline@v0.9.0 · 5443 in / 1279 out tokens · 65438 ms · 2026-05-10T13:55:20.767272+00:00 · methodology

