arxiv: 2604.12611 · v4 · submitted 2026-04-14 · 💰 econ.EM

Recognition: unknown

Distributional Change in Ordinal Data with Missing Observations: Minimal Mobility and Partial Identification

Rami V. Tabri


Pith reviewed 2026-05-10 13:55 UTC · model grok-4.3

classification 💰 econ.EM

keywords: ordinal data, distributional change, partial identification, missing observations, optimal transport, L1 distance, minimal mobility, repeated cross-sections


The pith

The L1 distance between cumulative distribution functions represents the minimal reallocation of probability mass across ordered categories, yielding a scalar measure and minimal-mobility configurations of distributional change that can be,

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper develops a framework to measure and interpret changes in distributions of ordinal outcomes drawn from repeated cross-sectional surveys. The joint distribution of outcomes across periods remains unobserved, so sources of change cannot be identified directly. It shows that the L1 distance between cumulative distribution functions admits an optimal transport representation as the smallest possible movement of probability mass between ordered categories. This representation supplies both a scalar discrepancy measure and a precise description of the change patterns that require the least such movement, called minimal-mobility configurations. Partial identification then produces sharp bounds on the marginal distributions and therefore on the measure and configurations, using only the observed marginal information and supporting standard inference plus sensitivity checks for nonresponse.

Core claim

The L1 distance between cumulative distribution functions admits an optimal transport representation as the minimal reallocation of probability mass across ordered categories. This yields both a scalar measure of discrepancy and a structured characterization of how distributional change must occur, which the paper terms minimal-mobility configurations. To address missing data, a partial identification approach delivers sharp bounds on the marginal distributions and, in turn, on both the discrepancy measure and its associated configurations.
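
The equivalence stated above can be checked numerically on a small example. The Python sketch below assumes unit spacing between adjacent categories, so that moving mass from category i to category j costs |i - j|, and uses hypothetical three-category distributions `p` and `q`. The function names are illustrative and are not taken from the paper. It compares the L1 distance between the two CDFs with the cost of the rank-preserving (north-west corner) coupling, which is one cost-minimizing plan for this cost. Optimal plans are generally not unique, and the paper's minimal-mobility configurations may be characterized differently.

```python
from itertools import accumulate

def cdf(pmf):
    """Cumulative sums of a probability mass function over ordered categories."""
    return list(accumulate(pmf))

def l1_cdf_distance(p, q):
    """Sum over categories of |F(k) - G(k)|, assuming unit spacing."""
    return sum(abs(f - g) for f, g in zip(cdf(p), cdf(q)))

def monotone_coupling(p, q, tol=1e-12):
    """North-west corner rule: a coupling of p and q that preserves rank order."""
    p, q = list(p), list(q)          # work on copies; the inputs are not modified
    plan = [[0.0] * len(q) for _ in p]
    i = j = 0
    while i < len(p) and j < len(q):
        moved = min(p[i], q[j])      # ship as much mass as both cells allow
        plan[i][j] += moved
        p[i] -= moved
        q[j] -= moved
        if p[i] <= tol:              # origin category exhausted
            i += 1
        if q[j] <= tol:              # destination category filled
            j += 1
    return plan

def transport_cost(plan):
    """Total mass moved, weighted by the number of categories crossed."""
    return sum(mass * abs(i - j)
               for i, row in enumerate(plan)
               for j, mass in enumerate(row))

# Hypothetical period-1 and period-2 distributions over three ordered categories.
p = [0.2, 0.5, 0.3]
q = [0.4, 0.4, 0.2]

distance = l1_cdf_distance(p, q)
plan = monotone_coupling(p, q)
cost = transport_cost(plan)
print(distance, cost)
```

For these inputs both printed values are about 0.3, up to floating-point error: the plan moves 0.2 of mass from the middle category to the lowest and 0.1 from the highest to the middle.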

What carries the argument

The optimal transport representation of the L1 distance between cumulative distribution functions, which characterizes minimal-mobility configurations of distributional change.

If this is right

A scalar measure of distributional discrepancy is obtained for ordinal data under limited information.
Minimal-mobility configurations describe the structure any observed change must satisfy.
Sharp bounds on both the measure and configurations are available despite missing observations.
Inference on the bounds proceeds with standard resampling methods.
Sensitivity of the results to nonresponse can be assessed directly from the data.
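
As a rough illustration of how bounds on a marginal CDF arise under nonresponse, the Python sketch below computes worst-case bounds for a single survey wave. It assumes nothing about the nonresponse mechanism: the lower bound places all nonrespondents in the highest category and the upper bound places them all in the lowest. The counts are hypothetical, and the function `worst_case_cdf_bounds` is an illustrative name, not the paper's notation. Whether the paper treats unit and item nonresponse in exactly this way is not confirmed by the abstract.

```python
from itertools import accumulate

def worst_case_cdf_bounds(counts, n_missing):
    """Worst-case bounds on the CDF of an ordinal outcome with nonresponse.

    counts    -- respondents observed in each ordered category
    n_missing -- sampled units whose outcome is not observed
    """
    n = sum(counts) + n_missing
    # P(Y <= k and responded), as a share of the full sample.
    observed = [c / n for c in accumulate(counts)]
    missing_share = n_missing / n
    # Lower bound: all missing mass sits in the top category.
    lower = observed[:-1] + [1.0]
    # Upper bound: all missing mass sits in the bottom category.
    upper = [min(1.0, a + missing_share) for a in observed[:-1]] + [1.0]
    return lower, upper

# Hypothetical wave: 90 respondents across three categories, 10 nonrespondents.
lower, upper = worst_case_cdf_bounds([20, 50, 20], n_missing=10)
print(lower)
print(upper)
```

Below the top category the band has constant width equal to the nonresponse share (0.1 here), which is what makes sensitivity to nonresponse directly readable from the data.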

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Comparing observed changes to the minimal-mobility configurations would indicate how much additional reallocation beyond the minimum is present.
The same representation could be used to bound mobility measures when only repeated cross-sections rather than true panels are available.
Additional identifying assumptions on response behavior could be layered on top to narrow the bounds further in specific applications.
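
The first extension above can be made concrete when a joint distribution is available, for instance from a panel. The Python sketch below uses a hypothetical 3x3 joint distribution `joint` (rows are period-1 categories, columns are period-2 categories) and compares realized mobility, the expected number of categories crossed, with the minimum implied by the marginals alone. The names and numbers are illustrative assumptions, and unit spacing between categories is assumed.

```python
from itertools import accumulate

def l1_cdf_distance(p, q):
    """Sum over categories of |F(k) - G(k)|, assuming unit spacing."""
    return sum(abs(f - g) for f, g in zip(accumulate(p), accumulate(q)))

def marginals(joint):
    """Row sums (period 1) and column sums (period 2) of a joint distribution."""
    rows = [sum(row) for row in joint]
    cols = [sum(col) for col in zip(*joint)]
    return rows, cols

def realized_mobility(joint):
    """Expected absolute change in category under the joint distribution."""
    return sum(mass * abs(i - j)
               for i, row in enumerate(joint)
               for j, mass in enumerate(row))

# Hypothetical joint distribution of (period-1 category, period-2 category).
joint = [
    [0.1, 0.1, 0.0],
    [0.2, 0.2, 0.1],
    [0.1, 0.1, 0.1],
]

p, q = marginals(joint)
realized = realized_mobility(joint)
minimal = l1_cdf_distance(p, q)
excess = realized - minimal   # reallocation beyond the minimum required
print(realized, minimal, excess)
```

Here realized mobility is about 0.7 against a minimum of about 0.3, so roughly 0.4 of the movement is not required by the change in marginals. The excess cannot be negative, because the L1 distance is the smallest transport cost among all joint distributions with these marginals.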

Load-bearing premise

The partial identification approach based on observed marginal information from repeated cross-sections produces sharp bounds on the marginal distributions and thereby on the discrepancy measure and minimal-mobility configurations.

What would settle it

Panel data that reveals joint distributions lying outside the partial identification bounds computed from the corresponding repeated cross-sectional marginals would show that the bounds are not sharp.

Figures

Figures reproduced from arXiv: 2604.12611 by Rami V. Tabri.

**Figure 2.** Illustration of representative endpoint-condit…

**Figure 3.** Representative maximal-mobility coupling for th…

**Figure 4.** Worst-case CDF bounds … 8, respectively, and for Morocco they are 1227 and 1152. Reported response rates (AAPOR Response Rate 1) vary across countries and waves: they are 77% and 58% for Iraq and 38% and 69% for Morocco in Waves 7 and 8, respectively. Item nonresponse rates for this question are low: for Iraq they are 0.85% and 0.25%, and for Morocco 1.39% and 0.26% across the two waves. The framework devel…

**Figure 5.** Confidence sets of endpoint-conditioned coupling…

Original abstract

Empirical analyses of ordinal outcomes using repeated cross-sectional data rely on marginal distributions, leaving the joint distribution unobserved and the sources of distributional change unidentified. This paper develops a framework to measure and interpret such changes under limited information. The $L_1$ distance between cumulative distribution functions admits an optimal transport representation as the minimal reallocation of probability mass across ordered categories, which provides a foundation for the analysis. This yields both a scalar measure of discrepancy and a structured characterization of how distributional change must occur, which I term minimal-mobility configurations. To address missing data, I adopt a partial identification approach that delivers sharp bounds on the marginal distributions and, in turn, on both the discrepancy measure and its associated configurations. The resulting framework supports inference using standard resampling methods and provides a transparent basis for assessing sensitivity to nonresponse. An application to Arab Barometer data illustrates the approach.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper gives a clean OT framing of ordinal distributional change as minimal mass reallocation and then bounds the resulting scalar and configurations under partial ID for missing data in repeated cross-sections.


The core contribution is the use of the L1 distance on CDFs as an optimal transport distance that equals the minimal reallocation of probability mass across ordered categories. Tabri calls the associated transport plans minimal-mobility configurations and then applies partial identification to get bounds on both the distance and those configurations when only repeated cross-sections with nonresponse are available. The framework also includes standard resampling inference and a sensitivity analysis for missingness, illustrated on Arab Barometer data.

That combination of OT representation plus tailored partial ID looks new relative to standard mobility or distributional papers in this area. It is useful because it turns an otherwise unidentified joint into something interpretable and bounded without strong assumptions on the missingness process. The approach is transparent and the application shows it can be implemented on real survey data.

The main soft spot is whether the bounds on the configurations are actually sharp. The abstract claims sharp bounds on the marginals deliver sharp bounds on the discrepancy and the configurations, but the stress-test note is right to flag that this requires either that the feasible set of (F,G) pairs behave like a product set or that the paper explicitly characterize the joint identification region and recompute the minimal plans inside it. If missingness induces dependence between periods, separate marginal extremes may not attain the true extremal configurations. Without the derivations it is hard to confirm this step is tight rather than conservative.

This is a methods paper aimed at applied economists and social scientists who work with ordinal survey outcomes in repeated cross-sections and need a practical way to handle nonresponse. A reader looking for tools to interpret distributional change under limited information will find it directly usable. It deserves a serious referee because the framing is coherent, the partial-ID setup is standard but freshly applied, and the empirical illustration is there; any issues with the joint bounds can be fixed in revision.

Referee Report

1 major / 2 minor

Summary. The paper develops a framework for measuring and interpreting distributional change in ordinal outcomes from repeated cross-sections with missing observations. It shows that the L1 distance between CDFs equals the minimal reallocation of probability mass under the natural ordering (via optimal transport), yielding both a scalar discrepancy measure and a characterization of 'minimal-mobility configurations'. A partial-identification strategy then supplies sharp bounds on the marginal distributions and, in turn, on the measure and configurations; the framework supports resampling-based inference and sensitivity analysis to nonresponse, illustrated with Arab Barometer data.

Significance. If the sharpness claims hold, the work offers a transparent, interpretable way to bound and decompose ordinal distributional change when only marginals are partially identified. It usefully combines a standard OT representation (W1 distance on ordered support) with partial-ID techniques to address a common data limitation in survey research. The application demonstrates feasibility, and the emphasis on configurations provides more than a scalar summary. Credit is due for the clean link between the transport representation and the mobility interpretation.

major comments (1)

[Abstract] The statement that partial identification 'delivers sharp bounds on the marginal distributions and, in turn, on both the discrepancy measure and its associated configurations' requires explicit justification that the joint identification region for the pair of marginals (F,G) is rectangular or that the extremal minimal-mobility plans are attained at the vertices of the separate marginal regions. If the missingness mechanism creates dependence across periods, the feasible set of transport plans inside the joint region may be strictly smaller than the product of the marginal bounds; the manuscript must characterize this joint region and recompute the configurations within it to confirm sharpness.

minor comments (2)

The notation for minimal-mobility configurations and the associated transport plans should be introduced with a small numerical example early in the text to aid readability.
Clarify whether the resampling procedure for inference accounts for the partial-identification step or treats the bounds as fixed.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive and positive report. The single major comment raises an important technical point about the joint identification region, which we address directly below. We will revise the manuscript to incorporate the requested clarification.


Referee: [Abstract] The statement that partial identification 'delivers sharp bounds on the marginal distributions and, in turn, on both the discrepancy measure and its associated configurations' requires explicit justification that the joint identification region for the pair of marginals (F,G) is rectangular or that the extremal minimal-mobility plans are attained at the vertices of the separate marginal regions. If the missingness mechanism creates dependence across periods, the feasible set of transport plans inside the joint region may be strictly smaller than the product of the marginal bounds; the manuscript must characterize this joint region and recompute the configurations within it to confirm sharpness.

Authors: We appreciate the referee's careful attention to the sharpness claim. The framework is developed for repeated cross-sectional surveys, where the two periods are sampled independently. The partial-identification analysis therefore treats the missingness process in each period separately, with no cross-period linkage in the sampling design or in the maintained assumptions on the nonresponse mechanism. As a result, the identification regions for the two marginal distributions F and G are independent, so that the joint identification region for the pair (F, G) is exactly the Cartesian product of the two marginal regions. Because the L1 discrepancy is a continuous functional of the pair of CDFs alone, its sharp bounds are obtained by optimizing over this product set. The associated minimal-mobility configurations are the optimal transport plans for the extremal pairs (F, G) lying on the boundary of the product region; these extrema are therefore attained at combinations of the vertices of the separate marginal identification sets. We will add an explicit paragraph to the abstract and to Section 3 (Partial Identification) stating the independence of the two cross-sections and confirming that the joint region is rectangular under the maintained sampling assumptions. A short remark will also be added noting that the rectangularity would not hold in a genuine panel with attrition, but that case lies outside the repeated-cross-section setting of the paper.

Revision: yes
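
The product-set argument can be illustrated for the upper bound. The Python sketch below assumes worst-case identification regions, in which each period's missing mass may fall on any category, and a rectangular joint region as the rebuttal maintains. Because the L1 distance is a convex function of the pair of distributions, its maximum over such a region is attained where all missing mass in each period sits on a single category, so enumerating those allocations suffices. The minimum of a convex function need not lie at a vertex, so this enumeration does not deliver the lower bound. All names and numbers are hypothetical.

```python
from itertools import accumulate, product

def l1_cdf_distance(p, q):
    """Sum over categories of |F(k) - G(k)|, assuming unit spacing."""
    return sum(abs(f - g) for f, g in zip(accumulate(p), accumulate(q)))

def allocate_missing(observed, missing, category):
    """Distribution obtained by placing all missing mass on one category."""
    pmf = list(observed)
    pmf[category] += missing
    return pmf

def max_l1_over_vertices(obs_p, miss_p, obs_q, miss_q):
    """Largest L1 distance over single-category allocations of missing mass.

    obs_p, obs_q   -- P(Y = k and responded) in each period (sums to 1 - missing)
    miss_p, miss_q -- nonresponse share in each period
    Returns (distance, category for period 1, category for period 2).
    """
    n_categories = len(obs_p)
    return max(
        (l1_cdf_distance(allocate_missing(obs_p, miss_p, a),
                         allocate_missing(obs_q, miss_q, b)), a, b)
        for a, b in product(range(n_categories), repeat=2)
    )

# Hypothetical observed shares with 10% nonresponse in each period.
best = max_l1_over_vertices([0.2, 0.5, 0.2], 0.1, [0.3, 0.4, 0.2], 0.1)
print(best)
```

In this example the maximum is about 0.3, reached when period-1 nonrespondents are all placed in the highest category (index 2) and period-2 nonrespondents in the lowest (index 0).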

Circularity Check

0 steps flagged

No circularity; derivation builds on external OT representation and partial-ID bounds without self-referential reduction


The paper's core step equates the L1 distance on CDFs to the minimal reallocation distance via the standard optimal-transport representation for ordered support (a known result, not derived internally). Minimal-mobility configurations are then defined directly from this representation. Partial identification is invoked to bound the marginal distributions, with the discrepancy measure and configurations bounded 'in turn' as a consequence. No equation reduces the target quantities to fitted parameters by construction, no load-bearing premise rests on a self-citation chain, and no ansatz is smuggled via prior work by the same author. The framework remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Review based solely on the abstract, which provides no explicit free parameters, invented entities, or detailed axioms beyond standard domain assumptions in the field.

axioms (1)

Domain assumption: categories are ordered, and the ordering is known and fixed.
Required for the cumulative distribution function and L1 distance to admit an optimal transport interpretation as minimal reallocation.

pith-pipeline@v0.9.0 · 5443 in / 1279 out tokens · 65438 ms · 2026-05-10T13:55:20.767272+00:00 · methodology

