Moment-Based Inference for Regression with Latent Dirichlet Covariates

Ziyu Jiang

arxiv: 2605.30718 · v1 · pith:OPYT3GUAnew · submitted 2026-05-29 · econ.EM · stat.ME · stat.ML

Pith reviewed 2026-06-28 20:15 UTC · model grok-4.3

keywords: latent Dirichlet allocation · moment-based inference · regression with latent covariates · spectral methods · topic models · asymptotic linearity · Dirichlet concentration

The pith

Response-weighted word moments identify the regression coefficient β directly without estimating document topic shares.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that in regressions where the covariates are latent topic shares from a finite LDA model, one can recover the coefficient β from corrected response-weighted word moments. This avoids both the inconsistency of estimating topic shares from short documents and the need to propagate first-stage estimation error through a two-step procedure. The correction requires knowing the total Dirichlet concentration α0, which the authors identify from the fact that a family of corrected operators commute exactly at the true α0 when there are three or more topics and a generic finite-probe condition holds. The resulting estimator is asymptotically linear in the number of documents at fixed document length, with sandwich standard errors constructed from document-level moment contributions.
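
For orientation, the correction can be written out in the form that is standard in the spectral LDA literature. The display below is a sketch under that assumption, not a transcription of the paper's operators. The symbols are notation introduced here: $x_1, x_2$ are one-hot encodings of two distinct tokens from the same document, $\mu_i$ is the word distribution of topic $i$, $\alpha_i$ are the Dirichlet parameters with $\alpha_0 = \sum_i \alpha_i$, and $y = \beta^\top h + \varepsilon$ with $h$ the latent topic shares and $\varepsilon$ orthogonal to the token moments that appear. The paper's normalization may differ.

```latex
\begin{align*}
M_1 &= \mathbb{E}[x_1], \\
M_2(\alpha_0) &= \mathbb{E}[x_1 x_2^\top] - \frac{\alpha_0}{\alpha_0+1}\, M_1 M_1^\top
  = \sum_{i=1}^{k} \frac{\alpha_i}{\alpha_0(\alpha_0+1)}\, \mu_i \mu_i^\top, \\
M_y(\alpha_0) &= \mathbb{E}[y\, x_1 x_2^\top]
  - \frac{\alpha_0}{\alpha_0+2}\Big(\mathbb{E}[y]\,\mathbb{E}[x_1 x_2^\top]
  + M_1\, \mathbb{E}[y\, x_2]^\top + \mathbb{E}[y\, x_1]\, M_1^\top\Big)
  + \frac{2\alpha_0^2}{(\alpha_0+1)(\alpha_0+2)}\, \mathbb{E}[y]\, M_1 M_1^\top \\
  &= \sum_{i=1}^{k} \frac{2\alpha_i}{\alpha_0(\alpha_0+1)(\alpha_0+2)}\, \beta_i\, \mu_i \mu_i^\top .
\end{align*}
```

Both corrected matrices are diagonal in the topic basis $\{\mu_i\}$, and $\beta$ enters only through the weights of $M_y$. Every correction coefficient depends on $\alpha_0$, which is why the construction is infeasible until $\alpha_0$ is pinned down.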

Core claim

Under a finite LDA model with response residuals orthogonal to the low-order token moments used for identification, response-weighted word moments admit the same correction, and the resulting supervised operator identifies the regression coefficient β directly, without estimating document-level topic shares. The main obstacle is that the correction depends on the unknown total concentration α0. We show that, for k≥3 topics and under a generic finite-probe condition, α0 is identified by commutativity: at the true value a family of corrected word-moment operators commute, whereas away from it they generically do not. This yields a feasible estimator and lets uncertainty in α̂0 propagate into inference for β.

What carries the argument

The supervised operator formed from response-weighted word moments after correction for total Dirichlet concentration α0, where α0 itself is recovered by commutativity of a family of such corrected operators.
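
The commutativity idea can be illustrated numerically. This page does not spell out the paper's operator family or its finite-probe condition, so the sketch below uses a stand-in: the whitened third-moment slices familiar from unsupervised spectral LDA, evaluated at exact population moments. The topic matrix `topics`, the Dirichlet vector `alpha`, the probe vectors and the grid of candidate values are all hypothetical toy choices, and the function names are made up for this example.

```python
import numpy as np

def dirichlet_moments(alpha):
    """Exact first, second and third moments of h ~ Dirichlet(alpha)."""
    a0, k = alpha.sum(), alpha.size
    d = np.diag(alpha)
    m1 = alpha / a0
    m2 = (np.outer(alpha, alpha) + d) / (a0 * (a0 + 1))
    m3 = (np.einsum("i,j,l->ijl", alpha, alpha, alpha)
          + np.einsum("ij,l->ijl", d, alpha)
          + np.einsum("il,j->ijl", d, alpha)
          + np.einsum("jl,i->ijl", d, alpha))
    idx = np.arange(k)
    m3[idx, idx, idx] += 2 * alpha
    return m1, m2, m3 / (a0 * (a0 + 1) * (a0 + 2))

def commutator_norm(a, topics, alpha, probes):
    """Frobenius norm of [C1, C2], where C1, C2 are whitened third-moment
    slices corrected with the candidate concentration `a`."""
    k = alpha.size
    m1, m2, m3 = dirichlet_moments(alpha)
    M1 = topics @ m1                       # population E[x1]
    pairs = topics @ m2 @ topics.T         # population E[x1 x2']
    M2 = pairs - a / (a + 1) * np.outer(M1, M1)
    vals, vecs = np.linalg.eigh(M2)        # eigenvalues in ascending order
    W = vecs[:, -k:] / np.sqrt(vals[-k:])  # whitening: W' M2 W = I_k
    slices = []
    for eta in probes:
        # third moment E[x1 (x) x2 (x) x3] contracted with eta on the last index
        triples = topics @ (m3 @ (topics.T @ eta)) @ topics.T
        p_eta = pairs @ eta
        cross = (M1 @ eta) * pairs + np.outer(p_eta, M1) + np.outer(M1, p_eta)
        M3 = (triples - a / (a + 2) * cross
              + 2 * a**2 / ((a + 1) * (a + 2)) * (M1 @ eta) * np.outer(M1, M1))
        slices.append(W.T @ M3 @ W)
    C1, C2 = slices
    return np.linalg.norm(C1 @ C2 - C2 @ C1)

# Hypothetical toy model: 5 words, 3 topics, columns of `topics` sum to one.
topics = np.array([[0.5, 0.1, 0.1],
                   [0.2, 0.1, 0.1],
                   [0.1, 0.5, 0.1],
                   [0.1, 0.2, 0.2],
                   [0.1, 0.1, 0.5]])
alpha = np.array([1.0, 0.6, 0.4])          # true alpha_0 = 2.0
probes = [np.array([1.0, 0, 0, 0, 0]), np.array([0, 0, 1.0, 0, 0])]
grid = [1.0, 1.5, 2.0, 2.5, 3.0, 4.0]
norms = [commutator_norm(a, topics, alpha, probes) for a in grid]
best = grid[int(np.argmin(norms))]         # candidate with the smallest commutator
```

With population moments the commutator vanishes up to rounding at the true concentration and is bounded away from zero at the other grid points. A feasible version would replace the exact moments with sample averages and minimize the commutator norm over a, which is where sampling error in α̂0 comes from.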

If this is right

The estimator for β requires no consistent recovery of per-document topic proportions.
Uncertainty in the estimated α0 enters the asymptotic variance of β̂ through the sandwich formula.
The estimator remains asymptotically linear and admits valid inference when the number of documents grows with document length held fixed.
Simulations produce near-nominal coverage for β where plug-in topic-share regressions undercover.
The same commutativity argument supplies a feasible estimator for contrast inference on latent topic effects in applications such as journal text data.
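
The sandwich construction mentioned in the list above can be sketched for a generic just-identified moment estimator. This is an illustration of the general formula, not the paper's variance estimator: `sandwich_se`, `contrib` and `jacobian` are names chosen here, and the toy check uses the sample mean rather than the LDA moments.

```python
import numpy as np

def sandwich_se(contrib, jacobian):
    """Standard errors for an estimator solving mean_i g_i(theta) = 0.

    contrib : (n, p) array, per-document moment contributions g_i at the estimate
    jacobian: (p, p) array, average derivative of g_i with respect to theta
    """
    n = contrib.shape[0]
    meat = contrib.T @ contrib / n          # average outer product of g_i
    bread = np.linalg.inv(jacobian)
    cov = bread @ meat @ bread.T / n        # G^{-1} Omega G^{-T} / n
    return np.sqrt(np.diag(cov))

# Toy check: the sample mean solves mean(y_i - theta) = 0, with Jacobian -1.
rng = np.random.default_rng(0)
y = rng.normal(size=500)
theta_hat = y.mean()
g = (y - theta_hat).reshape(-1, 1)
se = sandwich_se(g, np.array([[-1.0]]))
```

In a stacked system whose first moment identifies α0 and whose remaining moments identify β, the off-diagonal block of `jacobian` is what carries the sampling error of α̂0 into the standard errors for β̂.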

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The commutativity device for recovering α0 may extend to other finite-mixture regressions whose moments admit analogous corrections.
If the orthogonality condition is approximately satisfied in practice, the method could reduce the computational burden of topic-regression pipelines in text-as-data settings.
The approach suggests testing whether similar moment corrections can bypass individual-level latent-variable estimation in other supervised mixture models.

Load-bearing premise

Response residuals are orthogonal to the low-order token moments used for identification.

What would settle it

In data generated from the finite LDA model with known true β and α0, the estimator constructed from the commutativity-identified α0 converges to a value different from the true β as the number of documents grows large while document length stays fixed.
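
A check of this kind starts from a simulator for the maintained model. The sketch below is one way to set it up, assuming a linear response with Gaussian noise independent of the tokens; the parameter values and the names `simulate_lda_regression` and `pair_moment` are hypothetical. The second function gives the usual unbiased per-document estimate of the distinct-token pair moment, which is the kind of document-level contribution a fixed-length moment estimator averages.

```python
import numpy as np

def simulate_lda_regression(n_docs, doc_len, topics, alpha, beta, sigma, rng):
    """Draw documents from a finite LDA model and a linear response in the shares."""
    shares = rng.dirichlet(alpha, size=n_docs)            # latent topic shares, (n_docs, k)
    word_probs = shares @ topics.T                         # per-document word distribution
    counts = np.vstack([rng.multinomial(doc_len, p) for p in word_probs])
    y = shares @ beta + sigma * rng.normal(size=n_docs)    # noise independent of tokens
    return counts, y, shares

def pair_moment(c):
    """Unbiased estimate of E[x1 x2'] from one document's word counts c."""
    n_tok = c.sum()
    return (np.outer(c, c) - np.diag(c)) / (n_tok * (n_tok - 1))

# Hypothetical design: 5 words, 3 topics, 12 tokens per document.
topics = np.array([[0.5, 0.1, 0.1],
                   [0.2, 0.1, 0.1],
                   [0.1, 0.5, 0.1],
                   [0.1, 0.2, 0.2],
                   [0.1, 0.1, 0.5]])
alpha = np.array([1.0, 0.6, 0.4])
beta = np.array([1.0, -0.5, 0.0])
rng = np.random.default_rng(42)
counts, y, shares = simulate_lda_regression(200, 12, topics, alpha, beta, 0.5, rng)
pairs_hat = np.mean([pair_moment(c) for c in counts], axis=0)
```

Holding `doc_len` fixed while increasing `n_docs` reproduces the asymptotic regime described above. Any candidate estimator of β can then be compared with the known `beta` across replications.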

Original abstract

Topic models are often used as dimension-reduction tools before regression, with estimated document-level topic shares treated as observed covariates. This plug-in workflow creates two inferential difficulties: valid inference requires a regular first-stage-to-second-stage expansion that propagates topic-estimation uncertainty, and, at fixed document length, a document's topic mixture cannot be consistently recovered from its own words even when the population topic matrix is known. Corrected spectral moment methods for latent Dirichlet allocation (LDA) offer a starting point: when the total Dirichlet concentration is known, low-order word moments can be corrected to yield operators diagonal in the latent topic basis. We extend this to downstream regression. Under a finite LDA model with response residuals orthogonal to the low-order token moments used for identification, response-weighted word moments admit the same correction, and the resulting supervised operator identifies the regression coefficient $\beta$ directly, without estimating document-level topic shares. The main obstacle is that the correction depends on the unknown total concentration $\alpha_0$. We show that, for $k\ge3$ topics and under a generic finite-probe condition, $\alpha_0$ is identified by commutativity: at the true value a family of corrected word-moment operators commute, whereas away from it they generically do not. This yields a feasible estimator and lets uncertainty in $\hat\alpha_0$ propagate into inference for $\beta$. The estimator is asymptotically linear as the number of documents grows with fixed document length, with sandwich standard errors from document-level moment contributions. Simulations show near-nominal coverage where plug-in topic-share regressions can undercover, and an application to top economics journals illustrates contrast inference for latent topic effects.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper extends corrected spectral moments from unsupervised LDA to regression on latent topics by weighting moments with the outcome and identifying the concentration via commutativity under an orthogonality condition.

This gives a direct estimator for beta in regressions where the covariates are LDA topic shares, without first recovering per-document mixtures. The response-weighted moments get the same correction as the unsupervised case, and commutativity of the resulting operators pins down alpha0 for k at least 3 under a generic finite-probe condition. That yields an asymptotically linear estimator with sandwich standard errors that account for the estimated alpha0, all at fixed document length.

The approach is new in moving the spectral correction into the supervised setting and in using commutativity, rather than external data or extra assumptions, to recover the concentration. The simulations and journal application are useful for showing where the plug-in approach undercovers and where the new estimator delivers contrast inference on topic effects.

The central maintained condition is that response residuals are orthogonal to the low-order token moments; if that fails the corrected operator picks up an extra term and no longer isolates beta. The abstract states it plainly but does not derive when the condition is likely to hold given the latent topic structure, so that remains the main practical question. The finite-probe condition also needs the full derivations to judge how generic it really is.

This is aimed at applied people in economics and political science who already run topic regressions on text and want valid inference without consistent document-level recovery. It is worth sending to referees because the target problem is common, the technical move is clean, and the orthogonality issue is stated up front rather than hidden.

Referee Report

2 major / 2 minor

Summary. The paper extends corrected spectral moment methods for LDA to downstream linear regression. Under a finite LDA model and the assumption that response residuals are orthogonal to the low-order token moments, response-weighted word moments are corrected using the same operators as the unsupervised case; the resulting supervised operator identifies the regression coefficient β directly without recovering document-level topic shares. The unknown total concentration α0 is identified by a commutativity condition on a family of corrected operators that holds at the true value for k≥3 topics under a generic finite-probe condition. The resulting estimator is asymptotically linear in the number of documents (fixed document length) and admits sandwich standard errors; simulations and an economics-journal application are reported.

Significance. If the orthogonality condition and commutativity identification hold, the approach supplies a direct, consistent estimator for β that bypasses both the first-stage estimation error and the per-document inconsistency that plague plug-in topic-share regressions. The asymptotic linearity result and explicit propagation of uncertainty in α̂0 into inference for β are concrete strengths.

major comments (2)

[Abstract / identification argument] Abstract (extension paragraph) and the identification argument: the orthogonality of response residuals to the low-order token moments is stated as a maintained condition required for the response-weighted moments to identify β after correction, yet the manuscript supplies no argument establishing when this orthogonality is satisfied by the latent topic structure or how it relates to the regression of interest. Because this assumption is load-bearing for the central claim that the supervised operator isolates β, its justification or testable implications should be addressed explicitly.
[Identification of α0 via commutativity] The finite-probe condition used to guarantee that commutativity identifies α0 at the true value (and only there) for k≥3 is described as 'generic' but its precise statement, the measure of the set on which it fails, and the explicit verification that the family of corrected operators indeed fails to commute away from α0 are not visible in the provided derivations. Without these details the identification step for α0 remains formally incomplete.

minor comments (2)

Notation for the corrected operators and the response-weighted moments should be introduced with explicit equation numbers in the main text rather than only in the abstract.
The simulation design (document length, number of topics, strength of the orthogonality violation) should be reported in a table so that coverage results can be compared directly to the plug-in benchmark.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify the identification arguments. We address each point below and will revise the manuscript accordingly to strengthen the presentation of the assumptions and identification results.

Referee: [Abstract / identification argument] Abstract (extension paragraph) and the identification argument: the orthogonality of response residuals to the low-order token moments is stated as a maintained condition required for the response-weighted moments to identify β after correction, yet the manuscript supplies no argument establishing when this orthogonality is satisfied by the latent topic structure or how it relates to the regression of interest. Because this assumption is load-bearing for the central claim that the supervised operator isolates β, its justification or testable implications should be addressed explicitly.

Authors: We agree that the orthogonality condition merits explicit discussion. In the revision we will add a new subsection (likely in Section 3) that states sufficient conditions under which the assumption holds, including the case in which the regression error is independent of the token sequence conditional on the latent topic proportions, and the weaker case in which the error is uncorrelated with the low-order moments of the marginal token distribution. We will also note a testable implication: after obtaining the estimator, one can regress the fitted residuals on the observed word frequencies and check whether the coefficients are statistically indistinguishable from zero. These additions clarify the scope of the assumption without changing the formal results.
revision: yes

Referee: [Identification of α0 via commutativity] The finite-probe condition used to guarantee that commutativity identifies α0 at the true value (and only there) for k≥3 is described as 'generic' but its precise statement, the measure of the set on which it fails, and the explicit verification that the family of corrected operators indeed fails to commute away from α0 are not visible in the provided derivations. Without these details the identification step for α0 remains formally incomplete.

Authors: We accept that the finite-probe condition and the associated non-commutativity argument require a more explicit statement. In the revised version we will (i) give the precise definition of the finite-probe condition (a rank condition on a collection of probe vectors that ensures the commutator map is injective away from the true α0), (ii) state that the set of topic matrices and probe vectors for which commutativity holds at an incorrect α0 has Lebesgue measure zero in the relevant parameter space, and (iii) move the algebraic verification that the corrected operators commute if and only if α0 equals the true value into a dedicated appendix lemma. These changes make the identification argument self-contained.
revision: yes
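
The testable implication raised in the first response, regressing fitted residuals on observed word frequencies, might look like the following. This is a minimal sketch and not a procedure from the paper: it uses a heteroskedasticity-robust (HC0) Wald statistic for the slopes, drops one frequency column because the frequencies sum to one, and ignores the extra variability that comes from the residuals being estimated. The function name and the toy data are hypothetical.

```python
import numpy as np

def residual_frequency_wald(resid, counts):
    """Wald statistic for 'no word-frequency slope explains the residuals'."""
    n, vocab = counts.shape
    freq = counts / counts.sum(axis=1, keepdims=True)
    X = np.column_stack([np.ones(n), freq[:, :-1]])    # intercept + (vocab - 1) frequencies
    coef, *_ = np.linalg.lstsq(X, resid, rcond=None)
    u = resid - X @ coef
    bread = np.linalg.inv(X.T @ X)
    meat = (X * u[:, None] ** 2).T @ X                 # sum of u_i^2 x_i x_i'
    cov = bread @ meat @ bread                         # HC0 covariance of coef
    slopes, cov_s = coef[1:], cov[1:, 1:]
    wald = float(slopes @ np.linalg.solve(cov_s, slopes))
    return wald, vocab - 1                             # statistic, degrees of freedom

# Toy data: residuals unrelated to the words, and residuals loading on word 0.
rng = np.random.default_rng(1)
counts = rng.multinomial(20, [0.4, 0.3, 0.2, 0.1], size=2000)
resid_null = rng.normal(size=2000)
resid_alt = resid_null + 5.0 * counts[:, 0] / 20
wald_null, df = residual_frequency_wald(resid_null, counts)
wald_alt, _ = residual_frequency_wald(resid_alt, counts)
```

Under the null the statistic would be compared with a chi-square critical value on `df` degrees of freedom. A large value signals that the residuals load on word frequencies, which is the pattern the orthogonality condition rules out.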

Circularity Check

0 steps flagged

No significant circularity; identification uses external assumptions and commutativity condition

The derivation extends unsupervised corrected spectral moments to the supervised regression setting under the maintained orthogonality assumption on response residuals, then identifies the unknown α0 via the commutativity of corrected operators at the true value (for k≥3 under a generic finite-probe condition). These steps are presented as identifying restrictions rather than quantities that reduce, by the paper's own equations, to fitted inputs or prior self-citations. No self-definitional, fitted-input-renamed-as-prediction, or load-bearing self-citation patterns appear in the abstract or in the described chain of argument; the central operator for β is obtained after applying the correction that holds only under the stated orthogonality, which is not derived from the target parameter itself.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on the finite LDA generative model, the orthogonality of residuals to low-order moments, and a generic finite-probe condition that makes the commutativity map injective. No free parameters are introduced because α0 is identified; no new entities are postulated.

axioms (2)

domain assumption: Finite LDA model with response residuals orthogonal to the low-order token moments used for identification.
Invoked to ensure that response-weighted moments admit the same correction as the unsupervised case and identify β.
domain assumption: For k≥3 topics, the generic finite-probe condition makes the commutativity map injective at the true α0.
Required for point identification of α0 from the family of corrected operators.
