
arxiv: 2605.21066 · v1 · pith:HQCIS4YC · submitted 2026-05-20 · cs.LG

Robust Personalized Recommendation under Hidden Confounding in MNAR

Pith reviewed 2026-05-21 05:31 UTC · model grok-4.3

classification: cs.LG
keywords: recommender systems · hidden confounding · MNAR · sensitivity bounds · deconfounding · personalized bounds · adversarial optimization · observational data

The pith

Estimating a sensitivity bound for each user-item pair relaxes the uniform-confounding assumption that global bounds impose when deconfounding recommender systems with hidden confounders.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Recommender systems trained on observational interaction data suffer from selection bias when hidden factors influence which items users choose to engage with. Existing fixes either demand costly randomized trials or apply one global sensitivity bound to every user-item pair, assuming the hidden confounder's influence on every interaction is capped at the same level. This paper develops a method to estimate a separate sensitivity bound for each user-item pair directly from the data. An adversarial training procedure keeps the bounds tight enough to remove bias while preserving the model's ability to predict future interactions accurately. On three real datasets the personalized approach outperforms global-bound methods when hidden confounding is present.
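The summary does not spell out how a sensitivity bound is parameterized. A common choice in sensitivity analysis is an odds-ratio (marginal sensitivity) model; assuming PUID uses a bound of this general form, the difference between a global and a personalized bound can be written as below. Here p̂_{u,i} is the nominal propensity estimated from observed features, p_{u,i} is the true propensity that also depends on the hidden confounder, and Γ, Γ_{u,i} ≥ 1 are sensitivity parameters. The notation is ours, not the paper's.

```latex
% Global bound: one \Gamma shared by every user-item pair
\frac{1}{\Gamma} \le \frac{p_{u,i}/(1-p_{u,i})}{\hat p_{u,i}/(1-\hat p_{u,i})} \le \Gamma
\qquad \text{for all } (u,i)

% Personalized bound: a separate \Gamma_{u,i} for each pair
\frac{1}{\Gamma_{u,i}} \le \frac{p_{u,i}/(1-p_{u,i})}{\hat p_{u,i}/(1-\hat p_{u,i})} \le \Gamma_{u,i}

% Equivalent interval for the inverse propensity used as an IPS weight
1 + \frac{1}{\Gamma_{u,i}}\left(\frac{1}{\hat p_{u,i}} - 1\right)
\;\le\; \frac{1}{p_{u,i}} \;\le\;
1 + \Gamma_{u,i}\left(\frac{1}{\hat p_{u,i}} - 1\right)
```

Setting every Γ_{u,i} = Γ recovers the global bound. The personalized version lets weakly confounded pairs keep narrow weight intervals while strongly confounded pairs get wide ones.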

Core claim

The paper claims that a framework called Personalized Unobserved-Confounding-aware Interaction Deconfounder (PUID) can recover accurate user-item interaction probabilities by learning individualized sensitivity bounds on the effect of unobserved confounders, thereby relaxing the homogeneity assumption required by global sensitivity analysis; a benchmark-guided variant (BPUID) further stabilizes training by anchoring to pre-trained models, and both versions outperform global methods on real-world data without any randomized controlled trial observations.
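A hedged reading of how these pieces might fit into one training objective, in illustrative notation that is not taken from the paper: o_{u,i} ∈ {0, 1} marks an observed interaction, r̂_{u,i}(θ) is the prediction of a model with parameters θ, e_{u,i}(θ) ≥ 0 is its prediction error, 𝒲(Γ) is the set of inverse-propensity weights permitted by the per-pair bounds Γ_{u,i}, and r̂^ref_{u,i} is the prediction of a pre-trained reference model. The anchoring term with weight λ is our assumption about how BPUID's "stabilizing reference" could enter the loss; the summary gives neither the actual objective nor how the Γ_{u,i} themselves are fitted.

```latex
\min_{\theta}\;
\max_{w \in \mathcal{W}(\boldsymbol{\Gamma})}\;
\frac{\sum_{(u,i)\in\mathcal{D}} o_{u,i}\, w_{u,i}\, e_{u,i}(\theta)}
     {\sum_{(u,i)\in\mathcal{D}} o_{u,i}\, w_{u,i}}
\;+\;
\lambda\, \frac{1}{|\mathcal{D}|}
\sum_{(u,i)\in\mathcal{D}}
\bigl(\hat r_{u,i}(\theta) - \hat r^{\mathrm{ref}}_{u,i}\bigr)^2
```

With λ = 0 this is a plain robust (PUID-style) objective; λ > 0 pulls the model toward the pre-trained reference, which is one way a benchmark could stabilize the min-max game.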

What carries the argument

Personalized Unobserved-Confounding-aware Interaction Deconfounder (PUID), a framework that estimates a distinct sensitivity bound for each user-item pair on the influence of hidden confounders on interaction propensities through adversarial optimization.
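To make the adversarial step concrete, here is a minimal sketch under two assumptions the summary does not confirm: the bounds take an odds-ratio form, so a nominal propensity p̂ and a bound Γ ≥ 1 allow inverse-propensity weights in [1 + (1/p̂ - 1)/Γ, 1 + Γ(1/p̂ - 1)], and the loss is a self-normalized IPS average over observed pairs. The function names `weight_interval` and `worst_case_snips` are hypothetical.

```python
from typing import List, Sequence, Tuple


def weight_interval(p_hat: float, gamma: float) -> Tuple[float, float]:
    """Inverse-propensity interval allowed by an odds-ratio bound gamma >= 1."""
    base = 1.0 / p_hat - 1.0  # equals (1 - p_hat) / p_hat; requires p_hat > 0
    return 1.0 + base / gamma, 1.0 + base * gamma


def worst_case_snips(
    errors: Sequence[float], p_hat: Sequence[float], gamma: Sequence[float]
) -> float:
    """Largest self-normalized IPS loss over all weights inside the bounds.

    All inputs refer to observed pairs only. The maximizer of this
    linear-fractional problem is a threshold rule: pairs with the largest
    errors take their upper weight and the rest take their lower weight,
    so it is enough to try every cut point of the error-sorted order.
    """
    bounds: List[Tuple[float, float]] = [
        weight_interval(p, g) for p, g in zip(p_hat, gamma)
    ]
    order = sorted(range(len(errors)), key=lambda i: errors[i], reverse=True)
    best = float("-inf")
    for k in range(len(order) + 1):
        upper = set(order[:k])  # pairs pushed to their upper weight
        weights = [
            bounds[i][1] if i in upper else bounds[i][0] for i in range(len(errors))
        ]
        value = sum(w * e for w, e in zip(weights, errors)) / sum(weights)
        best = max(best, value)
    return best
```

Passing a per-pair `gamma` list is what distinguishes a personalized bound from a global one, which would pass the same value for every pair. The quadratic loop is kept for readability; a faster version would sort once and update running sums.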

If this is right

  • Recommender models can achieve higher predictive accuracy under hidden confounding by using interaction-specific rather than uniform sensitivity bounds.
  • The homogeneity assumption of global sensitivity analysis is no longer required for practical deconfounding in missing-not-at-random settings.
  • Adversarial optimization combined with optional benchmark guidance balances robustness against hidden confounders with maintained recommendation quality.
  • Performance improvements hold across the three real-world datasets tested, without any need for randomized controlled trial data.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same idea of learning interaction-specific bounds could be tested in other domains where confounding strength varies, such as personalized treatment effect estimation.
  • One could examine whether the estimated bounds remain stable when the underlying recommendation model is changed from matrix factorization to modern neural architectures.
  • Direct validation against small-scale randomized trials on the same users and items would test whether the data-driven bounds recover the effects observed in the randomized setting.

Load-bearing premise

User-item level sensitivity bounds can be reliably estimated from observational data alone via the proposed adversarial optimization strategy without introducing new biases or requiring external validation.

What would settle it

A controlled simulation in which the true magnitude of hidden confounding varies across user-item pairs according to a known generative process; if the method's estimated bounds fail to contain the true confounding effects or produce worse predictions than global bounds, the central claim is falsified.
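A minimal version of such a simulation might look as follows. It assumes a generative process of our own choosing (a Gaussian covariate x, a binary hidden confounder h, and a confounding strength s drawn separately for each pair) and an odds-ratio notion of a bound: for each pair it computes the smallest Γ whose interval contains the true propensity, then measures how often a candidate set of bounds covers it. All names and constants are illustrative.

```python
import math
import random
from typing import Dict, List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def odds(p: float) -> float:
    return p / (1.0 - p)


def simulate_pairs(n: int, seed: int = 0) -> List[Dict[str, float]]:
    """Hypothetical process in which confounding strength varies by pair."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)          # observed covariate
        s = rng.uniform(0.0, 3.0)        # pair-specific confounding strength
        p_h1 = sigmoid(0.5 * x + s / 2)  # true propensity when h = 1
        p_h0 = sigmoid(0.5 * x - s / 2)  # true propensity when h = 0
        p_nominal = 0.5 * (p_h0 + p_h1)  # P(o = 1 | x) with h ~ Bernoulli(0.5)
        # Smallest Gamma whose odds-ratio interval [1/Gamma, Gamma] contains
        # the true propensity for either value of h.
        gamma_true = max(
            max(odds(p) / odds(p_nominal), odds(p_nominal) / odds(p))
            for p in (p_h0, p_h1)
        )
        pairs.append({"x": x, "s": s, "gamma_true": gamma_true})
    return pairs


def coverage(gamma_true: Sequence[float], gamma_est: Sequence[float]) -> float:
    """Fraction of pairs whose estimated bound contains the true effect."""
    hits = sum(1 for g, e in zip(gamma_true, gamma_est) if e >= g)
    return hits / len(gamma_true)


def mean_log_width(gamma_est: Sequence[float]) -> float:
    """Average log Gamma: smaller means tighter (less conservative) bounds."""
    return sum(math.log(g) for g in gamma_est) / len(gamma_est)
```

A global bound set to the largest true Γ reaches full coverage but is as loose as the worst pair everywhere, while one set at the median covers only about half the pairs. A personalized estimator would be judged on reaching high coverage with a smaller `mean_log_width` than the global bound, alongside downstream prediction error.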

Original abstract

Recommender systems often rely on observational user-item interaction data, which is prone to selection bias due to users' selective interactions with items. Inverse propensity weighting and doubly robust estimators effectively mitigate selection bias under observed confounding, but are unreliable in the presence of hidden confounders. Existing approaches relying on randomized controlled trials (RCTs) or global sensitivity bounds are constrained in practice: RCTs demand costly experimental data, while global sensitivity bounds presume a uniformly bounded effect of unmeasured confounders on propensities through sensitivity analysis, thereby neglecting heterogeneity across user-item interactions. To overcome this limitation, we propose a novel framework named Personalized Unobserved-Confounding-aware Interaction Deconfounder (PUID), which estimates user-item level sensitivity bounds, thereby substantially relaxing the homogeneity assumption inherent in global sensitivity bounds. To ensure both robustness and predictive accuracy, we further develop an adversarial optimization strategy and propose a benchmark-guided variant (BPUID) that incorporates pre-trained models as stabilizing references. Extensive experiments on three real-world datasets demonstrate that our approach significantly outperforms global methods under hidden confounding, without requiring RCT data.
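For reference, the two estimators the abstract names can be sketched in their standard forms from the debiased-recommendation literature: IPS divides each observed error by its estimated propensity, and DR adds an imputed error for every pair and corrects it on observed pairs. IPS needs correct propensities; DR needs either correct propensities or correct imputed errors. A hidden confounder that drives both exposure and feedback can violate both conditions, which is the gap the paper targets. Function names are illustrative, and entries of `err` for unobserved pairs are never used, so they can hold any placeholder.

```python
from typing import Sequence


def naive_loss(obs: Sequence[int], err: Sequence[float]) -> float:
    """Average error over observed pairs only (biased under MNAR)."""
    observed = [e for o, e in zip(obs, err) if o == 1]
    return sum(observed) / len(observed)


def ips_loss(obs: Sequence[int], err: Sequence[float], p_hat: Sequence[float]) -> float:
    """Inverse propensity scoring averaged over all pairs in the matrix."""
    return sum(o * e / p for o, e, p in zip(obs, err, p_hat)) / len(obs)


def dr_loss(
    obs: Sequence[int],
    err: Sequence[float],
    err_imputed: Sequence[float],
    p_hat: Sequence[float],
) -> float:
    """Doubly robust: imputed error everywhere, IPS-corrected on observed pairs."""
    total = sum(
        ei + o * (e - ei) / p for o, e, ei, p in zip(obs, err, err_imputed, p_hat)
    )
    return total / len(obs)
```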

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes the Personalized Unobserved-Confounding-aware Interaction Deconfounder (PUID) framework to address hidden confounding in MNAR recommender systems. It estimates user-item level sensitivity bounds via adversarial optimization, relaxing the homogeneity assumption of global sensitivity bounds, and introduces a benchmark-guided variant (BPUID) that incorporates pre-trained models. The authors report that experiments on three real-world datasets show significant outperformance over global methods without requiring RCT data.

Significance. If the personalized bounds can be shown to be identifiable and non-circular, the framework would meaningfully advance robust recommendation by enabling heterogeneous sensitivity analysis without RCTs or uniform bounds, potentially improving practical deployment in observational settings with hidden confounders.

major comments (2)
  1. [§3] Adversarial Optimization for Personalized Bounds: The claim that user-item sensitivity bounds are recoverable from observational MNAR data alone via the min-max game is load-bearing but unsupported. Sensitivity parameters remain fundamentally unidentifiable under hidden confounding; the adversarial objective can be satisfied by arbitrary feasible intervals without anchoring to the true (unknown) confounding strength, directly weakening the assertion that personalized bounds reliably relax global homogeneity.
  2. [§5] Experiments: The reported outperformance on three datasets lacks any detail on the bound estimation procedure, the concrete form of the adversarial strategy, or statistical significance testing. Without these, it is impossible to verify whether the empirical gains substantiate the robustness claims or merely reflect optimization artifacts.
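One way to see the identifiability concern in major comment 1, under simplifying assumptions of our own (an unnormalized IPS loss, non-negative errors e_{u,i}, nominal propensities p̂_{u,i} ∈ (0, 1], and odds-ratio bounds Γ_{u,i} ≥ 1, none of which the summary confirms): the adversary's best response sets every observed weight to its upper limit, so the worst-case loss and its derivative are

```latex
\mathcal{L}^{\max}(\theta,\boldsymbol{\Gamma})
= \frac{1}{|\mathcal{D}|}\sum_{(u,i)\in\mathcal{D}}
  o_{u,i}\, e_{u,i}(\theta)
  \left[1 + \Gamma_{u,i}\left(\frac{1}{\hat p_{u,i}} - 1\right)\right],
\qquad
\frac{\partial \mathcal{L}^{\max}}{\partial \Gamma_{u,i}}
= \frac{o_{u,i}\, e_{u,i}(\theta)}{|\mathcal{D}|}
  \left(\frac{1}{\hat p_{u,i}} - 1\right) \ge 0 .
```

The worst-case loss never decreases as a bound widens, so choosing Γ_{u,i} by minimizing it collapses every bound to Γ_{u,i} = 1 (plain IPS), and choosing it by maximizing sends it to infinity. Nothing in the observed data pins Γ_{u,i} to the true confounding strength; a non-trivial personalized bound has to come from an added constraint, prior, or regularizer, which is the anchoring the referee finds missing.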
minor comments (2)
  1. [§3] The manuscript would benefit from an explicit statement of the precise optimization objective (e.g., the loss and constraint forms) in the main text rather than deferring all details to the appendix.
  2. [§2] Notation for the sensitivity bounds (upper/lower per user-item pair) should be introduced consistently before the first use in the method description.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments on our work. We provide point-by-point responses to the major comments and outline the revisions we plan to make to improve the clarity and rigor of the manuscript.

Point-by-point responses
  1. Referee: [§3] Adversarial Optimization for Personalized Bounds: The claim that user-item sensitivity bounds are recoverable from observational MNAR data alone via the min-max game is load-bearing but unsupported. Sensitivity parameters remain fundamentally unidentifiable under hidden confounding; the adversarial objective can be satisfied by arbitrary feasible intervals without anchoring to the true (unknown) confounding strength, directly weakening the assertion that personalized bounds reliably relax global homogeneity.

    Authors: We concur that sensitivity parameters cannot be uniquely identified from observational MNAR data due to the presence of hidden confounding. Our framework does not purport to recover the ground-truth confounding strengths but rather employs an adversarial min-max optimization to compute personalized sensitivity bounds that are consistent with the observed data while allowing for heterogeneity across user-item pairs. This approach provides a practical relaxation of the global sensitivity bound assumption by deriving data-dependent intervals that ensure robustness. We will revise the manuscript in §3 to explicitly discuss the identifiability challenges and clarify that the bounds serve as conservative, feasible ranges for sensitivity analysis rather than precise estimates of the true effects. Additionally, we will provide more formal justification for the adversarial game's role in bounding the confounding impact. revision: yes

  2. Referee: [§5] Experiments: The reported outperformance on three datasets lacks any detail on the bound estimation procedure, the concrete form of the adversarial strategy, or statistical significance testing. Without these, it is impossible to verify whether the empirical gains substantiate the robustness claims or merely reflect optimization artifacts.

    Authors: We appreciate this observation and agree that additional details are necessary for reproducibility and verification. In the revised version, we will augment §5 with a comprehensive description of the bound estimation procedure, including the specific implementation of the adversarial optimization strategy (e.g., the loss functions and training dynamics). We will also report the results of statistical significance tests to confirm that the performance improvements are statistically meaningful and not due to random optimization variations. These additions will strengthen the empirical validation of our claims. revision: yes
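As a hedged illustration of the significance testing promised in the second response (the test the authors will actually use is not specified), a paired sign-flip permutation test over per-seed metric values needs only the standard library. It assumes both methods were run on the same seeds and data splits; the function name is hypothetical.

```python
import random
from typing import Sequence


def paired_sign_flip_test(
    a: Sequence[float],
    b: Sequence[float],
    n_resamples: int = 10_000,
    seed: int = 0,
) -> float:
    """Two-sided paired permutation (sign-flip) test on per-run differences.

    Under the null hypothesis the two methods are exchangeable within each
    run, so every paired difference is equally likely to carry either sign.
    Returns a Monte Carlo p-value with the usual +1 correction.
    """
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs) / len(diffs))
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_resamples):
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(sum(flipped) / len(flipped)) >= observed:
            extreme += 1
    return (extreme + 1) / (n_resamples + 1)
```

With per-seed scores for PUID in `a` and for a global-bound baseline in `b`, a small p-value indicates the mean paired difference is unlikely under exchangeability. With n seeds the smallest attainable p-value is about 2 / 2^n, so five or fewer runs cannot reach p < 0.05.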

Circularity Check

0 steps flagged

No significant circularity in PUID derivation chain

Full rationale

The paper proposes estimating user-item sensitivity bounds from observational MNAR data via an adversarial optimization strategy within the PUID framework, then applies them for deconfounding. No load-bearing step reduces by construction to a self-definition, a fitted parameter renamed as a prediction, or a self-citation chain. The BPUID variant references pre-trained models as stabilizers, but this is an external reference rather than an internal tautology. The central claim rests on the proposed optimization and empirical outperformance on three datasets, which supplies independent content outside the inputs. No equations or sections exhibit the specific reductions required for circularity flags.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The approach depends on the estimability of these personalized bounds and the effectiveness of the adversarial optimization strategy, which are introduced in the paper.

free parameters (1)
  • personalized sensitivity bounds
    These are estimated per user-item interaction, serving as key parameters in the deconfounding process.
axioms (1)
  • domain assumption: The effect of hidden confounders on propensities varies across different user-item pairs
    This heterogeneity assumption allows relaxing the global bound.
invented entities (1)
  • PUID (no independent evidence)
    purpose: Framework for personalized deconfounding in recommendations
    Newly proposed method; no external evidence for the bound estimation is mentioned.

pith-pipeline@v0.9.0 · 5715 in / 1363 out tokens · 44859 ms · 2026-05-21T05:31:03.004092+00:00 · methodology
