{"total":20,"items":[{"citing_arxiv_id":"2605.20767","ref_index":7,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"The Illusion of Intervention: Your LLM-Simulated Experiment is an Observational Study","primary_cat":"cs.CL","submitted_at":"2026-05-20T06:09:41+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Interventions in LLM-simulated user experiments induce distribution shifts in latent attributes that create confounding bias, diagnosable with negative control outcomes and partially mitigated by adding setting-relevant persona details.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.15512","ref_index":21,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Auto-Conditioned Frank-Wolfe Algorithms","primary_cat":"math.OC","submitted_at":"2026-05-15T01:12:32+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.13497","ref_index":16,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Task-Aware Automated User Profile Generation for Recommendation Simulation Using Large Language Models","primary_cat":"cs.IR","submitted_at":"2026-05-13T13:20:39+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"APG4RecSim automatically generates realistic user profiles for LLM-based recommendation simulations, outperforming manual baselines by up to 7% in nDCG@10 and 8% in JSD on three benchmark datasets.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"dataset and insights for evaluating recommender systems. 
In Proceedings of the 31st ACM International Conference on Information & Knowledge Management. 540-550. [15] F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets: History and Context. ACM Trans. Interact. Intell. Syst. 5, 4, Article 19 (Dec. 2015), 19 pages. doi:10.1145/2827872 [16] Katja Hofmann, Anne Schuth, Alejandro Bellogin, and Maarten De Rijke. 2014. Effects of position bias on click-based recommender evaluation. In European Conference on Information Retrieval. Springer, 624-630. [17] Yupeng Hou, Jiacheng Li, Zhankui He, An Yan, Xiusi Chen, and Julian McAuley. 2024. Bridging language and items for retrieval and recommendation."},{"citing_arxiv_id":"2605.12995","ref_index":8,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"F-GRPO: Factorized Group-Relative Policy Optimization for Unified Candidate Generation and Ranking","primary_cat":"cs.LG","submitted_at":"2026-05-13T04:52:33+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"F-GRPO factorizes group-relative policy optimization into generation and ranking phases within one autoregressive sequence, using order-invariant coverage and position-aware utility rewards to improve top-ranked performance on recommendation and multi-hop QA tasks.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"delimiter tags receive a constant format penalty $p_{\\mathrm{fmt}} < 0$ in place of the computed reward; Appendix C.3 specifies how this penalty is applied to the generated tokens in malformed cases. Preprint. Under review. 3.3 Gradient Analysis For comparison, consider a GRPO baseline that defines a single combined reward $R^{(i)}_{\\mathrm{joint}} = R^{(i)}_{\\mathrm{slate}} + \\lambda R^{(i)}_{\\mathrm{rank}}$ and computes a joint group-relative advantage $\\hat{A}^{(i)}_{\\mathrm{joint}} = R^{(i)}_{\\mathrm{joint}} - \\bar{R}_{\\mathrm{joint}}$ (8), where $\\bar{R}_{\\mathrm{joint}} = \\frac{1}{G} \\sum_{j=1}^{G} R^{(j)}_{\\mathrm{joint}}$ is the corresponding per-prompt group mean. 
This joint advantage is applied uniformly to all tokens in rollout $i$. At $\\theta = \\theta_{\\mathrm{old}}$, let $T^{(i)}_{\\tau}$ and $T^{(i)}_{\\sigma}$ denote the sets of token positions belonging to the slate content $c^{(i)}_{\\tau}$ and rank content $c^{(i)}_{\\sigma}$, respectively. The resulting policy gradient is $\\nabla_{\\theta} L_{\\mathrm{joint}} = -\\frac{1}{G}$"},{"citing_arxiv_id":"2605.07964","ref_index":37,"ref_count":2,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Asymptotically Log-Optimal Bayes-Assisted Confidence Sequences for Bounded Means","primary_cat":"stat.ML","submitted_at":"2026-05-08T16:27:50+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"A Bayesian predictive model adaptively selects martingale factors to construct asymptotically log-optimal confidence sequences for bounded means while preserving anytime validity under misspecification.","context_count":1,"top_context_role":"method","top_context_polarity":"use_method","context_text":"Thus, for each candidate value $\\mu$, the coefficient is computed by maximising $\\frac{1}{n-1} \\sum_{i=1}^{n-1} \\log\\{1 + \\lambda(X_i - \\mu)\\}$ over $\\lambda \\in I_{\\mu,c}$. Equivalently, when the optimum is interior, it solves $\\frac{1}{n-1} \\sum_{i=1}^{n-1} \\frac{X_i - \\mu}{1 + \\lambda(X_i - \\mu)} = 0$ (36). Parametric Bayesian predictive. The parametric Bayesian method uses a beta working model in mean-concentration parametrisation, $X_i \\mid \\rho, \\nu \\sim \\mathrm{Beta}(\\rho\\nu, (1-\\rho)\\nu)$, $\\rho \\in (0,1)$, $\\nu > 0$ (37). Posterior inference is approximated using the IBIS/waste-free SMC sampler from the particles package (Chopin et al., 2020). With $N_{\\mathrm{smc}}$ particles and Markov chains of length $L_{\\mathrm{smc}}$, each time point stores $M = N_{\\mathrm{smc}} \\times L_{\\mathrm{smc}}$ posterior samples. 
Conditional on these samples, the predictive score $\\mathbb{E}\\left[ \\frac{X - \\mu}{1 + \\lambda(X - \\mu)} \\,\\middle|\\, X \\sim \\mathrm{Beta}(a, b) \\right]$ is evaluated for each sampled beta distribution using the closed-form hypergeometric ex-"},{"citing_arxiv_id":"2605.07171","ref_index":14,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Cost-Ordered Feasibility for Multi-Armed Bandits with Cost Subsidy","primary_cat":"cs.LG","submitted_at":"2026-05-08T03:07:25+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Develops COF algorithm for MAB-CS that intelligently checks cheap arm feasibility by pooling samples, with generalized instance-dependent lower bounds and matching upper bounds on cumulative cost and quality regret.","context_count":1,"top_context_role":"dataset","top_context_polarity":"use_dataset","context_text":"We improve on PE-CS in cases when COF can deem $a^\\dagger$ infeasible sooner than the BAI-filter gets to $a_k$. When γa∗ k dominates, we can see that the gap $\\mu_{a^*}/(1-\\alpha) - \\mu_k$ is a strictly larger gap than $\\Delta_k = \\mu^* - \\mu_k$, making the instance-dependence of COF superior. 4 Experiments To validate COF empirically, we conduct experiments on the Goodreads [29] and Movielens [14] datasets from the recommendation systems literature. Goodreads contains crowd-sourced book ratings over 2.3 million unique books from 870,000 users. The books are organized into eight genres and each genre has between 36,514 and 335,449 books annotated with it. The number of aggregated reviews per genre are between 150,000 and 3.5 million. 
For MovieLens we use the 25M variant"},{"citing_arxiv_id":"2604.27599","ref_index":8,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"One Pass, Any Order: Position-Invariant Listwise Reranking for LLM-Based Recommendation","primary_cat":"cs.IR","submitted_at":"2026-04-30T08:49:44+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"InvariRank achieves permutation-invariant listwise reranking for LLM-based recommendations via a structured attention mask that blocks cross-candidate interactions and shared positional framing under RoPE, enabling stable rankings in one forward pass.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.27223","ref_index":29,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Graphify: Automated Synthesis of Type-Safe Graph Backends via $O(S)$ GraphQL-to-Gremlin Transpilation","primary_cat":"cs.DB","submitted_at":"2026-04-29T21:46:44+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Graphify automates synthesis of type-safe graph backends via a formal GraphQL-to-Gremlin mapping and O(S) recursive transpilation algorithm supporting CRUD and nested queries.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.25291","ref_index":14,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"From Local Indices to Global Identifiers: Generative Reranking for Recommender Systems via Global Action Space","primary_cat":"cs.IR","submitted_at":"2026-04-28T06:57:00+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"GloRank reformulates list-wise reranking as token generation over a global item identifier space, using supervised pre-training 
followed by reinforcement learning to maximize list-wise utility and outperforming baselines on benchmarks and industrial data.","context_count":1,"top_context_role":"dataset","top_context_polarity":"use_dataset","context_text":"• RQ7: Is GloRank effective when deployed in a real-world, large-scale industrial production system? 4.1 Dataset and Evaluation Metrics Datasets. To comprehensively evaluate the performance of GloRank, we conduct experiments on three datasets, including two widely used public benchmarks (following previous work [29, 45]), Amazon Books [30] and MovieLens-1M [14], and a large-scale Industrial dataset collected from a real-world e-commerce platform. Amazon Books and MovieLens-1M serve as standard testbeds to ensure reproducibility, while the Industrial dataset is employed to verify the effectiveness and robustness of our method in a practical production environment. The detailed statistics of these datasets"},{"citing_arxiv_id":"2604.25032","ref_index":101,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Offline Evaluation Measures of Fairness in Recommender Systems","primary_cat":"cs.IR","submitted_at":"2026-04-27T22:28:24+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"The thesis identifies theoretical, empirical, and conceptual flaws in offline fairness measures for recommender systems and contributes new evaluation methods and practical guidelines.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"other measure types by first defining a list of properties or desiderata. However, for recommender system fairness measures, there is no standard 'checklist' to follow. 
Other prior work from a wide range of fields within and outside computer science have constructed a number of lists detailing properties for various types of evaluation measures: effectiveness [153], diversity [5, 8], fairness in computer network [101], and income inequality in the economics domain [7, 24]. Some general numerical properties, such as whether the measure score is bounded, can be readily adopted as ideal characteristics for recommender system fairness evaluation measures. However, not all properties would serve as suitable criteria. Firstly, some of these properties are not specifically tailored for fairness objectives in recommender systems, which"},{"citing_arxiv_id":"2604.23568","ref_index":7,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Green-Red Watermarking for Recommender Systems","primary_cat":"cs.IR","submitted_at":"2026-04-26T07:16:44+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"GREW uses a secret-key-driven green-red item partition and three ranking-integrated modules to embed verifiable watermarks in recommender systems that resist extraction attacks without data injection.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"black-box models simply by exploiting API query-response patterns [15, 25, 26, 30, 39]. Consequently, developing robust mechanisms to safeguard model ownership has become a critical imperative. Watermarking has emerged as a dominant mechanism for intellectual property protection across diverse domains, including multimedia, deep learning models, and LLMs [7, 11, 16, 32]. By embedding imperceptible yet verifiable signals, this technique enables rightful owners to assert provenance over models and their outputs [20, 34]. 
Distinct from alternative protection strategies [18, 23, 36], watermarking provides a proactive safeguard that preserves model utility even following deployment or malicious extraction attacks"},{"citing_arxiv_id":"2604.21536","ref_index":6,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Pre-trained LLMs Meet Sequential Recommenders: Efficient User-Centric Knowledge Distillation","primary_cat":"cs.IR","submitted_at":"2026-04-23T10:59:27+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"A distillation technique embeds LLM-generated textual user profiles into efficient sequential recommenders without runtime LLM inference, architectural changes, or fine-tuning.","context_count":1,"top_context_role":"dataset","top_context_polarity":"use_dataset","context_text":"stable user preferences or exponential weighting $H_k = \\sum_{t=1}^{m} \\left( \\frac{\\exp(\\gamma \\cdot t)}{\\sum_{j=1}^{m} \\exp(\\gamma \\cdot j)} \\right) \\cdot h^{k}_{t}$ for datasets where recent interactions are more predictive. In the exponential weighting case, the hyperparameter $\\gamma$ controls the emphasis on recency; higher values place more weight on recent interactions. N. Severin et al. Table 1: Statistics of the datasets. Dataset | Domain | #Users | #Items | #Interactions | Avg. Length | Density; Beauty [24] | Product Reviews | 70,996 | 39,116 | 436,309 | 6.145 | 0.00304; ML20M [6] | Movies | 137,165 | 13,132 | 19,933,088 | 143.929 | 0.01096; Kion [26] | Movies | 16,797 | 5,626 | 287,698 | 17.128 | 0.00055; Amazon M2 [10] | E-Commerce | 616,502 | 334,060 | 3,651,542 | 5.923 | 0.00002. 
Auxiliary distillation loss $L_{\\mathrm{distill}}(u, k) = L_{\\mathrm{mse}}(H_k(S_u), T(E(P(u))))$ is com-"},{"citing_arxiv_id":"2604.21305","ref_index":8,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"WPGRec: Wavelet Packet Guided Graph Enhanced Sequential Recommendation","primary_cat":"cs.IR","submitted_at":"2026-04-23T05:44:26+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"WPGRec is a new sequential recommender that performs multi-scale temporal modeling via stationary wavelet packets and injects high-order collaborative information through scale-aligned graph propagation with energy-aware gated fusion.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.17259","ref_index":15,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"HORIZON: A Benchmark for In-the-wild User Behaviour Modeling","primary_cat":"cs.IR","submitted_at":"2026-04-19T04:45:48+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"HORIZON creates a cross-domain, long-horizon user modeling benchmark from Amazon Reviews that tests generalization across time, domains, and unseen users, exposing gaps in sequential and LLM-based recommendation models.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.08643","ref_index":1,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Creator Incentives in Recommender Systems: A Cooperative Game-Theoretic Approach for Stable and Fair Collaboration in Multi-Agent Bandits","primary_cat":"cs.LG","submitted_at":"2026-04-09T17:45:15+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"For homogeneous agents in multi-agent linear bandits the regret-based TU game is convex with 
non-empty core containing the Shapley value; for heterogeneous agents a simple regret-based payout lies in the core and satisfies three Shapley axioms.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.06838","ref_index":15,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Explaining Neural Networks in Preference Learning: a Post-hoc Inductive Logic Programming Approach","primary_cat":"cs.AI","submitted_at":"2026-04-08T09:00:12+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"ILASP approximates neural networks for recipe preference learning as both global and local models, using weak constraints and PCA to maintain fidelity and interpretability.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.16379","ref_index":13,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"LLMAR: A Tuning-Free Recommendation Framework for Sparse and Text-Rich Industrial Domains","primary_cat":"cs.IR","submitted_at":"2026-03-25T02:49:27+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"LLMAR applies LLM reasoning with a self-correction reflection loop to generate semantic user motives for tuning-free recommendations, showing up to 54.6% nDCG@10 gains on a sparse industrial dataset over trained baselines.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2602.14706","ref_index":14,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Adaptive Autoguidance for Item-Side Fairness in Diffusion Recommender 
Systems","primary_cat":"cs.IR","submitted_at":"2026-02-16T12:52:31+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A2G-DiffRec applies adaptive autoguidance in diffusion recommenders, learning to balance main and weak model outputs via fairness-aware regularization to improve item exposure fairness with only marginal accuracy loss.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2505.21938","ref_index":16,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Practical Adversarial Attacks on Stochastic Bandits via Fake Data Injection","primary_cat":"cs.LG","submitted_at":"2025-05-28T03:47:13+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Introduces bounded fake data injection attacks that force a class of stochastic bandit algorithms to select a target arm in nearly all rounds at sublinear attack cost.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2403.11782","ref_index":63,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"A tutorial on learning from preferences and choices with Gaussian Processes","primary_cat":"cs.LG","submitted_at":"2024-03-18T13:40:48+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":3.0,"formal_verification":"none","one_line_summary":"Tutorial on a GP-based framework for preference and choice learning that unifies random utility models, limits of discernment, and multi-utility scenarios via customized likelihoods for object and label preferences.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}