Maxwell Harper and Joseph A
14 Pith papers cite this work, alongside 2,615 external citations. Polarity classification is still indexing.
Citing papers by year: 2026 (14)
Citation roles: background (1)
Citation polarities: background (1)
citing papers explorer
- F-GRPO: Factorized Group-Relative Policy Optimization for Unified Candidate Generation and Ranking
  F-GRPO factorizes group-relative policy optimization into generation and ranking phases within one autoregressive sequence, using order-invariant coverage and position-aware utility rewards to improve top-ranked performance on recommendation and multi-hop QA tasks.
- Asymptotically Log-Optimal Bayes-Assisted Confidence Sequences for Bounded Means
  A Bayesian predictive model adaptively selects martingale factors to construct asymptotically log-optimal confidence sequences for bounded means while preserving anytime validity under misspecification.
- One Pass, Any Order: Position-Invariant Listwise Reranking for LLM-Based Recommendation
  InvariRank achieves permutation-invariant listwise reranking for LLM-based recommendations via a structured attention mask that blocks cross-candidate interactions and shared positional framing under RoPE, enabling stable rankings in one forward pass.
- Green-Red Watermarking for Recommender Systems
  GREW uses a secret-key-driven green-red item partition and three ranking-integrated modules to embed verifiable watermarks in recommender systems that resist extraction attacks without data injection.
- HORIZON: A Benchmark for In-the-wild User Behaviour Modeling
  HORIZON creates a cross-domain, long-horizon user modeling benchmark from Amazon Reviews that tests generalization across time, domains, and unseen users, exposing gaps in sequential and LLM-based recommendation models.
- Creator Incentives in Recommender Systems: A Cooperative Game-Theoretic Approach for Stable and Fair Collaboration in Multi-Agent Bandits
  For homogeneous agents in multi-agent linear bandits the regret-based TU game is convex with non-empty core containing the Shapley value; for heterogeneous agents a simple regret-based payout lies in the core and satisfies three Shapley axioms.
- Task-Aware Automated User Profile Generation for Recommendation Simulation Using Large Language Models
  APG4RecSim automatically generates realistic user profiles for LLM-based recommendation simulations, outperforming manual baselines by up to 7% in nDCG@10 and 8% in JSD on three benchmark datasets.
- Cost-Ordered Feasibility for Multi-Armed Bandits with Cost Subsidy
  The paper develops the COF algorithm for MAB-CS, which checks the feasibility of cheap arms by pooling samples, and gives generalized instance-dependent lower bounds with matching upper bounds on cumulative cost and quality regret.
- Graphify: Automated Synthesis of Type-Safe Graph Backends via $O(S)$ GraphQL-to-Gremlin Transpilation
  Graphify automates synthesis of type-safe graph backends via a formal GraphQL-to-Gremlin mapping and $O(S)$ recursive transpilation algorithm supporting CRUD and nested queries.
- From Local Indices to Global Identifiers: Generative Reranking for Recommender Systems via Global Action Space
  GloRank reformulates list-wise reranking as token generation over a global item identifier space, using supervised pre-training followed by reinforcement learning to maximize list-wise utility and outperforming baselines on benchmarks and industrial data.
- WPGRec: Wavelet Packet Guided Graph Enhanced Sequential Recommendation
  WPGRec is a new sequential recommender that performs multi-scale temporal modeling via stationary wavelet packets and injects high-order collaborative information through scale-aligned graph propagation with energy-aware gated fusion.
- Explaining Neural Networks in Preference Learning: a Post-hoc Inductive Logic Programming Approach
  ILASP approximates neural networks for recipe preference learning as both global and local models, using weak constraints and PCA to maintain fidelity and interpretability.
- Offline Evaluation Measures of Fairness in Recommender Systems
  The thesis identifies theoretical, empirical, and conceptual flaws in offline fairness measures for recommender systems and contributes new evaluation methods and practical guidelines.
- Pre-trained LLMs Meet Sequential Recommenders: Efficient User-Centric Knowledge Distillation
  A distillation technique embeds LLM-generated textual user profiles into efficient sequential recommenders without runtime LLM inference, architectural changes, or fine-tuning.