F-GRPO factorizes group-relative policy optimization into generation and ranking phases within one autoregressive sequence, using order-invariant coverage and position-aware utility rewards to improve top-ranked performance on recommendation and multi-hop QA tasks.
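The two reward ingredients and the group-relative baseline can be sketched in a few lines. This is a minimal illustration, not the paper's exact formulation: the DCG-style position discount, the coverage term, and all function names here are assumptions.

```python
import math

def position_aware_utility(ranked_ids, relevant_ids):
    # Hypothetical reward: relevant items earn more credit at higher ranks
    # (a DCG-style 1/log2(rank+2) discount), plus an order-invariant
    # coverage term that only counts which relevant items appear at all.
    utility = sum(1.0 / math.log2(rank + 2)
                  for rank, item in enumerate(ranked_ids)
                  if item in relevant_ids)
    coverage = len(set(ranked_ids) & set(relevant_ids)) / max(len(relevant_ids), 1)
    return utility + coverage

def group_relative_advantages(rewards):
    # GRPO-style baseline: normalize each sampled list's reward by the
    # mean and standard deviation of its own sampling group.
    mean = sum(rewards) / len(rewards)
    std = (sum((r - mean) ** 2 for r in rewards) / len(rewards)) ** 0.5
    return [(r - mean) / std if std else 0.0 for r in rewards]

group = [["a", "b", "c"], ["c", "a", "b"], ["d", "b", "a"]]
rewards = [position_aware_utility(g, {"a", "b"}) for g in group]
advs = group_relative_advantages(rewards)
```

The first list places both relevant items at the top, so it earns the highest reward and a positive advantage relative to its group.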
Maxwell Harper and Joseph A
14 Pith papers cite this work, alongside 2,615 external citations. Polarity classification is still indexing.
Representative citing papers
A Bayesian predictive model adaptively selects martingale factors to construct asymptotically log-optimal confidence sequences for bounded means while preserving anytime validity under misspecification.
InvariRank achieves permutation-invariant listwise reranking for LLM-based recommendations via a structured attention mask that blocks cross-candidate interactions and shared positional framing under RoPE, enabling stable rankings in one forward pass.
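The masking idea can be illustrated with a toy block mask, assuming a shared prompt followed by fixed-length candidate segments; function names and the layout are hypothetical, not InvariRank's actual implementation.

```python
import numpy as np

def candidate_block_mask(prompt_len, num_cands, cand_len):
    # Toy attention mask (True = may attend). Every token sees the shared
    # prompt; each candidate token additionally sees only its own
    # candidate block, so no cross-candidate interaction leaks list order.
    total = prompt_len + num_cands * cand_len
    mask = np.zeros((total, total), dtype=bool)
    mask[:, :prompt_len] = True  # everyone attends to the prompt
    for c in range(num_cands):
        s = prompt_len + c * cand_len
        mask[s:s + cand_len, s:s + cand_len] = True  # within-candidate block
    return mask

def shared_position_ids(prompt_len, num_cands, cand_len):
    # All candidates reuse the same positional offsets, so under RoPE each
    # candidate is encoded as if it appeared first; swapping two candidates
    # then leaves every attention score unchanged.
    per_cand = list(range(prompt_len, prompt_len + cand_len))
    return list(range(prompt_len)) + per_cand * num_cands
```

Combining the block mask with shared positions is what makes the scoring permutation-invariant: neither attention nor positional encoding can distinguish candidate order.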
GREW uses a secret-key-driven green-red item partition and three ranking-integrated modules to embed verifiable watermarks in recommender systems that resist extraction attacks without data injection.
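The key-driven partition at the heart of such a scheme can be sketched as follows; this is a generic keyed-hash split in the spirit of green/red watermarking, with hypothetical names, not GREW's actual modules.

```python
import hashlib

def partition_items(item_ids, secret_key, green_fraction=0.5):
    # Hash each item together with the secret key; only the key holder
    # can reproduce, and hence verify, the green/red partition.
    green, red = set(), set()
    for item in item_ids:
        digest = hashlib.sha256(f"{secret_key}:{item}".encode()).digest()
        (green if digest[0] / 255.0 < green_fraction else red).add(item)
    return green, red

def green_rate(ranked_list, green):
    # A verifier measures the fraction of green items in a recommendation
    # list; a rate far above green_fraction signals the watermark.
    return sum(item in green for item in ranked_list) / len(ranked_list)
```

Because the split is a deterministic function of the key, verification needs no data injection: the same key always reproduces the same partition.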
HORIZON creates a cross-domain, long-horizon user modeling benchmark from Amazon Reviews that tests generalization across time, domains, and unseen users, exposing gaps in sequential and LLM-based recommendation models.
For homogeneous agents in multi-agent linear bandits the regret-based TU game is convex with non-empty core containing the Shapley value; for heterogeneous agents a simple regret-based payout lies in the core and satisfies three Shapley axioms.
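The Shapley value referenced above can be computed exactly for small games by averaging marginal contributions over all join orders. The sketch below uses a generic convex toy game v(S) = |S|^2 as the characteristic function, not the paper's regret-based game.

```python
from itertools import permutations
from math import factorial

def shapley_values(players, value):
    # Exact Shapley value: average each player's marginal contribution
    # over all n! orders in which the coalition could form. value() maps
    # a frozenset coalition to its worth.
    phi = {p: 0.0 for p in players}
    for order in permutations(players):
        seen = frozenset()
        for p in order:
            phi[p] += value(seen | {p}) - value(seen)
            seen = seen | {p}
    n_fact = factorial(len(players))
    return {p: v / n_fact for p, v in phi.items()}

# A convex (supermodular) toy game; for convex games the core is
# non-empty and contains the Shapley value.
v = lambda s: len(s) ** 2
phi = shapley_values(["A", "B", "C"], v)
```

Symmetric players split v(N) = 9 equally, so each receives 3; the efficiency axiom (payouts sum to the grand-coalition worth) holds by construction.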
APG4RecSim automatically generates realistic user profiles for LLM-based recommendation simulations, outperforming manual baselines by up to 7% in nDCG@10 and 8% in JSD on three benchmark datasets.
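The two reported metrics are standard and easy to state precisely; below are textbook definitions of binary-relevance nDCG@k and Jensen-Shannon divergence (base-2 logs, so JSD lies in [0, 1]), not APG4RecSim's evaluation code.

```python
import math

def ndcg_at_k(ranked, relevant, k=10):
    # Binary-relevance nDCG@k: DCG of the ranked list divided by the
    # ideal DCG (all relevant items ranked first).
    dcg = sum(1.0 / math.log2(i + 2)
              for i, item in enumerate(ranked[:k]) if item in relevant)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(k, len(relevant))))
    return dcg / ideal if ideal else 0.0

def jsd(p, q):
    # Jensen-Shannon divergence between two discrete distributions:
    # the mean of the two KL divergences to the midpoint distribution.
    m = [(a + b) / 2 for a, b in zip(p, q)]
    def kl(x, y):
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

A perfect ranking scores nDCG@k = 1, and identical distributions score JSD = 0, so "up to 7% in nDCG@10 and 8% in JSD" are absolute gaps on bounded scales.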
Develops the COF algorithm for MAB-CS, which tests cheap-arm feasibility by pooling samples, and establishes generalized instance-dependent lower bounds with matching upper bounds on cumulative cost and quality regret.
Graphify automates synthesis of type-safe graph backends via a formal GraphQL-to-Gremlin mapping and O(S) recursive transpilation algorithm supporting CRUD and nested queries.
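A recursive selection-to-traversal mapping of this flavor can be sketched in one pass over the selection tree; this toy transpiler, its grammar, and the emitted Gremlin shape are illustrative assumptions, not Graphify's formal mapping.

```python
def transpile(field, selection):
    # Toy GraphQL-selection -> Gremlin mapping: a leaf field becomes
    # values(), a nested selection becomes an out()-traversal carrying a
    # project() over its children. Each node in a selection tree of size
    # S is visited exactly once, giving the O(S) bound.
    if not selection:
        return f"values('{field}')"
    names = ", ".join(f"'{k}'" for k in selection)
    bys = ".".join(f"by({transpile(k, sub)})" for k, sub in selection.items())
    project = f"project({names}).{bys}"
    return f"out('{field}').{project}.fold()" if field else project

# Hypothetical query: { user { name, posts { title } } }
query = {"name": {}, "posts": {"title": {}}}
gremlin = "g.V().hasLabel('user')." + transpile("", query)
```

Type safety in the real system would come from checking each field against the GraphQL schema before emitting its step; the sketch omits that layer.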
GloRank reformulates list-wise reranking as token generation over a global item identifier space, using supervised pre-training followed by reinforcement learning to maximize list-wise utility and outperforming baselines on benchmarks and industrial data.
WPGRec is a new sequential recommender that performs multi-scale temporal modeling via stationary wavelet packets and injects high-order collaborative information through scale-aligned graph propagation with energy-aware gated fusion.
ILASP approximates neural networks for recipe preference learning as both global and local models, using weak constraints and PCA to maintain fidelity and interpretability.
The thesis identifies theoretical, empirical, and conceptual flaws in offline fairness measures for recommender systems and contributes new evaluation methods and practical guidelines.
A distillation technique embeds LLM-generated textual user profiles into efficient sequential recommenders without runtime LLM inference, architectural changes, or fine-tuning.