Spectral bandits for smooth graph functions with applications in recommender systems

Branislav Kveton; Michal Valko; R\'emi Munos; Shipra Agrawal; Tom\'a\v{s} Koc\'ak

arxiv: 2605.20552 · v1 · pith:I7Q6JAA3new · submitted 2026-05-19 · 📊 stat.ML · cs.LG

Spectral bandits for smooth graph functions with applications in recommender systems

Tom\'a\v{s} Koc\'ak , Michal Valko , R\'emi Munos , Branislav Kveton , Shipra Agrawal This is my paper

Pith reviewed 2026-05-21 06:11 UTC · model grok-4.3

classification 📊 stat.ML cs.LG

keywords spectral banditssmooth graph functionsrecommender systemseffective dimensiononline learningregret boundscontent recommendation

0 comments

The pith

Bandit algorithms for smooth graph payoffs scale regret linearly with effective dimension rather than graph size.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper studies a bandit problem where arm payoffs follow a smooth function on a graph, a setting that matches recommendation tasks in which similar items receive similar expected ratings. The authors introduce an effective dimension that stays small on real-world graphs and give two algorithms whose sample complexity and regret grow only linearly with this dimension. Experiments on content recommendation show that preferences across thousands of items can be learned after evaluating only tens of nodes.

Core claim

When payoffs are smooth on a graph, the cumulative regret of learning is governed by an effective dimension that captures the smoothness structure; two spectral algorithms achieve regret bounds linear in this dimension and therefore remain practical even when the graph contains thousands of nodes.

What carries the argument

Effective dimension of the graph, a quantity that measures the degrees of freedom remaining after smoothness constraints are imposed on the payoff vector.

If this is right

Regret bounds become independent of the total number of arms whenever the effective dimension is small.
A user preference model over thousands of items can be estimated from only tens of direct evaluations.
The same smoothness assumption directly supports content-based recommendation where item similarity is encoded by graph edges.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The approach could be tested on social-network graphs to check whether their effective dimensions also remain small.
If smoothness holds only approximately, one could add a small regularization term and measure the resulting increase in effective dimension.
The method suggests a general template for replacing full-arm exploration with graph-induced low-dimensional exploration in other online decision problems.

Load-bearing premise

The expected payoffs must form a smooth function on the given graph so that the effective dimension remains small enough for the regret bounds to stay useful.

What would settle it

A real recommendation graph in which the effective dimension grows linearly with the number of nodes, making regret scale with total graph size rather than staying sublinear.

Figures

Figures reproduced from arXiv: 2605.20552 by Branislav Kveton, Michal Valko, R\'emi Munos, Shipra Agrawal, Tom\'a\v{s} Koc\'ak.

**Figure 2.** Figure 2: Effective dimension d in the regime T < N. The effective dimension for this data is much smaller than the ambient dimension D, which is 500 and 4546 respectively. Algorithm 1 SpectralUCB Input: N : the number of vertices, T : the number of pulls {ΛL, Q} spectral basis of L λ, δ : regularization and confidence parameters R, C : upper bounds on the noise and ∥α∗∥Λ Run: Λ ← ΛL + λI d ← max{d : (d − 1)λd ≤ T /… view at source ↗

**Figure 3.** Figure 3: shows the results averaged over 10 simulations. Notice that while the result of SpectralTS are comparable to SpectralUCB, its computational time is much faster due the reasons discussed in Section 4. Recall that while both algorithms compute the least-square problem of the same size, SpectralUCB has then to compute the confidence interval for each arm. 0 40 80 120 160 200 0 5 10 15 20 25 30 35 40 45 50 ti… view at source ↗

**Figure 4.** Figure 4: shows the MovieLens data results averaged over 10 randomly sampled users. Notice that the results follow the same trends as for synthetic data. Overall, both our spectral algorithms outperform their linear counterparts in terms of cumulative regret. 0 40 80 120 160 200 0 50 100 150 200 250 300 time t cumulative regret Movielens data N=2019, average of 10 users, T=200, d = 5 SpectralUCB LinUCB SpectralTS Li… view at source ↗

read the original abstract

Smooth functions on graphs have wide applications in manifold and semi-supervised learning. In this paper, we study a bandit problem where the payoffs of arms are smooth on a graph. This framework is suitable for solving online learning problems that involve graphs, such as content-based recommendation. In this problem, each recommended item is a node and its expected rating is similar to its neighbors. The goal is to recommend items that have high expected ratings. We aim for the algorithms where the cumulative regret would not scale poorly with the number of nodes. In particular, we introduce the notion of an effective dimension, which is small in real-world graphs, and propose two algorithms for solving our problem that scale linearly in this dimension. Our experiments on real-world content recommendation problem show that a good estimator of user preferences for thousands of items can be learned from just tens nodes evaluations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper defines an effective dimension for smooth graph payoffs in bandits and gives algorithms whose regret scales linearly in it rather than the full node count.

read the letter

Hi colleague, the main thing here is that they take the bandit setting where payoffs are smooth on a graph and control regret through an effective dimension that stays small on real data. This lets the two proposed algorithms scale linearly in that dimension instead of the total number of arms, which is the scaling people need for recommendation problems with thousands of items. They frame the setup around content recommendation where neighboring nodes should have similar expected ratings, and the experiments claim you can learn a decent preference estimator from just tens of evaluations. That practical angle is what stands out. The soft spot is that everything rests on the smoothness assumption keeping the effective dimension small and usable. If the graph does not satisfy it or if computing the dimension requires heavy spectral work, the advantage disappears. The abstract states the linear scaling clearly but leaves the exact definition, the regret proof details, and the experimental baselines unexamined, so it is hard to judge how much this improves on prior graph bandit work. This is for researchers doing structured bandits or online recommendation with graph side information. A reader who wants regret bounds that adapt to data geometry would find the framing useful. I would send it for peer review because the core idea is coherent and the application is concrete enough to merit checking the math and experiments.

Referee Report

0 major / 3 minor

Summary. The paper studies a bandit problem in which expected arm payoffs form a smooth function on a given graph, motivated by content-based recommender systems where items correspond to nodes and similar items have similar ratings. The authors introduce an effective dimension derived from the graph spectrum that remains small for real-world graphs under the smoothness assumption, and they propose two algorithms whose regret scales linearly in this dimension rather than the total number of nodes. Experiments on real-world recommendation tasks show that accurate preference estimates for thousands of items can be obtained from only tens of evaluations.

Significance. If the regret bounds are valid and the effective dimension is indeed small in practice, the work offers a principled way to scale bandit algorithms to large graphs by exploiting spectral smoothness, which is relevant to recommender systems and semi-supervised learning. The empirical demonstration on real data strengthens the practical case, and the explicit connection between graph spectrum and linear regret scaling is a clear contribution if the derivations hold.

minor comments (3)

[Abstract and §4] The abstract states that the algorithms 'scale linearly in this dimension,' but the precise dependence on the smoothness parameter and the graph Laplacian should be stated explicitly in the main theorem statement for clarity.
[§6] In the experimental section, the real-world graphs used for evaluation should be characterized by their effective dimension values and eigenvalue decay to allow readers to verify that the dimension is indeed small as claimed.
[§2] Notation for the smoothness assumption (e.g., how the payoff vector lies in the span of low-frequency eigenvectors) is introduced without a dedicated preliminary section; a short background paragraph on graph Fourier analysis would improve accessibility.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary and significance assessment of our work. We appreciate the recommendation for minor revision and will incorporate improvements to enhance clarity and presentation in the revised manuscript.

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper defines an effective dimension from the spectral decomposition of the graph Laplacian under an explicit smoothness assumption on payoffs; this is an independent modeling choice drawn from graph signal processing rather than a quantity fitted to the target regret or renamed from a prediction. The two proposed algorithms and their linear scaling in the effective dimension follow from standard linear bandit analysis applied to a low-dimensional subspace spanned by the leading eigenvectors, with regret bounds derived directly from the definition and the smoothness parameter without reducing to self-citation chains or tautological fits. Experiments on real recommendation graphs serve as external validation that the dimension remains small, keeping the derivation self-contained against established benchmarks in spectral methods and online learning.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central contribution rests on the assumption that payoffs are smooth on the graph and that the resulting effective dimension is small for real-world instances; no free parameters or invented entities beyond the effective dimension are mentioned in the abstract.

axioms (1)

domain assumption Payoffs of arms are smooth functions on the graph
Stated in the abstract as the problem setting that enables the effective-dimension approach.

invented entities (1)

effective dimension no independent evidence
purpose: Quantity that controls regret scaling and is small on real graphs
Introduced in the abstract as the key notion allowing linear scaling.

pith-pipeline@v0.9.0 · 5690 in / 1165 out tokens · 31866 ms · 2026-05-21T06:11:17.973388+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

15 extracted references · 15 canonical work pages

[1]

Auer, P. 2002. Using confidence bounds for exploitation-exploration trade-offs.Journal of Machine Learning Research3:397–422

work page 2002
[2]

Barabasi, A.-L., and Albert, R. 1999. Emergence of scaling in random networks.Science286:11

work page 1999
[3]

Belkin, M.; Matveeva, I.; and Niyogi, P. 2004. Regular- ization and Semi-Supervised Learning on Large Graphs. InConference on Learning Theory

work page 2004
[4]

Belkin, M.; Niyogi, P.; and Sindhwani, V . 2006. Mani- fold Regularization: A Geometric Framework for Learn- ing from Labeled and Unlabeled Examples.Journal of Machine Learning Research7:2399–2434

work page 2006
[5]

J.; and Chen, J

Billsus, D.; Pazzani, M. J.; and Chen, J. 2000. A learn- ing agent for wireless news access. InIUI, 33–36

work page 2000
[6]

2010.Recommender Systems: An Introduction

Jannach, D.; Zanker, M.; Felfernig, A.; and Friedrich, G. 2010.Recommender Systems: An Introduction. Cam- bridge University Press

work page 2010
[7]

Koc ´ak, T.; Valko, M.; Munos, R.; and Agrawal, S

work page
[8]

InProceedings of the Twenty-Eighth AAAI Conference on Artificial Intelli- gence

Spectral Thompson Sampling. InProceedings of the Twenty-Eighth AAAI Conference on Artificial Intelli- gence

work page
[9]

L.; and Peng, R

Koutis, I.; Miller, G. L.; and Peng, R. 2010. Ap- proaching Optimality for Solving SDD Linear Systems. InFoundations of Computer Science, 235–244

work page 2010
[10]

Lam, S., and Herlocker, J. 2012. MovieLens 1M Dataset. http://www.grouplens.org/node/12

work page 2012
[11]

Li, L.; Chu, W.; Langford, J.; and Schapire, R. E. 2010. A Contextual-Bandit Approach to Personalized News Ar- ticle Recommendation. InInternational World Wide Web Conference

work page 2010
[12]

McPherson, M.; Smith-Lovin, L.; and Cook, J. 2001. Birds of a Feather: Homophily in Social Networks.An- nual Review of Sociology27:415–444

work page 2001
[13]

J., and Billsus, D

Pazzani, M. J., and Billsus, D. 2007. Content-Based Recommendation Systems. InThe Adaptive Web

work page 2007
[14]

Valko, M.; Munos, R.; Kveton, B.; and Koc´ak, T. 2014. Spectral Bandits for Smooth Graph Functions. InInter- national Conference on Machine Learning

work page 2014
[15]

Zhu, X. 2008. Semi-Supervised Learning Literature Survey. Technical Report 1530, University of Wisconsin- Madison

work page 2008

[1] [1]

Auer, P. 2002. Using confidence bounds for exploitation-exploration trade-offs.Journal of Machine Learning Research3:397–422

work page 2002

[2] [2]

Barabasi, A.-L., and Albert, R. 1999. Emergence of scaling in random networks.Science286:11

work page 1999

[3] [3]

Belkin, M.; Matveeva, I.; and Niyogi, P. 2004. Regular- ization and Semi-Supervised Learning on Large Graphs. InConference on Learning Theory

work page 2004

[4] [4]

Belkin, M.; Niyogi, P.; and Sindhwani, V . 2006. Mani- fold Regularization: A Geometric Framework for Learn- ing from Labeled and Unlabeled Examples.Journal of Machine Learning Research7:2399–2434

work page 2006

[5] [5]

J.; and Chen, J

Billsus, D.; Pazzani, M. J.; and Chen, J. 2000. A learn- ing agent for wireless news access. InIUI, 33–36

work page 2000

[6] [6]

2010.Recommender Systems: An Introduction

Jannach, D.; Zanker, M.; Felfernig, A.; and Friedrich, G. 2010.Recommender Systems: An Introduction. Cam- bridge University Press

work page 2010

[7] [7]

Koc ´ak, T.; Valko, M.; Munos, R.; and Agrawal, S

work page

[8] [8]

InProceedings of the Twenty-Eighth AAAI Conference on Artificial Intelli- gence

Spectral Thompson Sampling. InProceedings of the Twenty-Eighth AAAI Conference on Artificial Intelli- gence

work page

[9] [9]

L.; and Peng, R

Koutis, I.; Miller, G. L.; and Peng, R. 2010. Ap- proaching Optimality for Solving SDD Linear Systems. InFoundations of Computer Science, 235–244

work page 2010

[10] [10]

Lam, S., and Herlocker, J. 2012. MovieLens 1M Dataset. http://www.grouplens.org/node/12

work page 2012

[11] [11]

Li, L.; Chu, W.; Langford, J.; and Schapire, R. E. 2010. A Contextual-Bandit Approach to Personalized News Ar- ticle Recommendation. InInternational World Wide Web Conference

work page 2010

[12] [12]

McPherson, M.; Smith-Lovin, L.; and Cook, J. 2001. Birds of a Feather: Homophily in Social Networks.An- nual Review of Sociology27:415–444

work page 2001

[13] [13]

J., and Billsus, D

Pazzani, M. J., and Billsus, D. 2007. Content-Based Recommendation Systems. InThe Adaptive Web

work page 2007

[14] [14]

Valko, M.; Munos, R.; Kveton, B.; and Koc´ak, T. 2014. Spectral Bandits for Smooth Graph Functions. InInter- national Conference on Machine Learning

work page 2014

[15] [15]

Zhu, X. 2008. Semi-Supervised Learning Literature Survey. Technical Report 1530, University of Wisconsin- Madison

work page 2008