Recognition: unknown
Online Submodular Maximization under a Matroid Constraint with Application to Learning Assignments
read the original abstract
Which ads should we display in sponsored search in order to maximize our revenue? How should we dynamically rank information sources to maximize the value of the ranking? These applications exhibit strong diminishing returns: Redundancy decreases the marginal utility of each ad or information source. We show that these and other problems can be formalized as repeatedly selecting an assignment of items to positions to maximize a sequence of monotone submodular functions that arrive one by one. We present an efficient algorithm for this general problem and analyze it in the no-regret model. Our algorithm possesses strong theoretical guarantees, such as a performance ratio that converges to the optimal constant of 1 - 1/e. We empirically evaluate our algorithm on two real-world online optimization problems on the web: ad allocation with submodular utilities, and dynamically ranking blogs to detect information cascades. Finally, we present a second algorithm that handles the more general case in which the feasible sets are given by a matroid constraint, while still maintaining a 1 - 1/e asymptotic performance ratio.
This paper has not been read by Pith yet.
Forward citations
Cited by 3 Pith papers
-
Learning to Sparsify Stochastic Linear Bandits
Phased exploration-exploitation algorithms for sparse stochastic linear bandits achieve Õ(d√T) regret for Euclidean balls and α-regret bounds of Õ(d√T) or Õ(d T^{2/3}) for general convex sets using greedy approximation.
-
Constrained Contextual Bandits with Adversarial Contexts
A modular reduction from budget-constrained contextual bandits with adversarial contexts to unconstrained bandits via surrogate rewards, yielding improved guarantees and an efficient algorithm based on SquareCB.
-
Select Smarter, Not More: Prompt-Aware Evaluation Scheduling with Submodular Guarantees
POES frames prompt evaluation as online adaptive testing and uses a provably submodular objective to pick informative examples, delivering 6.2% higher average accuracy and 35-60% token savings versus naive full-set scoring.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.