Bandit online optimization over the permutahedron

Research output: Chapter in Book/Report/Conference proceedingConference contribution

4 Citations (Scopus)

Abstract

The permutahedron is the convex polytope with vertex set consisting of the vectors (π(1),…, π(n)) for all permutations (bijections) π over {1,…, n}. We study a bandit game in which, at each step t, an adversary chooses a hidden weight weight vector st, a player chooses a vertex πt of the permutahedron and suffers an observed instantaneous loss of ∑i n=1 πt(i)st(i).We study the problem in two regimes. In the first regime, st is a point in the polytope dual to the permutahedron. Algorithm CombBand of Cesa-Bianchi et al (2009) guarantees a regret of O(n√T log n) after T steps. Unfortunately, CombBand requires at each step an n-by-n matrix permanent computation, a #P-hard problem. Approximating the permanent is possible in the impractical running time of O(n10), with an additional heavy inverse-polynomial dependence on the sought accuracy. We provide an algorithm of slightly worse regret O(n3/2√T) but with more realistic time complexity O(n3) per step. The technical contribution is a bound on the variance of the Plackett-Luce noisy sorting process’s ‘pseudo loss’, obtained by establishing positive semi-definiteness of a family of 3-by-3 matrices of rational functions in exponents of 3 parameters.In the second regime, st is in the hypercube. For this case we present and analyze an algorithm based on Bubeck et al.’s (2012) OSMD approach with a novel projection and decomposition technique for the permutahedron. The algorithm is efficient and achieves a regret of O(n√T), but for a more restricted space of possible loss vectors.

Original languageEnglish
Title of host publicationAlgorithmic Learning Theory - 25th International Conference, ALT 2014, Proceedings
EditorsPeter Auer, Alexander Clark, Thomas Zeugmann, Sandra Zilles
PublisherSpringer Verlag
Pages215-229
Number of pages15
ISBN (Electronic)9783319116617
DOIs
Publication statusPublished - 2014
Event25th International Conference on Algorithmic Learning Theory, ALT 2014 - Bled, Slovenia
Duration: Oct 8 2014Oct 10 2014

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume8776
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349

Other

Other25th International Conference on Algorithmic Learning Theory, ALT 2014
CountrySlovenia
CityBled
Period10/8/1410/10/14

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science(all)

Fingerprint Dive into the research topics of 'Bandit online optimization over the permutahedron'. Together they form a unique fingerprint.

Cite this