Bandit online optimization over the permutahedron

Research output: Contribution to journalArticlepeer-review

3 Citations (Scopus)

Abstract

The permutahedron is the convex polytope with vertex set consisting of the vectors (π(1),…,π(n)) for all permutations (bijections) π over {1,…,n}. We study a bandit game in which, at each step t, an adversary chooses a hidden weight vector st, a player chooses a vertex πt of the permutahedron and suffers an observed instantaneous loss of ∑i=1nπt(i)st(i). We study the problem in two different approaches. In the two approaches, we assume that st is a point in the polytope dual to the permutahedron. Algorithm CombBand of Cesa-Bianchi et al. (2012) guarantees a regret of O(nTlog⁡n) after T steps. Unfortunately, CombBand requires at each step an n-by-n matrix permanent computation, a #P-hard problem. Approximating the permanent is possible in the impractical running time of O(n10), with an additional heavy inverse-polynomial dependence on the sought accuracy. In the first approach, we provide an algorithm of slightly worse regret O(n3/2T) but with more realistic time complexity O(n3) per step. The technical contribution is a bound on the variance of the Plackett–Luce noisy sorting process's ‘pseudo loss’, obtained by establishing positive semi-definiteness of a family of 3-by-3 matrices of rational functions in exponents of 3 parameters. In the second approach, we present and analyze an algorithm based on Bubeck et al.'s (2012) OSMD approach with a novel projection and decomposition technique for the permutahedron. The second algorithm's running time and regret guarantees are similar to our first algorithm, modulo a numerical line search procedure the running time of which we have not been able to analyze. It is interesting that the two approaches are totally different. The main open problem from this work is whether there exists a bandit algorithm for this problem with both optimal regret of O(nT) and running time of O(n3) for either regime, or there is an inherent tradeoff between the two performance measures.

Original languageEnglish
Pages (from-to)92-108
Number of pages17
JournalTheoretical Computer Science
Volume650
DOIs
Publication statusPublished - Oct 18 2016

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science(all)

Fingerprint Dive into the research topics of 'Bandit online optimization over the permutahedron'. Together they form a unique fingerprint.

Cite this