Bandit online optimization over the permutahedron

Research output: Contribution to journal › Article › peer-review

4 Citations (Scopus)

Abstract

The permutahedron is the convex polytope with vertex set consisting of the vectors $(\pi(1), \ldots, \pi(n))$ for all permutations (bijections) $\pi$ over $\{1, \ldots, n\}$. We study a bandit game in which, at each step $t$, an adversary chooses a hidden weight vector $s_t$, a player chooses a vertex $\pi_t$ of the permutahedron and suffers an observed instantaneous loss of $\sum_{i=1}^{n} \pi_t(i) s_t(i)$. We study the problem in two different approaches. In the two approaches, we assume that $s_t$ is a point in the polytope dual to the permutahedron. Algorithm CombBand of Cesa-Bianchi et al. (2012) guarantees a regret of $O(n\sqrt{T \log n})$ after $T$ steps. Unfortunately, CombBand requires at each step an $n$-by-$n$ matrix permanent computation, a #P-hard problem. Approximating the permanent is possible in the impractical running time of $O(n^{10})$, with an additional heavy inverse-polynomial dependence on the sought accuracy. In the first approach, we provide an algorithm of slightly worse regret $O(n^{3/2}\sqrt{T})$ but with more realistic time complexity $O(n^3)$ per step. The technical contribution is a bound on the variance of the Plackett–Luce noisy sorting process's ‘pseudo loss’, obtained by establishing positive semi-definiteness of a family of 3-by-3 matrices of rational functions in exponents of 3 parameters. In the second approach, we present and analyze an algorithm based on Bubeck et al.'s (2012) OSMD approach with a novel projection and decomposition technique for the permutahedron. The second algorithm's running time and regret guarantees are similar to our first algorithm, modulo a numerical line search procedure the running time of which we have not been able to analyze. It is interesting that the two approaches are totally different. The main open problem from this work is whether there exists a bandit algorithm for this problem with both optimal regret of $O(n\sqrt{T})$ and running time of $O(n^3)$ for either regime, or there is an inherent tradeoff between the two performance measures.
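To make the game in the abstract concrete, the following is a minimal Python sketch of its basic ingredients: the instantaneous loss of a permutation against a weight vector, the best fixed vertex in hindsight, and a generic Plackett–Luce draw. It is an illustration under assumptions, not the algorithm from the article. The function names (`loss`, `best_permutation`, `plackett_luce_sample`), the `scores` parameter and the toy weights are all hypothetical, and the sketch ignores the restriction of the weight vector to the dual polytope.

```python
import math
import random


def loss(perm, s):
    """Instantaneous loss sum_i perm[i] * s[i].

    perm[i] is the value pi(i) in 1..n assigned to coordinate i.
    """
    return sum(p * w for p, w in zip(perm, s))


def best_permutation(s):
    """Vertex of the permutahedron minimising the linear loss for weights s.

    By the rearrangement inequality, the largest weight is paired with 1,
    the second largest with 2, and so on.
    """
    order = sorted(range(len(s)), key=lambda i: -s[i])
    perm = [0] * len(s)
    for value, i in enumerate(order, start=1):
        perm[i] = value
    return perm


def plackett_luce_sample(scores, rng):
    """Generic Plackett-Luce draw (illustrative, not the article's procedure).

    Repeatedly picks one of the remaining coordinates with probability
    proportional to exp(score); the k-th coordinate picked receives value k.
    """
    remaining = list(range(len(scores)))
    perm = [0] * len(scores)
    for value in range(1, len(scores) + 1):
        weights = [math.exp(scores[i]) for i in remaining]
        chosen = rng.choices(remaining, weights=weights)[0]
        perm[chosen] = value
        remaining.remove(chosen)
    return perm


if __name__ == "__main__":
    rng = random.Random(0)
    s = [0.1, 0.5, 0.2]                        # toy weight vector (assumption)
    played = plackett_luce_sample([0.0, 1.0, 0.5], rng)
    print("played vertex:", played, "loss:", loss(played, s))
    print("best vertex:", best_permutation(s), "loss:", loss(best_permutation(s), s))
```

In the bandit setting the player would see only the scalar returned by `loss`, never the vector `s` itself. Regret compares the cumulative loss of the played vertices with that of the single best vertex for the summed weights.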

Original language: English
Pages (from-to): 92-108
Number of pages: 17
Journal: Theoretical Computer Science
Volume: 650
DOI
Publication status: Published - Oct 18, 2016

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • General Computer Science
