Online density estimation of Bradley-Terry models

Issei Matsumoto, Kohei Hatano, Eiji Takimoto

Research output: Contribution to journal › Conference article

Abstract

We consider an online density estimation problem for the Bradley-Terry model, where each model parameter defines the probability of a match result between any pair in a set of n teams. The problem is hard because the loss function (i.e., the negative log-likelihood function in our problem setting) is not convex. To avoid the non-convexity, we can change parameters so that the loss function becomes convex with respect to the new parameters. But then the radius K of the reparameterized domain may be infinite, where K depends on the outcome sequence. So we put a mild assumption that guarantees that K is finite. We can thus employ standard online convex optimization algorithms, namely OGD and ONS, over the reparameterized domain, and get regret bounds O(n^{1/2}(ln K)√T) and O(n^{3/2}K ln T), respectively, where T is the horizon of the game. The bounds roughly mean that OGD is better when K is large, while ONS is better when K is small. But how large can K be? We show that K can be as large as Θ(T^{n-1}), which implies that the worst-case regret bounds of OGD and ONS are O(n^{3/2}√T ln T) and Õ(n^{3/2}T^{n-1}), respectively. We then propose a version of Follow the Regularized Leader whose regret bound is close to the minimum of those of OGD and ONS. In other words, our algorithm is competitive with both for a wide range of values of K. In particular, our algorithm achieves the worst-case regret bound O(n^{5/2}T^{1/3} ln T), which is slightly better than that of OGD with respect to T. In addition, our algorithm works without knowledge of K, which is a practical advantage.
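
To make the convexification step concrete: in the Bradley-Terry model each team i has a strength v_i > 0 and P(i beats j) = v_i/(v_i + v_j). The negative log-likelihood is not convex in v, but under the standard log-reparameterization θ_i = ln v_i it becomes ln(1 + e^{-(θ_i - θ_j)}), the logistic loss of the difference θ_i - θ_j, which is convex in θ. The Python sketch below illustrates this loss and a projected OGD update over a ball of radius K; it is a minimal illustration only, not the paper's implementation, and the step size eta = K/sqrt(t) and the Euclidean projection are assumptions made for the example.

import numpy as np

def bt_loss(theta, i, j):
    # Negative log-likelihood of "team i beats team j" under the
    # Bradley-Terry model with log-strengths theta; convex in theta.
    z = theta[i] - theta[j]            # log-odds that i beats j
    return np.logaddexp(0.0, -z)       # -ln sigmoid(z) = ln(1 + e^{-z})

def bt_grad(theta, i, j):
    # Gradient of bt_loss: only entries i and j are nonzero, with
    # magnitude equal to the predicted probability of the upset.
    z = theta[i] - theta[j]
    p_upset = 1.0 / (1.0 + np.exp(z))  # 1 - sigmoid(z) = P(j beats i)
    g = np.zeros_like(theta)
    g[i] -= p_upset
    g[j] += p_upset
    return g

def ogd_step(theta, grad, eta, K):
    # One online gradient descent step, projected back onto the
    # Euclidean ball of radius K (the reparameterized domain).
    theta = theta - eta * grad
    norm = np.linalg.norm(theta)
    return theta if norm <= K else theta * (K / norm)

# Toy run: n = 4 teams, horizon T = 100, radius K assumed known here
# (the paper's FTRL variant notably does not need K).
n, K, T = 4, 5.0, 100
rng = np.random.default_rng(0)
theta = np.zeros(n)
for t in range(1, T + 1):
    i, j = rng.choice(n, size=2, replace=False)  # round t: i beat j
    theta = ogd_step(theta, bt_grad(theta, i, j), eta=K / np.sqrt(t), K=K)

Once the outcome sequence keeps the optimal θ inside a ball of radius K (the paper's mild assumption), this is a standard online convex optimization problem, which is why the OGD and ONS bounds quoted above apply.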

Original language: English
Journal: Journal of Machine Learning Research
Volume: 40
Issue number: 2015
Publication status: Published - 1 Jan 2015
Event: 28th Conference on Learning Theory, COLT 2015 - Paris, France
Duration: 2 Jul 2015 → 6 Jul 2015

Fingerprint

Bradley-Terry Model
On-line Estimation
Density Estimation
Regret
Loss Function
Convex optimization
Online Optimization
Non-convexity
Likelihood Function
Horizon
Optimization Algorithm
Radius
Game
Imply
Range of data

All Science Journal Classification (ASJC) codes

  • Software
  • Control and Systems Engineering
  • Statistics and Probability
  • Artificial Intelligence

Cite this

Online density estimation of Bradley-Terry models. / Matsumoto, Issei; Hatano, Kohei; Takimoto, Eiji.

In: Journal of Machine Learning Research, Vol. 40, No. 2015, 01.01.2015.

Research output: Contribution to journal › Conference article

@article{6d49c160d7d2424cb6968cbd60f94c0d,
title = "Online density estimation of bradley-terry models",
abstract = "We consider an online density estimation problem for the Bradley-Terry model, where each model parameter defines the probability of a match result between any pair in a set of n teams. The problem is hard because the loss function (i.e., the negative log-likelihood function in our problem setting) is not convex. To avoid the non-convexity, we can change parameters so that the loss function becomes convex with respect to the new parameter. But then the radius K of the reparameterized domain may be infinite, where K depends on the outcome sequence. So we put a mild assumption that guarantees that K is finite. We can thus employ standard online convex optimization algorithms, namely OGD and ONS, over the reparameterized domain, and get regret bounds O(n 1/2 (lnK) √ T) and O(n 3/2K ln T), respectively, where T is the horizon of the game. The bounds roughly means that OGD is better when K is large while ONS is better when K is small. But how large can K be? We show that K can be as large as θ(Tn-1), which implies that the worst case regret bounds of OGD and ONS are O(n 3/2 √ T ln T) and {\~O}(n 3/2 (T)n-1), respectively. We then propose a version of Follow the Regularized Leader, whose regret bound is close to the minimum of those of OGD and ONS. In other words, our algorithm is competitive with both for a wide range of values of K. In particular, our algorithm achieves the worst case regret bound O(n 5/2 T 1/3 ln T), which is slightly better than OGD with respect to T. In addition, our algorithm works without the knowledge K, which is a practical advantage.",
author = "Issei Matsumoto and kohei hatano and Eiji Takimoto",
year = "2015",
month = "1",
day = "1",
language = "English",
volume = "40",
journal = "Journal of Machine Learning Research",
issn = "1532-4435",
publisher = "Microtome Publishing",
number = "2015",

}

TY - JOUR

T1 - Online density estimation of Bradley-Terry models

AU - Matsumoto, Issei

AU - Hatano, Kohei

AU - Takimoto, Eiji

PY - 2015/1/1

Y1 - 2015/1/1

UR - http://www.scopus.com/inward/record.url?scp=84984691651&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84984691651&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:84984691651

VL - 40

JO - Journal of Machine Learning Research

JF - Journal of Machine Learning Research

SN - 1532-4435

IS - 2015

ER -