`Lob-pass' problem and an on-line learning model of rational choice

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

7 Citations (Scopus)

Abstract

We consider an on-line learning model of rational choice, in which the goal of an agent is to choose its actions so as to maximize the number of successes, while learning about its reacting environment through those very actions. In particular, we consider a model of tennis play, in which the only actions the player can take are a `pass' and a `lob,' and the opponent is modeled by two linear (probabilistic) functions f_L(r) = a_1 r + b_1 and f_P(r) = a_2 r + b_2, specifying the probability that a lob (respectively, a pass) will win a point when the proportion of lobs in the past trials is r. We measure the performance of a player in this model by its expected regret, namely how many fewer points it expects to win as compared to the ideal player (one that knows the two probabilistic functions), as a function of t, the total number of trials, which is unknown to the player a priori. Assuming that the probabilistic functions satisfy the matching shoulder condition, i.e. f_L(0) = f_P(1), we obtain a variety of upper bounds under assumptions and restrictions of varying degrees, ranging from O(log t), O(t^{1/3}), O(t^{1/2}), O(t^{3/5}), and O(t^{2/3}) to O(t^{5/7}), as well as a matching lower bound of order Ω(log t) for the most restrictive case. When the total number of trials t is given to the player in advance, the upper bounds can be improved significantly.
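The model in the abstract can be simulated directly. The sketch below is a minimal illustration, not the algorithm analyzed in the paper: it builds the lob-pass opponent from two assumed linear functions satisfying the matching shoulder condition f_L(0) = f_P(1), computes the long-run optimal lob proportion available to the ideal player (the maximizer of g(r) = r·f_L(r) + (1-r)·f_P(r)), and compares it empirically against a poorly chosen fixed rate. All coefficient values are hypothetical.

```python
import random

# Illustrative coefficients (assumed, not from the paper) satisfying the
# matching shoulder condition f_L(0) = f_P(1): 0.8 = 0.5 + 0.3.
A1, B1, A2, B2 = -0.5, 0.8, 0.5, 0.3

def make_env(a1, b1, a2, b2, seed=0):
    """Lob-pass opponent: a lob wins with probability f_L(r) = a1*r + b1,
    a pass with f_P(r) = a2*r + b2, where r is the proportion of lobs
    among the past trials."""
    rng = random.Random(seed)
    lobs = trials = 0
    def play(action):                      # action: "lob" or "pass"
        nonlocal lobs, trials
        r = lobs / trials if trials else 0.0
        p = a1 * r + b1 if action == "lob" else a2 * r + b2
        trials += 1
        if action == "lob":
            lobs += 1
        return rng.random() < p            # True iff the point is won
    return play

def best_rate(a1, b1, a2, b2):
    """Long-run optimal lob proportion: argmax over r in [0, 1] of
    g(r) = r*f_L(r) + (1-r)*f_P(r), a quadratic in r."""
    A, B = a1 - a2, b1 + a2 - b2           # g(r) = A*r^2 + B*r + b2
    cands = [0.0, 1.0]
    if A < 0:                              # interior vertex is a maximum
        cands.append(min(1.0, max(0.0, -B / (2 * A))))
    return max(cands, key=lambda r: A * r * r + B * r + b2)

def run(lob_prob, t, seed=1):
    """Play t trials, lobbing with fixed probability lob_prob; return wins."""
    play = make_env(A1, B1, A2, B2, seed)
    coin = random.Random(seed + 7)
    return sum(play("lob" if coin.random() < lob_prob else "pass")
               for _ in range(t))

r_star = best_rate(A1, B1, A2, B2)         # 0.5 for these coefficients
ideal_wins = run(r_star, 2000)
naive_wins = run(0.9, 2000)                # a poorly chosen fixed rate
print(r_star, ideal_wins - naive_wins)     # shortfall of the naive player
```

A learner in the paper's setting must estimate the two linear functions from its own wins and losses before it can aim for r_star; the gap `ideal_wins - naive_wins` is the kind of quantity the regret bounds above control as t grows.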

Original language: English
Title of host publication: Proc 6 Annu ACM Conf Comput Learn Theory
Editors: Anon
Publisher: Publ by ACM
Pages: 422-428
Number of pages: 7
ISBN (Print): 0897916115
Publication status: Published - 1993
Externally published: Yes
Event: Proceedings of the 6th Annual ACM Conference on Computational Learning Theory - Santa Cruz, CA, USA
Duration: Jul 26, 1993 – Jul 28, 1993


All Science Journal Classification (ASJC) codes

  • Engineering (all)

Cite this

Abe, N., & Takeuchi, J. (1993). `Lob-pass' problem and an on-line learning model of rational choice. In Anon (Ed.), Proc 6 Annu ACM Conf Comput Learn Theory (pp. 422-428). Publ by ACM.
