Polynomial learnability of stochastic rules with respect to the KL-divergence and quadratic distance

Naoki Abe, Jun-ichi Takeuchi, Manfred K. Warmuth

Research output: Contribution to journal › Article

7 Citations (Scopus)

Abstract

We consider the problem of efficient learning of probabilistic concepts (p-concepts) and more generally stochastic rules in the sense defined by Kearns and Schapire [6] and by Yamanishi [18]. Their models extend the PAC-learning model of Valiant [16] to the learning scenario in which the target concept or function is stochastic rather than deterministic as in Valiant's original model. In this paper, we consider the learnability of stochastic rules with respect to the classic 'Kullback-Leibler divergence' (KL divergence) as well as the quadratic distance as the distance measure between the rules. First, we show that the notion of polynomial time learnability of p-concepts and stochastic rules with fixed range size using the KL divergence is in fact equivalent to the same notion using the quadratic distance, and hence to that using any of the distances considered in [6] and [18]: the quadratic, variation, and Hellinger distances. As a corollary, it follows that a wide range of classes of p-concepts which were shown to be polynomially learnable with respect to the quadratic distance in [6] are also learnable with respect to the KL divergence. The sample and time complexity of algorithms that would be obtained by the above general equivalence, however, are far from optimal. We present a polynomial learning algorithm with reasonable sample and time complexity for the important class of convex linear combinations of stochastic rules. We also develop a simple and versatile technique for obtaining sample complexity bounds for learning classes of stochastic rules with respect to the KL-divergence and quadratic distance, and apply it to produce bounds for the classes of probabilistic finite state acceptors (automata), probabilistic decision lists, and convex linear combinations.
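For reference (notation ours, following the p-concept setting of Kearns and Schapire [6]; not taken verbatim from the paper): a target rule p and a hypothesis q map each instance x, drawn from an unknown distribution D, to a probability in [0, 1], and the two distance measures discussed above are standardly defined as

\[
d_{\mathrm{KL}}(p,q) = \mathbf{E}_{x \sim D}\!\left[\, p(x)\ln\frac{p(x)}{q(x)} + (1-p(x))\ln\frac{1-p(x)}{1-q(x)} \,\right],
\qquad
d_{2}(p,q) = \mathbf{E}_{x \sim D}\!\left[\, \bigl(p(x)-q(x)\bigr)^{2} \,\right].
\]

A convex linear combination of stochastic rules \(r_1, \ldots, r_k\) is the rule \(x \mapsto \sum_{i} w_i\, r_i(\cdot \mid x)\) with weights \(w_i \ge 0\) and \(\sum_i w_i = 1\). By a Pinsker-type inequality, \(d_2\) is bounded above by a constant multiple of \(d_{\mathrm{KL}}\), but not conversely: \(d_{\mathrm{KL}}\) diverges when q(x) approaches 0 or 1 while p(x) does not. The nontrivial direction of the equivalence established in the paper is therefore passing from quadratic-distance learnability to KL learnability.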

Original language: English
Pages (from-to): 290-316
Number of pages: 27
Journal: IEICE Transactions on Information and Systems
Volume: E84-D
Issue number: 3
Publication status: Published - 2001
Externally published: Yes

Fingerprint

Polynomials
Finite automata
Learning algorithms

All Science Journal Classification (ASJC) codes

  • Software
  • Hardware and Architecture
  • Computer Vision and Pattern Recognition
  • Electrical and Electronic Engineering
  • Artificial Intelligence

Cite this

Polynomial learnability of stochastic rules with respect to the KL-divergence and quadratic distance. / Abe, Naoki; Takeuchi, Jun-ichi; Warmuth, Manfred K.

In: IEICE Transactions on Information and Systems, Vol. E84-D, No. 3, 2001, pp. 290-316.

Research output: Contribution to journal › Article

@article{42fa97cdd40b4fc0b67c7a6b38e6bf10,
title = "Polynomial learnability of stochastic rules with respect to the KL-divergence and quadratic distance",
author = "Naoki Abe and Jun-ichi Takeuchi and Warmuth, {Manfred K.}",
year = "2001",
language = "English",
volume = "E84-D",
pages = "290--316",
journal = "IEICE Transactions on Information and Systems",
issn = "0916-8532",
publisher = "Institute of Electronics, Information and Communication Engineers (IEICE)",
number = "3",

}
