Online Learning of Genetic Network Programming and its Application to Prisoner's Dilemma Game

Shingo Mabu, Jinglu Hu, Junichi Murata, Kotaro Hirasawa

Research output: Contribution to journal › Article

5 Citations (Scopus)

Abstract

A new evolutionary model with a network structure, named Genetic Network Programming (GNP), has recently been proposed. GNP, an extension of GA and GP, represents solutions as a network structure and evolves them by “offline learning” (selection, mutation, crossover). GNP can memorize past action sequences in its network flow, so it can deal well with Partially Observable Markov Decision Processes (POMDPs). In this paper, in order to improve the ability of GNP, Q-learning (an off-policy TD control algorithm), one of the well-known online methods, is introduced for online learning of GNP. Q-learning is suitable for GNP because (1) in reinforcement learning, the rewards an agent will receive in the future can be estimated, (2) TD control requires little memory and can learn quickly, and (3) an off-policy method can search for an optimal solution independently of the policy being followed. Finally, in simulations, online learning of GNP is applied to a player of the “Prisoner's Dilemma” game, and its ability for online adaptation is confirmed.
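The abstract refers to Q-learning as an off-policy TD control method. As a minimal illustration of that update rule only, here is a tabular Q-learning step in Python; the states, actions, and reward values are hypothetical and do not reproduce the paper's actual GNP node structure or experimental setup.

```python
# Minimal tabular Q-learning sketch of the off-policy TD control update
# mentioned in the abstract. The state/action names below are invented
# for illustration; they are not taken from the paper.

def q_update(q, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.9):
    """One Q-learning step:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
    Off-policy: the max over next actions is used regardless of which
    action the agent's behavior policy actually takes next."""
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]

# Hypothetical iterated prisoner's-dilemma-style step: defecting against a
# cooperator yields the temptation payoff (5 in the standard payoff matrix).
actions = ["cooperate", "defect"]
q = {}
q_update(q, "start", "defect", 5.0, "opponent_cooperated", actions)
```

Because only the running Q-table must be stored and each update touches a single entry, this kind of TD control needs little memory, which is one of the suitability arguments given in the abstract.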

Original language: English
Pages (from-to): 535-543
Number of pages: 9
Journal: IEEJ Transactions on Electronics, Information and Systems
Volume: 123
Issue number: 3
DOIs: 10.1541/ieejeiss.123.535
Publication status: Published - Jan 1 2003

Fingerprint

  • Reinforcement learning
  • Data storage equipment

All Science Journal Classification (ASJC) codes

  • Electrical and Electronic Engineering

Cite this

Online Learning of Genetic Network Programming and its Application to Prisoner's Dilemma Game. / Mabu, Shingo; Hu, Jinglu; Murata, Junichi; Hirasawa, Kotaro.

In: IEEJ Transactions on Electronics, Information and Systems, Vol. 123, No. 3, 01.01.2003, p. 535-543.

Research output: Contribution to journal › Article

@article{9591182dbb864079a96023ce7770a972,
title = "Online Learning of Genetic Network Programming and its Application to Prisoner's Dilemma Game",
abstract = "A new evolutionary model with a network structure, named Genetic Network Programming (GNP), has recently been proposed. GNP, an extension of GA and GP, represents solutions as a network structure and evolves them by “offline learning” (selection, mutation, crossover). GNP can memorize past action sequences in its network flow, so it can deal well with Partially Observable Markov Decision Processes (POMDPs). In this paper, in order to improve the ability of GNP, Q-learning (an off-policy TD control algorithm), one of the well-known online methods, is introduced for online learning of GNP. Q-learning is suitable for GNP because (1) in reinforcement learning, the rewards an agent will receive in the future can be estimated, (2) TD control requires little memory and can learn quickly, and (3) an off-policy method can search for an optimal solution independently of the policy being followed. Finally, in simulations, online learning of GNP is applied to a player of the “Prisoner's Dilemma” game, and its ability for online adaptation is confirmed.",
author = "Shingo Mabu and Jinglu Hu and Junichi Murata and Kotaro Hirasawa",
year = "2003",
month = "1",
day = "1",
doi = "10.1541/ieejeiss.123.535",
language = "English",
volume = "123",
pages = "535--543",
journal = "IEEJ Transactions on Electronics, Information and Systems",
issn = "0385-4221",
publisher = "The Institute of Electrical Engineers of Japan",
number = "3",

}

TY - JOUR

T1 - Online Learning of Genetic Network Programming and its Application to Prisoner's Dilemma Game

AU - Mabu, Shingo

AU - Hu, Jinglu

AU - Murata, Junichi

AU - Hirasawa, Kotaro

PY - 2003/1/1

Y1 - 2003/1/1

N2 - A new evolutionary model with a network structure, named Genetic Network Programming (GNP), has recently been proposed. GNP, an extension of GA and GP, represents solutions as a network structure and evolves them by “offline learning” (selection, mutation, crossover). GNP can memorize past action sequences in its network flow, so it can deal well with Partially Observable Markov Decision Processes (POMDPs). In this paper, in order to improve the ability of GNP, Q-learning (an off-policy TD control algorithm), one of the well-known online methods, is introduced for online learning of GNP. Q-learning is suitable for GNP because (1) in reinforcement learning, the rewards an agent will receive in the future can be estimated, (2) TD control requires little memory and can learn quickly, and (3) an off-policy method can search for an optimal solution independently of the policy being followed. Finally, in simulations, online learning of GNP is applied to a player of the “Prisoner's Dilemma” game, and its ability for online adaptation is confirmed.

AB - A new evolutionary model with a network structure, named Genetic Network Programming (GNP), has recently been proposed. GNP, an extension of GA and GP, represents solutions as a network structure and evolves them by “offline learning” (selection, mutation, crossover). GNP can memorize past action sequences in its network flow, so it can deal well with Partially Observable Markov Decision Processes (POMDPs). In this paper, in order to improve the ability of GNP, Q-learning (an off-policy TD control algorithm), one of the well-known online methods, is introduced for online learning of GNP. Q-learning is suitable for GNP because (1) in reinforcement learning, the rewards an agent will receive in the future can be estimated, (2) TD control requires little memory and can learn quickly, and (3) an off-policy method can search for an optimal solution independently of the policy being followed. Finally, in simulations, online learning of GNP is applied to a player of the “Prisoner's Dilemma” game, and its ability for online adaptation is confirmed.

UR - http://www.scopus.com/inward/record.url?scp=34547270982&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=34547270982&partnerID=8YFLogxK

U2 - 10.1541/ieejeiss.123.535

DO - 10.1541/ieejeiss.123.535

M3 - Article

AN - SCOPUS:34547270982

VL - 123

SP - 535

EP - 543

JO - IEEJ Transactions on Electronics, Information and Systems

JF - IEEJ Transactions on Electronics, Information and Systems

SN - 0385-4221

IS - 3

ER -