Workflow scheduling with fault tolerance

Laiping Zhao, Kouichi Sakurai

Research output: Chapter in Book/Report/Conference proceeding > Chapter

Abstract

This chapter presents a study of workflow scheduling with fault tolerance. It begins by reviewing workflow scheduling and fault tolerance technologies independently. Next, the chapter surveys related work that combines the two. These works are classified into six categories, corresponding to six fault tolerance technologies: workflow scheduling with primary/backup, primary/backup with multiple backups, checkpoint, rescheduling, active replication, and active replication with dynamic replicas. An in-depth study of these six topics illustrates the challenges explored so far, e.g., overloading conditions and tradeoffs among scheduling criteria, and identifies some future research directions. As applications grow increasingly complex and failures become a severe problem in large-scale systems, the authors aim to provide a comprehensive review of the problem of workflow scheduling with fault tolerance.

Original language: English
Title of host publication: Network and Traffic Engineering in Emerging Distributed Computing Applications
Publisher: IGI Global
Pages: 94-123
Number of pages: 30
ISBN (Print): 9781466618886
DOIs: https://doi.org/10.4018/978-1-4666-1888-6.ch005
Publication status: Published - Dec 1 2012

Fingerprint

Fault tolerance
Scheduling
Large scale systems

All Science Journal Classification (ASJC) codes

  • Computer Science (all)

Cite this

Zhao, L., & Sakurai, K. (2012). Workflow scheduling with fault tolerance. In Network and Traffic Engineering in Emerging Distributed Computing Applications (pp. 94-123). IGI Global. https://doi.org/10.4018/978-1-4666-1888-6.ch005

@inbook{fd7eda963dd0496da735a84f77310ae1,
title = "Workflow scheduling with fault tolerance",
author = "Laiping Zhao and Kouichi Sakurai",
year = "2012",
month = "12",
day = "1",
doi = "10.4018/978-1-4666-1888-6.ch005",
language = "English",
isbn = "9781466618886",
pages = "94--123",
booktitle = "Network and Traffic Engineering in Emerging Distributed Computing Applications",
publisher = "IGI Global",

}

TY - CHAP

T1 - Workflow scheduling with fault tolerance

AU - Zhao, Laiping

AU - Sakurai, Kouichi

PY - 2012/12/1

Y1 - 2012/12/1

UR - http://www.scopus.com/inward/record.url?scp=84898250258&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84898250258&partnerID=8YFLogxK

U2 - 10.4018/978-1-4666-1888-6.ch005

DO - 10.4018/978-1-4666-1888-6.ch005

M3 - Chapter

AN - SCOPUS:84898250258

SN - 9781466618886

SP - 94

EP - 123

BT - Network and Traffic Engineering in Emerging Distributed Computing Applications

PB - IGI Global

ER -