Fault-Tolerant Scheduling with Dynamic Number of Replicas in Heterogeneous Systems

Laiping Zhao, Yizhi Ren, Yang Xiang, Kouichi Sakurai

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

38 Citations (Scopus)

Abstract

Existing studies on fault-tolerant scheduling use an active replication schema with ε + 1 replicas for each task to tolerate ε failures. In this paper, we show that more replicas do not always lead to higher reliability. Moreover, more replicas imply more resource consumption and higher economic cost. To address this problem, and with the goal of satisfying the user's reliability requirement with minimum resources, this paper proposes a new fault-tolerant scheduling algorithm, MaxRe. The algorithm incorporates reliability analysis into the active replication schema and uses a dynamic number of replicas for different tasks. Both theoretical analysis and experiments show that the schedule produced by MaxRe satisfies the user's reliability requirements, and that MaxRe achieves the corresponding reliability with up to 70% fewer resources than the FTSA algorithm.
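
The idea of choosing a per-task replica count from a reliability target can be illustrated with a minimal sketch. This is not the MaxRe algorithm: it assumes replicas fail independently, ignores communication and heterogeneous execution times, and does not reproduce the paper's reliability analysis. The function name `min_replicas` and its inputs are hypothetical.

```python
def min_replicas(processor_reliabilities, target):
    """Pick the fewest replicas whose combined reliability meets `target`.

    Assumption (not from the paper): a task succeeds if at least one
    replica succeeds, and replicas fail independently, so the combined
    reliability is 1 - prod(1 - r_i).
    Returns the chosen per-replica reliabilities, or None if the target
    cannot be met with the available processors.
    """
    chosen = []
    all_fail = 1.0  # probability that every chosen replica fails
    # Greedy choice: try the most reliable processors first.
    for r in sorted(processor_reliabilities, reverse=True):
        chosen.append(r)
        all_fail *= 1.0 - r
        if 1.0 - all_fail >= target:
            return chosen
    return None


if __name__ == "__main__":
    # Hypothetical processor reliabilities for a single task.
    print(min_replicas([0.9, 0.8, 0.95], 0.99))
```

Under these assumptions, a task placed on highly reliable processors needs fewer replicas than a fixed ε + 1 scheme would assign, which is the resource saving the abstract points to.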

Original language: English
Title of host publication: Proceedings - 2010 12th IEEE International Conference on High Performance Computing and Communications, HPCC 2010
Pages: 434-441
Number of pages: 8
DOIs: https://doi.org/10.1109/HPCC.2010.72
Publication status: Published - 2010
Event: 2010 12th IEEE International Conference on High Performance Computing and Communications, HPCC 2010 - Melbourne, VIC, Australia
Duration: Sep 1 2010 → Sep 3 2010

Other

Other: 2010 12th IEEE International Conference on High Performance Computing and Communications, HPCC 2010
Country: Australia
City: Melbourne, VIC
Period: 9/1/10 → 9/3/10

Fingerprint

Heterogeneous Systems
Replica
Fault-tolerant
Scheduling
Scheduling algorithms
Schema
Replication
Resources
Requirements
Reliability Analysis
Theoretical Analysis
Schedule
Economics
Imply
Target
Costs
Experiment

All Science Journal Classification (ASJC) codes

  • Computational Theory and Mathematics
  • Theoretical Computer Science

Cite this

Zhao, L., Ren, Y., Xiang, Y., & Sakurai, K. (2010). Fault-Tolerant Scheduling with Dynamic Number of Replicas in Heterogeneous Systems. In Proceedings - 2010 12th IEEE International Conference on High Performance Computing and Communications, HPCC 2010 (pp. 434-441). https://doi.org/10.1109/HPCC.2010.72

Fault-Tolerant Scheduling with Dynamic Number of Replicas in Heterogeneous Systems. / Zhao, Laiping; Ren, Yizhi; Xiang, Yang; Sakurai, Kouichi.

Proceedings - 2010 12th IEEE International Conference on High Performance Computing and Communications, HPCC 2010. 2010. p. 434-441.

Zhao, L, Ren, Y, Xiang, Y & Sakurai, K 2010, Fault-Tolerant Scheduling with Dynamic Number of Replicas in Heterogeneous Systems. in Proceedings - 2010 12th IEEE International Conference on High Performance Computing and Communications, HPCC 2010. pp. 434-441, 2010 12th IEEE International Conference on High Performance Computing and Communications, HPCC 2010, Melbourne, VIC, Australia, 9/1/10. https://doi.org/10.1109/HPCC.2010.72
Zhao L, Ren Y, Xiang Y, Sakurai K. Fault-Tolerant Scheduling with Dynamic Number of Replicas in Heterogeneous Systems. In Proceedings - 2010 12th IEEE International Conference on High Performance Computing and Communications, HPCC 2010. 2010. p. 434-441. https://doi.org/10.1109/HPCC.2010.72
Zhao, Laiping ; Ren, Yizhi ; Xiang, Yang ; Sakurai, Kouichi. / Fault-Tolerant Scheduling with Dynamic Number of Replicas in Heterogeneous Systems. Proceedings - 2010 12th IEEE International Conference on High Performance Computing and Communications, HPCC 2010. 2010. pp. 434-441
@inproceedings{c9fa18892490442d925f5abaaccf2939,
title = "Fault-Tolerant Scheduling with Dynamic Number of Replicas in Heterogeneous Systems",
abstract = "In the existing studies on fault-tolerant scheduling, the active replication schema makes use of ε + 1 replicas for each task to tolerate ε failures. However, in this paper, we show that it does not always lead to a higher reliability with more replicas. Besides, the more replicas implies more resource consumption and higher economic cost. To address this problem, with the target to satisfy the user's reliability requirement with minimum resources, this paper proposes a new fault tolerant scheduling algorithm: MaxRe. In the algorithm, we incorporate the reliability analysis into the active replication schema, and exploit a dynamic number of replicas for different tasks. Both the theoretical analysis and experiments prove that the MaxRe algorithm's schedule can certainly satisfy user's reliability requirements. And the MaxRe scheduling algorithm can achieve the corresponding reliability with at most 70{\%} fewer resources than the FTSA algorithm.",
author = "Laiping Zhao and Yizhi Ren and Yang Xiang and Kouichi Sakurai",
year = "2010",
doi = "10.1109/HPCC.2010.72",
language = "English",
isbn = "9780769542140",
pages = "434--441",
booktitle = "Proceedings - 2010 12th IEEE International Conference on High Performance Computing and Communications, HPCC 2010",

}

TY - GEN

T1 - Fault-Tolerant Scheduling with Dynamic Number of Replicas in Heterogeneous Systems

AU - Zhao, Laiping

AU - Ren, Yizhi

AU - Xiang, Yang

AU - Sakurai, Kouichi

PY - 2010

Y1 - 2010

AB - In the existing studies on fault-tolerant scheduling, the active replication schema makes use of ε + 1 replicas for each task to tolerate ε failures. However, in this paper, we show that it does not always lead to a higher reliability with more replicas. Besides, the more replicas implies more resource consumption and higher economic cost. To address this problem, with the target to satisfy the user's reliability requirement with minimum resources, this paper proposes a new fault tolerant scheduling algorithm: MaxRe. In the algorithm, we incorporate the reliability analysis into the active replication schema, and exploit a dynamic number of replicas for different tasks. Both the theoretical analysis and experiments prove that the MaxRe algorithm's schedule can certainly satisfy user's reliability requirements. And the MaxRe scheduling algorithm can achieve the corresponding reliability with at most 70% fewer resources than the FTSA algorithm.

UR - http://www.scopus.com/inward/record.url?scp=84863393973&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84863393973&partnerID=8YFLogxK

U2 - 10.1109/HPCC.2010.72

DO - 10.1109/HPCC.2010.72

M3 - Conference contribution

AN - SCOPUS:84863393973

SN - 9780769542140

SP - 434

EP - 441

BT - Proceedings - 2010 12th IEEE International Conference on High Performance Computing and Communications, HPCC 2010

ER -