Scalability-based manycore partitioning

Hiroshi Sasaki, Inoue Koji, Teruo Tanimoto, Hiroshi Nakamura

Research output: Contribution to journal › Conference article

25 Citations (Scopus)

Abstract

Multicore processors have been popular for years, and the industry is gradually shifting towards the era of manycore processors. Single-thread performance of microprocessors is no longer growing at its historical rate, but the existence of a number of active processes in a computer system and the continuing development of multi-threaded applications benefit from growing core counts to sustain system throughput. This trend leads to a situation where a number of parallel applications are executed simultaneously on a single system. Since each multi-threaded application tries to maximize its throughput by utilizing the whole system, it usually creates a number of threads equal to or larger than the underlying logical core count. This introduces a much greater number of threads to be co-scheduled in the entire system. However, programs differ in their characteristics (or scalability) and contend with each other for shared resources, namely the CPU cores and the memory hierarchy. Therefore, OS thread scheduling will play a major role in achieving high system performance under such conditions. We develop a sophisticated scheduler that (1) dynamically predicts the scalability of programs using hardware performance monitoring units, (2) decides the optimal number of cores to allocate to each program, and (3) allocates the cores to programs while maximizing system utilization to achieve fair and maximum performance. Evaluation results on a 48-core AMD Opteron system show improvements over the Linux scheduler for a variety of multiprogramming workloads.
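The abstract does not say how scalability is predicted or how core counts are chosen, so the following is only an illustrative sketch of steps (2) and (3), not the authors' scheduler. It assumes each program's scalability is summarized by a hypothetical parallel fraction plugged into Amdahl's law (a stand-in for the hardware-counter-based prediction the paper describes). Every program first gets one core, and each remaining core then goes to the program whose predicted speedup would increase the most. The names `amdahl_speedup` and `partition_cores` are invented for this example.

```python
import heapq


def amdahl_speedup(parallel_fraction, cores):
    """Speedup predicted by Amdahl's law for a given core count (assumed model)."""
    if cores <= 0:
        return 0.0
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)


def partition_cores(parallel_fractions, total_cores):
    """Greedily split total_cores among programs.

    parallel_fractions maps a program name to its assumed parallel fraction
    in [0, 1]. Each program receives one core; every remaining core goes to
    the program with the largest marginal gain in predicted speedup.
    """
    if total_cores < len(parallel_fractions):
        raise ValueError("need at least one core per program")

    allocation = {name: 1 for name in parallel_fractions}

    # Max-heap (via negated gains) of the benefit of one more core per program.
    heap = []
    for name, p in parallel_fractions.items():
        gain = amdahl_speedup(p, 2) - amdahl_speedup(p, 1)
        heapq.heappush(heap, (-gain, name))

    for _ in range(total_cores - len(parallel_fractions)):
        _, name = heapq.heappop(heap)
        allocation[name] += 1
        cores = allocation[name]
        p = parallel_fractions[name]
        gain = amdahl_speedup(p, cores + 1) - amdahl_speedup(p, cores)
        heapq.heappush(heap, (-gain, name))

    return allocation


if __name__ == "__main__":
    # Hypothetical workload: one highly scalable, one moderate, one serial program.
    print(partition_cores({"a": 0.99, "b": 0.5, "c": 0.0}, 48))
```

All cores are always handed out, which mirrors the abstract's goal of maximizing system utilization. This greedy rule optimizes the sum of predicted speedups and says nothing about fairness, which a real scheduler would have to address separately.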

Original language: English
Pages (from-to): 107-116
Number of pages: 10
Journal: Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT
DOI: 10.1145/2370816.2370833
Publication status: Published - Oct 22, 2012
Event: 21st International Conference on Parallel Architectures and Compilation Techniques, PACT 2012 - Minneapolis, MN, United States
Duration: Sep 19, 2012 to Sep 23, 2012

Fingerprint

Many-core
Scalability
Partitioning
Thread
Multiprogramming
Throughput
Program processors
Microprocessor chips
Scheduler
Computer systems
Scheduling
Count
Hardware
Data storage equipment
Monitoring
Performance Monitoring
Memory Hierarchy
Multi-core Processor
Parallel Applications
Microprocessor

All Science Journal Classification (ASJC) codes

  • Software
  • Theoretical Computer Science
  • Hardware and Architecture

Cite this

Scalability-based manycore partitioning. / Sasaki, Hiroshi; Koji, Inoue; Tanimoto, Teruo; Nakamura, Hiroshi.

In: Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT, 22.10.2012, p. 107-116.

@article{fd86bb512ce64fe499299e67c5edfc45,
title = "Scalability-based manycore partitioning",
abstract = "Multicore processors have been popular for years, and the industry is gradually shifting towards the era of manycore processors. Single-thread performance of microprocessors is no longer growing at its historical rate, but the existence of a number of active processes in a computer system and the continuing development of multi-threaded applications benefit from growing core counts to sustain system throughput. This trend leads to a situation where a number of parallel applications are executed simultaneously on a single system. Since each multi-threaded application tries to maximize its throughput by utilizing the whole system, it usually creates a number of threads equal to or larger than the underlying logical core count. This introduces a much greater number of threads to be co-scheduled in the entire system. However, programs differ in their characteristics (or scalability) and contend with each other for shared resources, namely the CPU cores and the memory hierarchy. Therefore, OS thread scheduling will play a major role in achieving high system performance under such conditions. We develop a sophisticated scheduler that (1) dynamically predicts the scalability of programs using hardware performance monitoring units, (2) decides the optimal number of cores to allocate to each program, and (3) allocates the cores to programs while maximizing system utilization to achieve fair and maximum performance. Evaluation results on a 48-core AMD Opteron system show improvements over the Linux scheduler for a variety of multiprogramming workloads.",
author = "Hiroshi Sasaki and Inoue Koji and Teruo Tanimoto and Hiroshi Nakamura",
year = "2012",
month = "10",
day = "22",
doi = "10.1145/2370816.2370833",
language = "English",
pages = "107--116",
journal = "Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT",
issn = "1089-795X",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - JOUR

T1 - Scalability-based manycore partitioning

AU - Sasaki, Hiroshi

AU - Koji, Inoue

AU - Tanimoto, Teruo

AU - Nakamura, Hiroshi

PY - 2012/10/22

Y1 - 2012/10/22

N2 - Multicore processors have been popular for years, and the industry is gradually shifting towards the era of manycore processors. Single-thread performance of microprocessors is no longer growing at its historical rate, but the existence of a number of active processes in a computer system and the continuing development of multi-threaded applications benefit from growing core counts to sustain system throughput. This trend leads to a situation where a number of parallel applications are executed simultaneously on a single system. Since each multi-threaded application tries to maximize its throughput by utilizing the whole system, it usually creates a number of threads equal to or larger than the underlying logical core count. This introduces a much greater number of threads to be co-scheduled in the entire system. However, programs differ in their characteristics (or scalability) and contend with each other for shared resources, namely the CPU cores and the memory hierarchy. Therefore, OS thread scheduling will play a major role in achieving high system performance under such conditions. We develop a sophisticated scheduler that (1) dynamically predicts the scalability of programs using hardware performance monitoring units, (2) decides the optimal number of cores to allocate to each program, and (3) allocates the cores to programs while maximizing system utilization to achieve fair and maximum performance. Evaluation results on a 48-core AMD Opteron system show improvements over the Linux scheduler for a variety of multiprogramming workloads.

UR - http://www.scopus.com/inward/record.url?scp=84867557523&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84867557523&partnerID=8YFLogxK

U2 - 10.1145/2370816.2370833

DO - 10.1145/2370816.2370833

M3 - Conference article

AN - SCOPUS:84867557523

SP - 107

EP - 116

JO - Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT

JF - Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT

SN - 1089-795X

ER -