Improving performance and energy efficiency of embedded processors via post-fabrication instruction set customization

Hamid Noori, Farhad Mehdipour, Koji Inoue, Kazuaki Murakami

Research output: Contribution to journalArticle

5 Citations (Scopus)

Abstract

Encapsulating critical computation subgraphs as application-specific instruction set extensions is an effective technique to enhance the performance and energy efficiency of embedded processors. However, the addition of custom functional units to the base processor is required to support the execution of custom instructions. Although automated tools have been developed to reduce the long design time needed to produce a new extensible processor for each application, short time-to-market, significant non-recurring engineering and design costs are issues. To address these concerns, we introduce an adaptive extensible processor in which custom instructions are generated and added after chip-fabrication. To support this feature, custom functional units (CFUs) are replaced by a reconfigurable functional unit (RFU). The proposed RFU is based on a matrix of functional units which is multi-cycle with the capability of conditional execution. To generate more effective custom instructions, they are extended over basic blocks and hence, multiple-exits custom instruction and intuition behind it are introduced. Conditional execution capability has been added to the RFU to support the multi-exit feature of custom instructions. Because the proposed RFU has limitations on hardware resources (i.e., connections and processing elements), an integrated mapping-temporal partitioning framework is proposed to guarantee that the generated custom instructions can be mapped on the RFU (mappable custom instructions). Experimental results show that multi-exit custom instructions enhance the performance and energy efficiency by an average of 32% and 3% compared to custom instructions limited to one basic block, respectively. A maximum speedup of 4.9, compared to a single-issue embedded processor, and an average speedup of 1.9 was achieved on MiBench benchmark suite. The maximum and average energy saving are 56% and 22%, respectively. These performance and energy efficiency are obtained at the cost of 30% area overhead.

Original languageEnglish
Pages (from-to)196-222
Number of pages27
JournalJournal of Supercomputing
Volume60
Issue number2
DOIs
Publication statusPublished - May 1 2012

Fingerprint

Embedded Processor
Customization
Energy Efficiency
Energy efficiency
Fabrication
Unit
Energy conservation
Hardware
Speedup
Processing
Costs
Energy Saving
Subgraph
Partitioning
Chip
Benchmark
Engineering
Cycle
Resources
Experimental Results

All Science Journal Classification (ASJC) codes

  • Software
  • Theoretical Computer Science
  • Information Systems
  • Hardware and Architecture

Cite this

Improving performance and energy efficiency of embedded processors via post-fabrication instruction set customization. / Noori, Hamid; Mehdipour, Farhad; Inoue, Koji; Murakami, Kazuaki.

In: Journal of Supercomputing, Vol. 60, No. 2, 01.05.2012, p. 196-222.

Research output: Contribution to journalArticle

@article{a39af4e1cc1d40e497418a542ac28d90,
title = "Improving performance and energy efficiency of embedded processors via post-fabrication instruction set customization",
abstract = "Encapsulating critical computation subgraphs as application-specific instruction set extensions is an effective technique to enhance the performance and energy efficiency of embedded processors. However, the addition of custom functional units to the base processor is required to support the execution of custom instructions. Although automated tools have been developed to reduce the long design time needed to produce a new extensible processor for each application, short time-to-market, significant non-recurring engineering and design costs are issues. To address these concerns, we introduce an adaptive extensible processor in which custom instructions are generated and added after chip-fabrication. To support this feature, custom functional units (CFUs) are replaced by a reconfigurable functional unit (RFU). The proposed RFU is based on a matrix of functional units which is multi-cycle with the capability of conditional execution. To generate more effective custom instructions, they are extended over basic blocks and hence, multiple-exits custom instruction and intuition behind it are introduced. Conditional execution capability has been added to the RFU to support the multi-exit feature of custom instructions. Because the proposed RFU has limitations on hardware resources (i.e., connections and processing elements), an integrated mapping-temporal partitioning framework is proposed to guarantee that the generated custom instructions can be mapped on the RFU (mappable custom instructions). Experimental results show that multi-exit custom instructions enhance the performance and energy efficiency by an average of 32{\%} and 3{\%} compared to custom instructions limited to one basic block, respectively. A maximum speedup of 4.9, compared to a single-issue embedded processor, and an average speedup of 1.9 was achieved on MiBench benchmark suite. The maximum and average energy saving are 56{\%} and 22{\%}, respectively. These performance and energy efficiency are obtained at the cost of 30{\%} area overhead.",
author = "Hamid Noori and Farhad Mehdipour and Koji Inoue and Kazuaki Murakami",
year = "2012",
month = "5",
day = "1",
doi = "10.1007/s11227-010-0505-0",
language = "English",
volume = "60",
pages = "196--222",
journal = "Journal of Supercomputing",
issn = "0920-8542",
publisher = "Springer Netherlands",
number = "2",

}

TY - JOUR

T1 - Improving performance and energy efficiency of embedded processors via post-fabrication instruction set customization

AU - Noori, Hamid

AU - Mehdipour, Farhad

AU - Inoue, Koji

AU - Murakami, Kazuaki

PY - 2012/5/1

Y1 - 2012/5/1

N2 - Encapsulating critical computation subgraphs as application-specific instruction set extensions is an effective technique to enhance the performance and energy efficiency of embedded processors. However, the addition of custom functional units to the base processor is required to support the execution of custom instructions. Although automated tools have been developed to reduce the long design time needed to produce a new extensible processor for each application, short time-to-market, significant non-recurring engineering and design costs are issues. To address these concerns, we introduce an adaptive extensible processor in which custom instructions are generated and added after chip-fabrication. To support this feature, custom functional units (CFUs) are replaced by a reconfigurable functional unit (RFU). The proposed RFU is based on a matrix of functional units which is multi-cycle with the capability of conditional execution. To generate more effective custom instructions, they are extended over basic blocks and hence, multiple-exits custom instruction and intuition behind it are introduced. Conditional execution capability has been added to the RFU to support the multi-exit feature of custom instructions. Because the proposed RFU has limitations on hardware resources (i.e., connections and processing elements), an integrated mapping-temporal partitioning framework is proposed to guarantee that the generated custom instructions can be mapped on the RFU (mappable custom instructions). Experimental results show that multi-exit custom instructions enhance the performance and energy efficiency by an average of 32% and 3% compared to custom instructions limited to one basic block, respectively. A maximum speedup of 4.9, compared to a single-issue embedded processor, and an average speedup of 1.9 was achieved on MiBench benchmark suite. The maximum and average energy saving are 56% and 22%, respectively. These performance and energy efficiency are obtained at the cost of 30% area overhead.

AB - Encapsulating critical computation subgraphs as application-specific instruction set extensions is an effective technique to enhance the performance and energy efficiency of embedded processors. However, the addition of custom functional units to the base processor is required to support the execution of custom instructions. Although automated tools have been developed to reduce the long design time needed to produce a new extensible processor for each application, short time-to-market, significant non-recurring engineering and design costs are issues. To address these concerns, we introduce an adaptive extensible processor in which custom instructions are generated and added after chip-fabrication. To support this feature, custom functional units (CFUs) are replaced by a reconfigurable functional unit (RFU). The proposed RFU is based on a matrix of functional units which is multi-cycle with the capability of conditional execution. To generate more effective custom instructions, they are extended over basic blocks and hence, multiple-exits custom instruction and intuition behind it are introduced. Conditional execution capability has been added to the RFU to support the multi-exit feature of custom instructions. Because the proposed RFU has limitations on hardware resources (i.e., connections and processing elements), an integrated mapping-temporal partitioning framework is proposed to guarantee that the generated custom instructions can be mapped on the RFU (mappable custom instructions). Experimental results show that multi-exit custom instructions enhance the performance and energy efficiency by an average of 32% and 3% compared to custom instructions limited to one basic block, respectively. A maximum speedup of 4.9, compared to a single-issue embedded processor, and an average speedup of 1.9 was achieved on MiBench benchmark suite. The maximum and average energy saving are 56% and 22%, respectively. These performance and energy efficiency are obtained at the cost of 30% area overhead.

UR - http://www.scopus.com/inward/record.url?scp=84861932346&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84861932346&partnerID=8YFLogxK

U2 - 10.1007/s11227-010-0505-0

DO - 10.1007/s11227-010-0505-0

M3 - Article

AN - SCOPUS:84861932346

VL - 60

SP - 196

EP - 222

JO - Journal of Supercomputing

JF - Journal of Supercomputing

SN - 0920-8542

IS - 2

ER -