TY - GEN
T1 - Enhancing a manycore-oriented compressed cache for GPGPU
AU - Oka, Keitaro
AU - Kawakami, Satoshi
AU - Tanimoto, Teruo
AU - Ono, Takatsugu
AU - Inoue, Koji
N1 - Funding Information:
This work was partly supported by JSPS KAKENHI Grant Number JP19H01105 and JST-Mirai Program Grant Number JP18077278.
Publisher Copyright:
© 2020 ACM.
PY - 2020/1/15
Y1 - 2020/1/15
N2 - GPUs can achieve high performance by exploiting massive thread-level parallelism. However, several factors limit performance on GPUs, one of which is the negative effect of L1 cache misses. In some applications, GPUs are likely to suffer from L1 cache conflicts because a large number of cores share a small L1 cache capacity. A cache architecture based on data compression is a strong candidate for solving this problem, as it can reduce the number of cache misses. Unlike previous studies, our data compression scheme attempts to exploit the value locality that exists not only within cache lines but also across cache lines. We enhance the structure of a last-level compression cache proposed for general-purpose manycore processors to optimize it for the shared L1 caches of GPUs. The experimental results reveal that our proposal outperforms the existing compression cache for GPUs by 11 points on average.
AB - GPUs can achieve high performance by exploiting massive thread-level parallelism. However, several factors limit performance on GPUs, one of which is the negative effect of L1 cache misses. In some applications, GPUs are likely to suffer from L1 cache conflicts because a large number of cores share a small L1 cache capacity. A cache architecture based on data compression is a strong candidate for solving this problem, as it can reduce the number of cache misses. Unlike previous studies, our data compression scheme attempts to exploit the value locality that exists not only within cache lines but also across cache lines. We enhance the structure of a last-level compression cache proposed for general-purpose manycore processors to optimize it for the shared L1 caches of GPUs. The experimental results reveal that our proposal outperforms the existing compression cache for GPUs by 11 points on average.
UR - http://www.scopus.com/inward/record.url?scp=85094830028&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85094830028&partnerID=8YFLogxK
U2 - 10.1145/3368474.3368491
DO - 10.1145/3368474.3368491
M3 - Conference contribution
AN - SCOPUS:85094830028
T3 - ACM International Conference Proceeding Series
SP - 22
EP - 31
BT - Proceedings of International Conference on High Performance Computing in Asia-Pacific Region, HPC Asia 2020
PB - Association for Computing Machinery
T2 - 2020 International Conference on High Performance Computing in Asia-Pacific Region, HPC Asia 2020
Y2 - 15 January 2020 through 17 January 2020
ER -