GPUs can achieve high performance by exploiting massive-thread parallelism. However, some factors limit performance on GPUs, one of which is the negative effects of L1 cache misses. In some applications, GPUs are likely to suffer from L1 cache conflicts because a large number of cores share a small L1 cache capacity. A cache architecture that is based on data compression is a strong candidate for solving this problem as it can reduce the number of cache misses. Unlike previous studies, our data compression scheme attempts to exploit the value locality existing within not only intra cache lines but also inter cache lines. We enhance the structure of a last-level compression cache proposed for general purpose manycore processors to optimize against shared L1 caches on GPUs. The experimental results reveal that our proposal outperforms the other compression cache for GPUs by 11 points on average.
|Title of host publication||Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region|
|Publisher||Association for Computing Machinery (ACM)|
|Number of pages||10|
|Publication status||Published - Jan 2020|