High-performance visualization plays an important role in computer-aided scientific discovery and has become an indispensable tool for computational scientists. Sort-last parallel rendering is a proven approach to visual data analytics, extracting meaningful information from the huge data sets generated by large-scale scientific computing. Image compositing is the last stage of the sort-last parallel rendering pipeline: it combines the images generated by the rendering nodes to produce the final image. Because it requires interprocess communication among all of the nodes, it usually dominates the total cost of the parallel rendering process. On current high-end massively parallel HPC systems, where tens or even hundreds of thousands of nodes can be involved, performance degradation is inevitable even with theoretically scalable image compositing algorithms such as the well-known Binary-Swap method. To minimize this degradation, we propose a multi-step image compositing method, in which the image compositing nodes are divided into smaller groups and the entire process is performed in several steps. We evaluated the proposed method on the RIKEN K computer, a massively parallel HPC system, and obtained encouraging results that show its effectiveness in a large-scale image compositing environment. We also foresee great potential for this method to meet the large-scale image compositing demands brought about by the rapid increase in processor counts of current and next-generation HPC systems.