HPC Resource Utilization

High-performance computing (HPC) is an increasingly attractive technology because of its high computing power and efficiency. As the volume of data to be processed grows, Big Data and machine learning applications are turning to HPC systems to reduce execution time. HPC systems use distributed file systems such as Lustre and Ceph to meet high I/O bandwidth requirements. These file systems allow clients to access the underlying storage devices in parallel.
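To make "parallel access" concrete for Lustre: a file is split into fixed-size stripe units that are placed round-robin across a set of object storage targets (OSTs), governed by the stripe size and stripe count (for example, `lfs setstripe -c 4 -S 1M <file>`). The minimal Python sketch below assumes this simple round-robin (RAID-0 style) layout and ignores features such as progressive file layouts; Ceph places data differently and is not modeled. It maps a byte offset to its position in the layout and counts how many OSTs a contiguous request touches. The function names are hypothetical and illustrative, not part of any Lustre API.

```python
MiB = 1024 * 1024


def ost_for_offset(offset: int, stripe_size: int, stripe_count: int) -> tuple[int, int]:
    """Return (layout index of the OST holding this byte, offset inside that OST's object).

    Assumes a plain round-robin striped layout (hypothetical helper, not a Lustre API).
    """
    stripe_index = offset // stripe_size        # which stripe unit the byte falls in
    ost_index = stripe_index % stripe_count     # stripe units rotate across the OSTs
    # Each full round of stripe_count units adds one stripe_size chunk to every object.
    obj_offset = (stripe_index // stripe_count) * stripe_size + offset % stripe_size
    return ost_index, obj_offset


def osts_touched(offset: int, length: int, stripe_size: int, stripe_count: int) -> int:
    """Count distinct OSTs serving a contiguous request of `length` > 0 bytes."""
    first = offset // stripe_size
    last = (offset + length - 1) // stripe_size
    return len({i % stripe_count for i in range(first, last + 1)})


if __name__ == "__main__":
    # A 4 MiB read with 1 MiB stripes: spread over 4 OSTs vs. confined to 1.
    print(osts_touched(0, 4 * MiB, MiB, 4))
    print(osts_touched(0, 4 * MiB, MiB, 1))
```

With a stripe count of 1, every request is served by a single OST regardless of its size, so even a large file cannot draw on the bandwidth of multiple OSTs; this is the kind of configuration parameter the following paragraphs refer to.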

However, because these file systems are complex, many users do not obtain the full performance of the underlying storage devices. Since many HPC users come from diverse backgrounds such as physics and biology, it is crucial to develop strategies that help them exploit high-performance storage devices without expert knowledge.

In previous studies, we examined various parallel file systems and their configuration parameters. These studies showed that most users (more than 95%) do not use parallel file system configurations that could dramatically improve their applications' performance. We therefore proposed a scheme that autonomously detects an optimal configuration without any user intervention. This enables large performance improvements without new hardware and without requiring detailed knowledge of the HPC system or the application's algorithm.