The graphics processing unit (GPU) is extensively used in diverse domains, such as finance, machine learning, and image processing.
However, a GPU can be underutilized because memory oversubscription may prevent multiple applications from sharing it concurrently.
For example, when applications that require few computational resources but a large amount of GPU memory run simultaneously, the GPU memory may be insufficient; consequently, the number of GPU applications that can run simultaneously is restricted, which decreases GPU utilization. Oversubscription can even halt applications that are already running on the GPU.
To this end, we propose FlexGPU, a framework that schedules the kernels of applications sharing the same GPU according to their features. FlexGPU 1) schedules each kernel at launch time according to its features to improve GPU utilization and 2) temporarily checkpoints non-dependent content from GPU memory to host memory and restores it when needed, which avoids GPU memory oversubscription when an out-of-memory failure would otherwise occur and allows more kernels to run concurrently on the GPU.
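The checkpoint/restore idea in item 2) can be illustrated with a toy host-side simulation. This is a minimal sketch under our own assumptions, not FlexGPU's implementation: `DeviceMemory`, `Buffer`, `checkpoint_until_fits`, and the `in_use` flag (marking buffers that a running kernel depends on) are hypothetical names, sizes are abstract units, and flipping `on_device` stands in for a real device-to-host or host-to-device copy (for example, via `cudaMemcpy`).

```python
from dataclasses import dataclass, field


@dataclass
class Buffer:
    name: str
    size: int               # abstract memory units (hypothetical)
    in_use: bool = True     # True if a running kernel depends on this buffer
    on_device: bool = True  # False once checkpointed to host memory


@dataclass
class DeviceMemory:
    capacity: int
    buffers: dict = field(default_factory=dict)

    def used(self) -> int:
        # Only buffers resident on the device count against its capacity.
        return sum(b.size for b in self.buffers.values() if b.on_device)

    def free(self) -> int:
        return self.capacity - self.used()

    def checkpoint_until_fits(self, needed: int) -> bool:
        """Checkpoint non-dependent buffers to host until `needed` units are free.

        Returns False without evicting anything if even checkpointing every
        idle buffer would not free enough memory.
        """
        reclaimable = sum(
            b.size for b in self.buffers.values() if b.on_device and not b.in_use
        )
        if self.free() + reclaimable < needed:
            return False
        for b in self.buffers.values():  # victim order: insertion order (toy policy)
            if self.free() >= needed:
                break
            if b.on_device and not b.in_use:
                b.on_device = False  # stands in for a device-to-host copy
        return True

    def allocate(self, name: str, size: int, in_use: bool = True) -> bool:
        """Admit a kernel's buffer; False means defer the launch instead of failing OOM."""
        if not self.checkpoint_until_fits(size):
            return False
        self.buffers[name] = Buffer(name, size, in_use)
        return True

    def restore(self, name: str) -> bool:
        """Bring a checkpointed buffer back, checkpointing other idle buffers if needed."""
        b = self.buffers[name]
        if b.on_device:
            return True
        if not self.checkpoint_until_fits(b.size):
            return False
        b.on_device = True  # stands in for a host-to-device copy
        return True
```

In this sketch, `allocate` returning `False` models deferring a kernel launch rather than letting it abort with an out-of-memory error, and buffers still needed by running kernels are never evicted. Victims are chosen simply in insertion order; a practical scheduler would need a better selection rule.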
The experimental results show that, compared with existing methods, our approach achieves a 7× performance improvement in terms of execution time and a 2.5× increase in the number of applications executing concurrently.