http://swrc.ontoware.org/ontology#TechnicalReport
Breaking the limitation of GPU Memory for Deep Learning Workloads
en
Machine Learning
Tokyo Institute of Technology, Dept. of Mathematical and Computing Science
AIST-Tokyo Tech Real World Big-Data Computation Open Innovation Laboratory
Tokyo Institute of Technology, Dept. of Mathematical and Computing Science
Tokyo Institute of Technology, Dept. of Mathematical and Computing Science
RIKEN Center for Computational Science / Tokyo Institute of Technology, Dept. of Mathematical and Computing Science
Haoyu Zhang
Mohamed Wahib
Lingqi Zhang
Yohei Tsuji
Satoshi Matsuoka
GPU memory can be insufficient for Deep Learning workloads with respect to model and dataset sizes. Although model parallelism could help, it requires significant code modification for every case. An alternative, general solution to this problem is to use out-of-core methods. Recent work proposed data-swapping and CUDA Unified Memory (UM) methods to break the limitation of GPU memory capacity. However, there is a lack of detailed analysis, via performance modeling, of the behavior and limitations of those methods. In this paper we analyze the behavior at both the single-layer and whole-model levels, and propose a performance model based on this analysis to study how out-of-core training behaves, thereby empowering the co-design process for Deep Learning workloads.
AN10463942
IPSJ SIG Technical Report: High Performance Computing (HPC)
2019-HPC-170
10
1-7
2019-07-17
2188-8841