Most research on novel techniques for 3D Medical Image Segmentation (MIS) currently relies on Deep Learning with GPU accelerators. The principal challenge of this approach is that a single input can easily exhaust the computing resources of a device and require prohibitive amounts of processing time. Distributing deep learning workloads and scaling them across computing devices is therefore a real need for progress in this field. The conventional way to distribute neural network training is data parallelism, where data is scattered over resources (e.g., GPUs) to parallelize the training of a single model. However, experiment parallelism is also an option, where different training processes are parallelized across resources. While the first option is far more common in 3D image segmentation, the second yields a pipeline design with fewer dependencies among the parallelized processes, reducing overhead and offering more potential for scalability. In this work we present a design for distributed deep learning training pipelines, focusing on multi-node and multi-GPU environments, in which the two distribution approaches are deployed and benchmarked. As a proof of concept we take the 3D U-Net architecture with the MSD Brain Tumor Segmentation dataset, a state-of-the-art medical image segmentation problem with high compute and memory requirements. Using the BSC MareNostrum supercomputer as the benchmarking environment, we employ TensorFlow for neural network training and Ray for experiment distribution. We evaluate the experiment speed-up, showing the potential for scaling out across GPUs and nodes, and compare the two parallelism techniques, showing how experiment distribution leverages these resources better as it scales. Finally, we release the implementation of the design to the community, together with the non-trivial steps and methodology needed to adapt and deploy a MIS case such as the one presented here.
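To make the contrast between the two distribution strategies concrete, the following is a minimal sketch of experiment parallelism: each hyperparameter configuration is a fully independent training run, so trials can execute concurrently with no gradient exchange or synchronization between workers. This is an illustrative stand-in only; the paper distributes such trials across GPUs and nodes with Ray, while here Python's standard `concurrent.futures` shows the same pipeline shape on a single machine. All function names (`train_trial`, `run_experiments`) and the simulated loss are hypothetical.

```python
# Hypothetical illustration of experiment parallelism: independent
# trials run in parallel with no inter-worker communication. The
# paper's actual pipeline uses Ray to place trials on GPUs/nodes.
from concurrent.futures import ProcessPoolExecutor

def train_trial(config):
    # Placeholder for one full training run (e.g., a 3D U-Net).
    # A real trial would build the model, train it, and return metrics.
    lr, batch_size = config["lr"], config["batch_size"]
    simulated_loss = 1.0 / (lr * batch_size)  # stand-in for a validation loss
    return {"config": config, "loss": simulated_loss}

def run_experiments(configs, max_workers=2):
    # Experiment parallelism: trials share nothing, so there is no
    # gradient-synchronization overhead and scaling out is near-linear.
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(train_trial, configs))

if __name__ == "__main__":
    grid = [{"lr": lr, "batch_size": bs}
            for lr in (1e-3, 1e-4) for bs in (1, 2)]
    results = run_experiments(grid)
    best = min(results, key=lambda r: r["loss"])
    print(best["config"])
```

Data parallelism would instead split each batch across workers training one shared model, which requires gradient aggregation after every step; the absence of that synchronization is what gives experiment parallelism its lower overhead.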