Topic: Deep Learning Compiler
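Apache TVM is an open-source deep learning compiler stack for CPUs, GPUs, and specialized accelerators. It aims to close the gap between productivity-focused deep learning frameworks and performance- or efficiency-focused hardware backends. This talk centers on the deep learning compiler project at AWS AI and covers how to use pre-quantized models with TVM, and how to add a new operator either entirely from scratch or by lowering it to a sequence of existing Relay operators.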
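Yida Wang is an applied scientist on the AWS AI team at Amazon. Before joining Amazon, Wang was a research scientist in the Parallel Computing Lab at Intel Labs. Wang received a Ph.D. in Computer Science and Neuroscience from Princeton University, with research interests in high-performance computing and big data analytics. Wang's current work focuses on optimizing deep learning model inference for different hardware architectures, such as CPUs, GPUs, and TPUs.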
Recently, there has been growing interest in using standard language constructs (e.g. C++'s Parallel Algorithms and Fortran's do concurrent) for accelerated computing as an alternative to directive-based APIs (e.g. OpenMP and OpenACC). These constructs have the potential to be more portable, and some compilers already support such standards or plan to. Here, we look at the current capabilities, portability, and performance of replacing directives with Fortran's do concurrent, using a mini-app that currently implements OpenACC for GPU acceleration and OpenMP for multi-core CPU parallelism. We replace as many directives as possible with do concurrent, testing various configurations and compiler options with three major compilers: GNU's gfortran, NVIDIA's nvfortran, and Intel's ifort. We find that with the right compiler versions and flags, many directives can be replaced without loss of performance or portability, and, in the case of nvfortran, they can all be replaced. We discuss limitations that may apply to more complicated codes and future language additions that may mitigate them. The software and Singularity containers are publicly provided to allow the results to be reproduced.