We propose a static loop vectorization optimization on top of the high-level dataflow IR used by frameworks such as TensorFlow. We provide a new statically vectorized parallel-for abstraction on top of TensorFlow and use it for applications ranging from auto-batching and per-example gradients to Jacobian computation, optimized map functions, and input pipeline optimization. We report large speedups over both loop-based implementations and the run-time batching adopted by the DyNet framework.
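To make the per-example gradient use case concrete, the following minimal sketch contrasts a loop-based computation with the batched form that a loop vectorizer aims to produce. It is not the paper's implementation: it uses NumPy rather than TensorFlow, the vectorized version is written by hand rather than derived automatically, and the squared-error model and the names `grad_single`, `per_example_grads_loop`, and `per_example_grads_vectorized` are assumptions chosen for illustration.

```python
import numpy as np


def grad_single(w, x, y):
    """Gradient of 0.5 * (x.w - y)**2 with respect to w, for one example."""
    residual = x @ w - y
    return residual * x


def per_example_grads_loop(w, X, Y):
    """Loop-based baseline: one gradient computation per example."""
    return np.stack([grad_single(w, X[i], Y[i]) for i in range(X.shape[0])])


def per_example_grads_vectorized(w, X, Y):
    """Hand-vectorized equivalent: the loop body applied to the whole batch."""
    residuals = X @ w - Y            # shape (n,)
    return residuals[:, None] * X    # shape (n, d), one gradient per row


# Hypothetical toy data: 8 examples with 3 features each.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))
Y = rng.normal(size=8)
w = rng.normal(size=3)

loop_result = per_example_grads_loop(w, X, Y)
vectorized_result = per_example_grads_vectorized(w, X, Y)
```

Both functions return the same `(n, d)` array of per-example gradients. The loop issues one small operation per example, whereas the vectorized form issues a few batched operations, which is the kind of rewrite a static vectorizer performs on the loop body ahead of execution.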