Eudoxus: Characterizing and Accelerating Localization in Autonomous Machines

We develop and commercialize autonomous machines, such as logistics robots and self-driving cars, around the globe. A critical challenge to our -- and any -- autonomous machine is accurate and efficient localization under resource constraints, which has recently fueled specialized localization accelerators. Prior acceleration efforts are point solutions in that each specializes for a specific localization algorithm. In real-world commercial deployments, however, autonomous machines routinely operate in different environments, and no single localization algorithm fits all of them. Simply stacking point solutions together not only leads to cost and power budget overruns, but also results in an overly complicated software stack. This paper demonstrates our new software-hardware co-designed framework for autonomous machine localization, which adapts to different operating scenarios by fusing fundamental algorithmic primitives. By characterizing the software framework, we identify ideal acceleration candidates that contribute significantly to the end-to-end latency and/or latency variation. We show how to co-design a hardware accelerator to systematically exploit the parallelism, locality, and common building blocks inherent in the localization framework. We build, deploy, and evaluate an FPGA prototype on our next-generation self-driving cars. To demonstrate the flexibility of our framework, we also instantiate another FPGA prototype targeting drones, which represent mobile autonomous machines. We achieve about 2x speedup and 4x energy reduction compared to widely-deployed, optimized implementations on general-purpose platforms.
