Creating virtual avatars with realistic rendering is one of the most essential and challenging tasks in providing highly immersive virtual reality (VR) experiences. It requires not only sophisticated deep neural network (DNN) based codec avatar decoders to ensure high visual quality and precise motion expression, but also efficient hardware accelerators to guarantee smooth real-time rendering on lightweight edge devices such as untethered VR headsets. Existing hardware accelerators, however, fail to deliver sufficient performance and efficiency for such decoders, which consist of multi-branch DNNs and demand substantial compute and memory resources. To address these problems, we propose an automation framework, called F-CAD (Facebook Codec avatar Accelerator Design), to explore and deliver optimized hardware accelerators for codec avatar decoding. Its novel techniques include 1) a new accelerator architecture to efficiently handle multi-branch DNNs; 2) a multi-branch dynamic design space to enable fine-grained architecture configurations; and 3) an efficient architecture search that picks the optimized hardware design based on both application-specific demands and hardware resource constraints. To the best of our knowledge, F-CAD is the first automation tool that supports the whole design flow of hardware acceleration for codec avatar decoders, allowing joint optimization of decoder designs in popular machine learning frameworks and the corresponding customized accelerator designs with cycle-accurate evaluation. Results show that the accelerators generated by F-CAD can deliver up to 122.1 frames per second (FPS) and 91.6% hardware efficiency when running the latest codec avatar decoder. Compared to the state-of-the-art designs DNNBuilder and HybridDNN, F-CAD achieves 4.0X and 2.8X higher throughput and 62.5% and 21.2% higher efficiency, respectively, when targeting the same hardware device.