Virtuoso: Video-based Intelligence for real-time tuning on SOCs

Efficient and adaptive computer vision systems have been proposed to optimize computer vision tasks, such as image classification and object detection, for embedded or mobile devices. These solutions, quite recent in origin, focus on optimizing either the model (a deep neural network, DNN) or the system, the latter by designing an adaptive system with approximation knobs. Despite several recent efforts, we show that existing solutions suffer from two major drawbacks. First, the system does not consider the energy consumption of the models when deciding which model to run. Second, the evaluation does not consider the practical scenario of contention on the device due to other co-resident workloads. In this work, we propose an efficient and adaptive video object detection system, Virtuoso, which is jointly optimized for accuracy, energy efficiency, and latency. Underlying Virtuoso is a multi-branch execution kernel that is capable of running at different operating points along the accuracy-energy-latency axes, and a lightweight runtime scheduler that selects the best-fit execution branch to satisfy the user requirement. To compare fairly with Virtuoso, we benchmark 15 state-of-the-art or widely used protocols, including Faster R-CNN (FRCNN), YOLO v3, SSD, EfficientDet, SELSA, MEGA, REPP, FastAdapt, and our in-house adaptive variants FRCNN+, YOLO+, SSD+, and EfficientDet+ (our variants have enhanced efficiency for mobile devices). On this comprehensive benchmark, Virtuoso outperforms all of the above protocols, leading the accuracy frontier at every efficiency level on NVIDIA Jetson mobile GPUs. Specifically, Virtuoso achieves an accuracy of 63.9%, which is more than 10 percentage points higher than some popular object detection models: FRCNN at 51.1% and YOLO at 49.5%.
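
To make the idea of a runtime scheduler choosing among execution branches concrete, the following is a minimal sketch, not the scheduler used in Virtuoso. It assumes each branch comes with a pre-profiled accuracy, latency, and energy estimate, and it picks the most accurate branch that fits the user's latency and energy budgets. The `Branch` type, the `select_branch` function, the `contention_factor` parameter (a simple multiplier standing in for slowdown caused by co-resident workloads), and all numeric values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Branch:
    """One hypothetical operating point of a multi-branch detector."""
    name: str
    accuracy: float     # profiled accuracy, in percent (illustrative)
    latency_ms: float   # profiled per-frame latency without contention
    energy_mj: float    # profiled per-frame energy


def select_branch(
    branches: Sequence[Branch],
    latency_budget_ms: float,
    energy_budget_mj: float,
    contention_factor: float = 1.0,
) -> Optional[Branch]:
    """Return the most accurate branch that meets both budgets.

    Assumption: contention inflates latency by a constant multiplier.
    Returns None when no branch satisfies the requirement.
    """
    feasible = [
        b for b in branches
        if b.latency_ms * contention_factor <= latency_budget_ms
        and b.energy_mj <= energy_budget_mj
    ]
    if not feasible:
        return None
    return max(feasible, key=lambda b: b.accuracy)


# Placeholder profiles; these are not measurements from the paper.
BRANCHES = [
    Branch("light", accuracy=40.0, latency_ms=20.0, energy_mj=100.0),
    Branch("medium", accuracy=50.0, latency_ms=50.0, energy_mj=250.0),
    Branch("heavy", accuracy=60.0, latency_ms=120.0, energy_mj=600.0),
]

if __name__ == "__main__":
    choice = select_branch(BRANCHES, latency_budget_ms=60.0, energy_budget_mj=300.0)
    print(choice.name if choice else "no feasible branch")
```

In this sketch, raising `contention_factor` shrinks the feasible set, so the selection falls back to a lighter branch under the same budgets. A real scheduler would need online estimates of latency and energy rather than static profiles.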
