This project report describes how we tune Focus [1], a video inference system that provides low cost and low latency through two phases. We reduce query time by saving the intermediate-layer output of the neural network, a strategy that trades additional storage space for time. We demonstrate the scheme with prototype systems, where it saves 20% of the time. The code repository is available at https://github.com/iphyer/CS744 FocousIngestOpt.
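To make the space-for-time trade-off concrete, the following is a minimal sketch and not the code from our repository. It assumes the network can be split into expensive front layers and cheap tail layers; the names `front_layers`, `tail_layers`, and `IntermediateCache` are hypothetical stand-ins introduced only for illustration. The intermediate output is stored once at ingest, so a later query on the same frame runs only the tail layers.

```python
from typing import Dict, List

Vector = List[float]


def front_layers(frame: Vector) -> Vector:
    """Hypothetical stand-in for the expensive early layers of the network."""
    return [max(0.0, 2.0 * x - 1.0) for x in frame]


def tail_layers(features: Vector) -> int:
    """Hypothetical stand-in for the cheap final layers: returns a class index."""
    return max(range(len(features)), key=lambda i: features[i])


class IntermediateCache:
    """Stores intermediate-layer outputs so queries can skip the front layers."""

    def __init__(self) -> None:
        self._store: Dict[int, Vector] = {}
        self.front_calls = 0  # counts how often the expensive part actually runs

    def ingest(self, frame_id: int, frame: Vector) -> None:
        # Pay the front-layer cost once and keep the result (extra space).
        self.front_calls += 1
        self._store[frame_id] = front_layers(frame)

    def query(self, frame_id: int, frame: Vector) -> int:
        features = self._store.get(frame_id)
        if features is None:
            # Cache miss: fall back to running the full network.
            self.front_calls += 1
            features = front_layers(frame)
        return tail_layers(features)


if __name__ == "__main__":
    cache = IntermediateCache()
    frame = [0.2, 0.9, 0.4]
    cache.ingest(0, frame)
    label = cache.query(0, frame)  # served from the saved intermediate output
    print(label, cache.front_calls)
```

In this sketch the storage cost grows with the number of ingested frames and the size of the intermediate output, which is the price paid for skipping the front layers at query time.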