Machine Learning Containers are Bloated and Vulnerable

Today's software is bloated, leading to significant resource wastage. This bloat is prevalent across the entire software stack, from the operating system all the way to software backends, frontends, and web pages. In this paper, we study how prevalent bloat is in machine learning containers. We develop MMLB, a framework to analyze bloat in machine learning containers, measuring the amount of bloat that exists at the container and package levels. Our tool quantifies the sources of bloat and removes them. We integrate our tool with vulnerability analysis tools to measure how bloat affects container vulnerabilities. We experimentally study 15 machine learning containers from the official TensorFlow, PyTorch, and NVIDIA container registries under different tasks (i.e., training, tuning, and serving). Our findings show that bloat accounts for up to 80% of the container size. We find that debloating machine learning containers speeds up provisioning by up to 3.7× and removes up to 98% of all vulnerabilities detected by vulnerability analysis tools such as Grype. Finally, we relate our results to the larger discussion about technical debt in machine learning systems.
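To make the container-level and package-level measurements concrete, here is a minimal sketch of how a bloat fraction could be computed. It is not MMLB's implementation: the abstract does not define bloat precisely, so the sketch assumes bloat means bytes in files that the workload never accessed. The function name `bloat_report` and its three inputs (file sizes, the set of accessed paths, and a file-to-package mapping) are hypothetical.

```python
from collections import defaultdict


def bloat_report(file_sizes, used_paths, package_of):
    """Return (container_bloat_fraction, per_package_bloat_fraction).

    Assumption (not from the paper): a file counts as bloat if the
    workload never accessed it. All names here are hypothetical.

    file_sizes: dict mapping file path -> size in bytes
    used_paths: set of paths the workload accessed
    package_of: dict mapping file path -> owning package name
    """
    pkg_total = defaultdict(int)
    pkg_unused = defaultdict(int)
    total = 0
    unused = 0

    for path, size in file_sizes.items():
        # Files with no known owner are grouped under a placeholder name.
        pkg = package_of.get(path, "<unowned>")
        total += size
        pkg_total[pkg] += size
        if path not in used_paths:
            unused += size
            pkg_unused[pkg] += size

    container_fraction = unused / total if total else 0.0
    per_package = {
        pkg: pkg_unused[pkg] / size
        for pkg, size in pkg_total.items()
        if size > 0
    }
    return container_fraction, per_package
```

In practice the three inputs would have to come from somewhere, for example a file-access trace recorded while running the training, tuning, or serving task and the package manager's file ownership database; how MMLB obtains them is not stated in the abstract.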
