Today's software is bloated with both code and features that are not used by most users. This bloat is prevalent across the entire software stack, from the operating system, all the way to software backends, frontends, and web-pages. In this paper, we focus on analyzing and quantifying bloat in machine learning containers. We develop MMLB, a framework to analyze bloat in machine learning containers, measuring the amount of bloat that exists on the container and package levels. Our tool quantifies the sources of bloat and integrates with vulnerability analysis tools to evaluate the impact of bloat on container vulnerabilities. Through experimentation with 15 machine learning containers from Tensorflow, Pytorch, and NVIDIA, we show that bloat is a significant issue, accounting for up to 80% of the container size in some cases. Our results demonstrate that bloat significantly increases the container provisioning time by up to 370% and exacerbates vulnerabilities by up to 99%.
翻译:暂无翻译