Machine learning has been successfully applied to systems problems such as memory prefetching and caching, where learned models have been shown to outperform heuristics. However, the lack of understanding of the inner workings of these models -- their interpretability -- remains a major obstacle to adoption in real-world deployments. Understanding a model's behavior can help system administrators and developers gain confidence in the model, understand its risks, and debug unexpected behavior in production. Interpretability of models used in computer systems poses a particular challenge: unlike ML models trained on images or text, the input domain (e.g., memory access patterns, program counters) is not immediately interpretable. A major challenge is therefore to explain the model in terms of concepts that are approachable to a human practitioner. By analyzing a state-of-the-art caching model, we provide evidence that the model has learned concepts beyond simple statistics, which can be leveraged for explanations. Our work is a first step towards the explainability of systems ML models and highlights both the promise and the challenges of this emerging research area.