Catching and attributing code change-induced performance regressions in production is hard; predicting them beforehand is even harder. As a primer on automatically learning to predict performance regressions in software, this article gives an account of the experience we gained while researching and deploying an ML-based regression prediction pipeline at Meta. In this paper, we report on a comparative study of four ML models of increasing complexity, from (1) code-opaque, via (2) Bag of Words and (3) off-the-shelf Transformer-based, to (4) a bespoke Transformer-based model, coined SuperPerforator. Our investigation shows the inherent difficulty of the performance prediction problem, which is characterized by a large imbalance of benign to regressing changes. Our results also call into question the general applicability of Transformer-based architectures for performance prediction: an off-the-shelf CodeBERT-based approach performed surprisingly poorly; our highly customized SuperPerforator architecture initially achieved prediction performance that was merely on par with simpler Bag of Words models, and only outperformed them for down-stream use cases. This ability of SuperPerforator to transfer to an application with few learning examples afforded an opportunity to deploy it in practice at Meta: it can act as a pre-filter to sort out changes that are unlikely to introduce a regression, narrowing the space of changes in which to search for a regression by up to 43%, a 45x improvement over a random baseline. To gain further insight into SuperPerforator, we explored it via a series of experiments computing counterfactual explanations. These highlight which parts of a code change the model deems important, thereby validating the learned black-box model.