Logging is a development practice that plays an important role in the operation and monitoring of complex systems. Developers place log statements in the source code and use log data to understand how the system behaves in production. Unfortunately, anticipating where to log during development is challenging. Previous studies show that it is feasible to use machine learning to recommend log placement, despite the data imbalance that arises because logged code is only a fraction of the overall code base. However, it remains unknown how those techniques apply to an industry setting, and little is known about the effect of imbalanced data and sampling techniques. In this paper, we study the log placement problem in the code base of Adyen, a large-scale payment company. We analyze 34,526 Java files and 309,527 methods totaling more than 2M SLOC. We systematically measure the effectiveness of five models based on code metrics, explore the effect of sampling techniques, examine which features the models consider relevant for the prediction, and evaluate whether we can exploit 388,086 methods from 29 Apache projects to learn where to log in an industry setting. Our best-performing model achieves 79% balanced accuracy, 81% precision, and 60% recall. While sampling techniques improve recall, they penalize precision at a prohibitive cost. Models trained on open-source data underperform on Adyen's test set; nevertheless, they are useful due to their low rate of false positives. Our supporting scripts and tools are available to the community.
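For readers less familiar with the reported metrics, the following minimal sketch shows how balanced accuracy, precision, and recall are conventionally computed from a binary confusion matrix, where the positive class is "method should be logged". The `Confusion` class and the counts in `example` are hypothetical and chosen only for illustration; they are not taken from the study's data or scripts.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Binary confusion matrix; the positive class is 'method should be logged'."""
    tp: int  # logged methods predicted as logged
    fp: int  # unlogged methods predicted as logged
    tn: int  # unlogged methods predicted as unlogged
    fn: int  # logged methods predicted as unlogged


def precision(c: Confusion) -> float:
    # Share of positive predictions that are correct.
    return c.tp / (c.tp + c.fp)


def recall(c: Confusion) -> float:
    # Share of actual positives that the model finds.
    return c.tp / (c.tp + c.fn)


def specificity(c: Confusion) -> float:
    # Share of actual negatives that the model leaves alone.
    return c.tn / (c.tn + c.fp)


def balanced_accuracy(c: Confusion) -> float:
    # Mean of recall and specificity, so the majority class cannot dominate.
    return (recall(c) + specificity(c)) / 2


def accuracy(c: Confusion) -> float:
    # Plain accuracy, shown only for contrast on imbalanced data.
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


# Hypothetical, imbalanced counts (5% positives); NOT results from the paper.
example = Confusion(tp=30, fp=10, tn=940, fn=20)

print(f"precision         = {precision(example):.3f}")
print(f"recall            = {recall(example):.3f}")
print(f"balanced accuracy = {balanced_accuracy(example):.3f}")
print(f"plain accuracy    = {accuracy(example):.3f}")
```

On imbalanced data such as this, plain accuracy is inflated by the many correctly ignored negatives, which is one common reason to report balanced accuracy alongside precision and recall.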