Dos and Don'ts of Machine Learning in Computer Security

With the growing processing power of computing systems and the increasing availability of massive datasets, machine learning algorithms have led to major breakthroughs in many different areas. This development has influenced computer security, spawning a series of work on learning-based security systems, such as for malware detection, vulnerability discovery, and binary code analysis. Despite great potential, machine learning in security is prone to subtle pitfalls that undermine its performance and render learning-based systems potentially unsuitable for security tasks and practical deployment. In this paper, we look at this problem with critical eyes. First, we identify common pitfalls in the design, implementation, and evaluation of learning-based security systems. We conduct a study of 30 papers from top-tier security conferences within the past 10 years, confirming that these pitfalls are widespread in the current security literature. In an empirical analysis, we further demonstrate how individual pitfalls can lead to unrealistic performance and interpretations, obstructing the understanding of the security problem at hand. As a remedy, we propose actionable recommendations to support researchers in avoiding or mitigating the pitfalls where possible. Furthermore, we identify open problems when applying machine learning in security and provide directions for further research.

