Domain generation algorithms (DGAs) generate a large number of domain names to prevent the connection between a botnet and its master from being blocked. Promising approaches that rely on a single data source have been proposed for separating benign from DGA-generated domains. Collaborative machine learning (ML) can enhance a classifier's detection rate, reduce its false positive rate (FPR), and improve its ability to generalize to different networks. In this paper, we complement research on DGA detection with a comprehensive collaborative learning study comprising a total of 13,440 evaluation runs. In two real-world scenarios, we evaluate eleven variations of collaborative learning using three state-of-the-art classifiers. We show that collaborative ML can reduce the FPR by up to 51.7%. However, while collaborative ML is beneficial for DGA detection, not all approaches and classifier types profit equally. We round off our study with a thorough discussion of the privacy threats posed by the different collaborative ML approaches.