The goal of Domain Generation Algorithm (DGA) detection is to recognize infections with bot malware. It is often performed with the help of Machine Learning approaches that classify non-resolving Domain Name System (DNS) traffic and are trained on potentially sensitive data. In parallel, the rise of privacy research in the Machine Learning world has led to privacy-preserving measures that are tightly coupled with a deep learning model's architecture or training routine, whereas non-deep-learning approaches are commonly better suited to the application of privacy-enhancing methods outside the actual classification module. In this work, we aim to measure the privacy capability of the feature extractor of the feature-based DGA detector FANCI (Feature-based Automated Nxdomain Classification and Intelligence). Our goal is to assess whether a data-rich adversary can learn an inverse mapping of FANCI's feature extractor and thereby reconstruct domain names from feature vectors. A successful attack would pose a privacy threat to sharing FANCI's feature representation, while a failed attack would suggest that this representation can be shared without privacy concerns. Using three real-world data sets, we train a recurrent Machine Learning model on the reconstruction task. Our approaches result in poor reconstruction performance, and we attempt to back our findings with a mathematical review of the feature extraction process. We thus conclude that sharing FANCI's feature representation does not constitute a considerable privacy leak.