A visual homograph attack deceives web users about which domain they are visiting by exploiting forged domains that look similar to genuine ones. T. Thao et al. (IFIP SEC'19) proposed a homograph classification that applies conventional supervised learning algorithms to features extracted from a single-character-based Structural Similarity Index (SSIM). This paper aims to improve the classification accuracy by combining their SSIM features with 199 features extracted from an N-gram model and applying advanced ensemble learning algorithms. The experimental results show that our proposed method improves accuracy by as much as 1.81% and reduces the false-positive rate by as much as 2.15%. Furthermore, existing work applied machine learning to a set of features without being able to explain why doing so improves the accuracy. Even when the accuracy can be improved, understanding the ground truth is also crucial. Therefore, in this paper, we conduct an empirical error analysis and report several findings that explain the behavior of our proposed approach.
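The feature combination described above can be illustrated with a minimal sketch. The abstract does not specify the N-gram order, the vocabulary, or how the 199 features are derived, so the bigram order (`n=2`), the tiny `vocabulary` list, and the helper names `char_ngrams`, `ngram_vector`, and `combine_features` are all assumptions made for illustration, not the paper's actual pipeline.

```python
from collections import Counter


def char_ngrams(domain, n=2):
    """Return all overlapping character n-grams of a domain string."""
    return [domain[i:i + n] for i in range(len(domain) - n + 1)]


def ngram_vector(domain, vocabulary, n=2):
    """Count how often each vocabulary n-gram occurs in the domain."""
    counts = Counter(char_ngrams(domain, n))
    return [counts[gram] for gram in vocabulary]


def combine_features(ssim_features, ngram_features):
    """Concatenate SSIM-based and N-gram-based features into one vector."""
    return list(ssim_features) + list(ngram_features)


# Hypothetical vocabulary; the paper uses 199 N-gram features instead.
vocabulary = ["go", "oo", "og", "gl", "le", "0o", "o0"]

genuine = ngram_vector("google", vocabulary)
forged = ngram_vector("g0ogle", vocabulary)  # digit zero replaces a letter
```

In this toy case the forged domain loses the `go` and `oo` bigrams and gains `0o`, so the two vectors differ even though the strings look alike. A combined vector from `combine_features` would then be passed to an ensemble classifier, which is outside the scope of this sketch.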