Typo-squatting is a common cyber-attack technique. It involves utilising domain names that exploit likely typographical errors in commonly visited domains to carry out malicious activities such as phishing and malware installation. Current approaches typically revolve around string-comparison algorithms such as the Damerau-Levenshtein Distance (DLD) algorithm. Such techniques do not take keyboard distance into account, even though researchers have found it to correlate strongly with typical typographical errors and are trying to incorporate it. In this paper, we present the TypoSwype framework, which converts strings into images that innately capture keyboard location. We also show how modern, state-of-the-art image-recognition techniques based on Convolutional Neural Networks, trained with either Triplet Loss or NT-Xent Loss, can be applied to learn a mapping to a lower-dimensional space in which distances correspond to image similarity and, equivalently, to textual similarity. Finally, we demonstrate that our method improves typo-squatting detection over the widely used DLD algorithm, while maintaining the accuracy of classifying which domain the input domain was attempting to typo-squat.
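The gap between DLD and keyboard distance can be illustrated with a minimal sketch. The code below is not the TypoSwype implementation. It assumes a simplified QWERTY layout with illustrative row offsets (`QWERTY_ROWS`, `ROW_OFFSETS`), and it uses a hypothetical `swipe_image` renderer that traces a string's path across the keyboard. This renderer only approximates the idea of a keyboard-aware image. The `dld` function implements the optimal-string-alignment variant of Damerau-Levenshtein. The `triplet_loss` function is the standard margin-based formulation on embedding vectors. All function names are illustrative.

```python
import numpy as np

# Assumed QWERTY layout; row offsets (in key widths) approximate the physical
# stagger and are illustrative, not taken from the paper.
QWERTY_ROWS = ["1234567890-", "qwertyuiop", "asdfghjkl", "zxcvbnm"]
ROW_OFFSETS = [0.0, 0.5, 0.75, 1.25]
KEY_POS = {
    ch: (float(r), c + ROW_OFFSETS[r])
    for r, row in enumerate(QWERTY_ROWS)
    for c, ch in enumerate(row)
}


def dld(a: str, b: str) -> int:
    """Damerau-Levenshtein distance (optimal string alignment variant)."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]


def key_distance(x: str, y: str) -> float:
    """Euclidean distance between two keys on the assumed layout."""
    (r0, c0), (r1, c1) = KEY_POS[x], KEY_POS[y]
    return float(np.hypot(r1 - r0, c1 - c0))


def swipe_image(s: str, scale: int = 10) -> np.ndarray:
    """Hypothetical keyboard-aware rendering: trace the string's path over the keys."""
    img = np.zeros((4 * scale, 12 * scale), dtype=np.float32)
    pts = [KEY_POS[ch] for ch in s.lower() if ch in KEY_POS]  # skips '.', etc.
    for r, c in pts:  # mark every key press
        img[int(round(r * scale)), int(round(c * scale))] = 1.0
    for (r0, c0), (r1, c1) in zip(pts, pts[1:]):  # draw each stroke between keys
        for t in np.linspace(0.0, 1.0, 4 * scale):
            y = int(round((r0 + t * (r1 - r0)) * scale))
            x = int(round((c0 + t * (c1 - c0)) * scale))
            img[y, x] = 1.0
    return img


def triplet_loss(anchor, positive, negative, margin: float = 1.0) -> float:
    """Standard triplet loss with squared Euclidean distances between embeddings."""
    d_pos = float(np.sum((anchor - positive) ** 2))
    d_neg = float(np.sum((anchor - negative) ** 2))
    return max(0.0, d_pos - d_neg + margin)


print(dld("google", "foogle"), dld("google", "poogle"))  # 1 1
print(key_distance("g", "f") < key_distance("g", "p"))   # True
```

In this sketch, DLD scores `foogle` and `poogle` as equally close to `google`, although `f` is adjacent to `g` on the keyboard and `p` is not. Distinguishing these cases is the motivation for a keyboard-aware representation. In the framework described above, a CNN trained with Triplet Loss (or NT-Xent Loss) would embed such images so that plausible typos land near their target domain.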