URLTran: 利用变换器改进捕捉 URLTran: 改进利用变换器探测 URL URL (URLTran: Improving Phishing URL Detection Using Transformers) - 专知论文

会员服务 ·

0

URL · Performer · 变换 · MoDELS · 掩码语言模型化 ·

2021 年 8 月 4 日

URLTran: Improving Phishing URL Detection Using Transformers

翻译：URLTran: 利用变换器改进捕捉 URLTran: 改进利用变换器探测 URL URL

Pranav Maneriker,Jack W. Stokes,Edir Garcia Lazo,Diana Carutasu,Farid Tajaddodianfar,Arun Gururajan

Browsers often include security features to detect phishing web pages. In the past, some browsers evaluated an unknown URL for inclusion in a list of known phishing pages. However, as the number of URLs and known phishing pages continued to increase at a rapid pace, browsers started to include one or more machine learning classifiers as part of their security services that aim to better protect end users from harm. While additional information could be used, browsers typically evaluate every unknown URL using some classifier in order to quickly detect these phishing pages. Early phishing detection used standard machine learning classifiers, but recent research has instead proposed the use of deep learning models for the phishing URL detection task. Concurrently, text embedding research using transformers has led to state-of-the-art results in many natural language processing tasks. In this work, we perform a comprehensive analysis of transformer models on the phishing URL detection task. We consider standard masked language model and additional domain-specific pre-training tasks, and compare these models to fine-tuned BERT and RoBERTa models. Combining the insights from these experiments, we propose URLTran which uses transformers to significantly improve the performance of phishing URL detection over a wide range of very low false positive rates (FPRs) compared to other deep learning-based methods. For example, URLTran yields a true positive rate (TPR) of 86.80% compared to 71.20% for the next best baseline at an FPR of 0.01%, resulting in a relative improvement of over 21.9%. Further, we consider some classical adversarial black-box phishing attacks such as those based on homoglyphs and compound word splits to improve the robustness of URLTran. We consider additional fine tuning with these adversarial samples and demonstrate that URLTran can maintain low FPRs under these scenarios.

翻译：浏览器通常包含一种或多种机器学习分类器, 作为安全服务的一部分, 目的是更好地保护终端用户免受伤害。虽然可以使用额外信息, 浏览器通常使用某些分类器来评估每个未知的 URL, 以便快速检测这些phishing 网页。一些浏览器评估了一个未知的 URL, 以便纳入已知的phish 页面列表。但是,由于URL 和已知的phishing 页面的数量继续快速增加, 浏览器开始将一个或多个机器学习分类器作为其安全服务的一部分, 目的是更好地保护终端用户免受伤害。虽然可以使用额外的信息, 浏览器通常使用某些分类器来评估每个未知的 URL 。早期pherm 检测器使用标准遮掩语言模型和额外的域端域际协议前任务, 并将这些模型与精确的 BERT 和 RoBERT 模型相比较, 提议使用最精确的 URLT 模型, 将这些正值的演示到最精确的URLA 。

0

相关内容

URL

最新《Transformers模型》教程，64页ppt

最新《Transformers模型》教程，64页ppt

专知会员服务

321+阅读 · 2020年11月26日

【Mila】通用表示Transformer少样本图像分类

【Mila】通用表示Transformer少样本图像分类

专知会员服务

33+阅读 · 2020年9月7日

【论文】持续学习的图神经网络用于检测社交媒体的假新闻，Graph Neural Networks with Continual Learning for Fake News Detection from Social Media

【论文】持续学习的图神经网络用于检测社交媒体的假新闻，Graph Neural Networks with Continual Learning for Fake News Detection from Social Media

专知会员服务

41+阅读 · 2020年7月14日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

【Amazon】使用预先训练的Transformer模型进行数据增强，Data Augmentation using Pre-trained Transformer Models

【Amazon】使用预先训练的Transformer模型进行数据增强，Data Augmentation using Pre-trained Transformer Models

专知会员服务

51+阅读 · 2020年3月7日

【ICCV2019教程】物体检测的R-CNN通用框架，The Generalized R-CNN Framework for Object Detection，180页ppt，Facebook 人工智能研究院Ross Girshick大神

【ICCV2019教程】物体检测的R-CNN通用框架，The Generalized R-CNN Framework for Object Detection，180页ppt，Facebook 人工智能研究院Ross Girshick大神

专知会员服务

25+阅读 · 2019年11月16日

【O'Reilly AI Conference 2019】使用联合学习在智能建筑中进行异常检测（Anomaly detection in smart buildings using federated learning），Binaize Labs的联合创始人Tuhin Sharma、Binaize Labs的联合创始人和机器学习工程师Bargava Subramanian

【O'Reilly AI Conference 2019】使用联合学习在智能建筑中进行异常检测（Anomaly detection in smart buildings using federated learning），Binaize Labs的联合创始人Tuhin Sharma、Binaize Labs的联合创始人和机器学习工程师Bargava Subramanian

专知会员服务

6+阅读 · 2019年11月5日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

ExBert — 可视化分析Transformer学到的表示

ExBert — 可视化分析Transformer学到的表示

专知会员服务

32+阅读 · 2019年10月16日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

使用BERT做文本摘要

使用BERT做文本摘要

专知

23+阅读 · 2019年12月7日

RoBERTa中文预训练模型：RoBERTa for Chinese

RoBERTa中文预训练模型：RoBERTa for Chinese

PaperWeekly

57+阅读 · 2019年9月16日

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

AINLP

40+阅读 · 2019年6月9日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Call for Participation: Shared Tasks in NLPCC 2019

Call for Participation: Shared Tasks in NLPCC 2019

中国计算机学会

5+阅读 · 2019年3月22日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

pytorch-pretrained-BERT：BERT PyTorch实现，可加载Google BERT预训练模型

pytorch-pretrained-BERT：BERT PyTorch实现，可加载Google BERT预训练模型

AINLP

35+阅读 · 2018年11月6日

谷歌发表的史上最强NLP模型BERT的官方代码和预训练模型可以下载了

谷歌发表的史上最强NLP模型BERT的官方代码和预训练模型可以下载了

AINLP

12+阅读 · 2018年11月1日

tensorflow Object Detection API使用预训练模型mask r-cnn实现对象检测

tensorflow Object Detection API使用预训练模型mask r-cnn实现对象检测

极市平台

12+阅读 · 2018年8月24日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

Mobile authentication of copy detection patterns: how critical is to know fakes?

Arxiv

0+阅读 · 2021年10月5日

Advances in Machine and Deep Learning for Modeling and Real-time Detection of Multi-Messenger Sources

Arxiv

0+阅读 · 2021年10月2日

Attention Bottlenecks for Multimodal Fusion

Arxiv

31+阅读 · 2021年6月30日

Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers

Arxiv

12+阅读 · 2020年6月23日

Data Augmentation using Pre-trained Transformer Models

Arxiv

17+阅读 · 2020年3月4日

Multi-Task Self-Supervised Learning for Disfluency Detection

Arxiv

5+阅读 · 2019年8月15日

Domain Specific Approximation for Object Detection

Arxiv

5+阅读 · 2018年10月4日

Object detection at 200 Frames Per Second

Arxiv

5+阅读 · 2018年5月16日

End-to-end learning of keypoint detector and descriptor for pose invariant 3D matching

Arxiv

8+阅读 · 2018年5月9日

When and Why are Pre-trained Word Embeddings Useful for Neural Machine Translation?

Arxiv

3+阅读 · 2018年4月18日

VIP会员

文章信息

相关主题

掩码语言模型化

相关VIP内容

最新《Transformers模型》教程，64页ppt

最新《Transformers模型》教程，64页ppt

专知会员服务

321+阅读 · 2020年11月26日

【Mila】通用表示Transformer少样本图像分类

【Mila】通用表示Transformer少样本图像分类

专知会员服务

33+阅读 · 2020年9月7日

【论文】持续学习的图神经网络用于检测社交媒体的假新闻，Graph Neural Networks with Continual Learning for Fake News Detection from Social Media

【论文】持续学习的图神经网络用于检测社交媒体的假新闻，Graph Neural Networks with Continual Learning for Fake News Detection from Social Media

专知会员服务

41+阅读 · 2020年7月14日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

【Amazon】使用预先训练的Transformer模型进行数据增强，Data Augmentation using Pre-trained Transformer Models

【Amazon】使用预先训练的Transformer模型进行数据增强，Data Augmentation using Pre-trained Transformer Models

专知会员服务

51+阅读 · 2020年3月7日

【ICCV2019教程】物体检测的R-CNN通用框架，The Generalized R-CNN Framework for Object Detection，180页ppt，Facebook 人工智能研究院Ross Girshick大神

【ICCV2019教程】物体检测的R-CNN通用框架，The Generalized R-CNN Framework for Object Detection，180页ppt，Facebook 人工智能研究院Ross Girshick大神

专知会员服务

25+阅读 · 2019年11月16日

【O'Reilly AI Conference 2019】使用联合学习在智能建筑中进行异常检测（Anomaly detection in smart buildings using federated learning），Binaize Labs的联合创始人Tuhin Sharma、Binaize Labs的联合创始人和机器学习工程师Bargava Subramanian

【O'Reilly AI Conference 2019】使用联合学习在智能建筑中进行异常检测（Anomaly detection in smart buildings using federated learning），Binaize Labs的联合创始人Tuhin Sharma、Binaize Labs的联合创始人和机器学习工程师Bargava Subramanian

专知会员服务

6+阅读 · 2019年11月5日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

ExBert — 可视化分析Transformer学到的表示

ExBert — 可视化分析Transformer学到的表示

专知会员服务

32+阅读 · 2019年10月16日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

《小型无人机系统侦测追踪技术：声学、计算机视觉与深度学习融合方案》最新98页

《"牧羊人网格"拦截策略：实现无人机集群可靠拦截的新范式》

光纤无人机：反无人机系统的重大挑战

《作战建模与仿真实证研究》

相关资讯

使用BERT做文本摘要

使用BERT做文本摘要

专知

23+阅读 · 2019年12月7日

RoBERTa中文预训练模型：RoBERTa for Chinese

RoBERTa中文预训练模型：RoBERTa for Chinese

PaperWeekly

57+阅读 · 2019年9月16日

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

AINLP

40+阅读 · 2019年6月9日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Call for Participation: Shared Tasks in NLPCC 2019

Call for Participation: Shared Tasks in NLPCC 2019

中国计算机学会

5+阅读 · 2019年3月22日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

pytorch-pretrained-BERT：BERT PyTorch实现，可加载Google BERT预训练模型

pytorch-pretrained-BERT：BERT PyTorch实现，可加载Google BERT预训练模型

AINLP

35+阅读 · 2018年11月6日

谷歌发表的史上最强NLP模型BERT的官方代码和预训练模型可以下载了

谷歌发表的史上最强NLP模型BERT的官方代码和预训练模型可以下载了

AINLP

12+阅读 · 2018年11月1日

tensorflow Object Detection API使用预训练模型mask r-cnn实现对象检测

tensorflow Object Detection API使用预训练模型mask r-cnn实现对象检测

极市平台

12+阅读 · 2018年8月24日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

相关论文

Mobile authentication of copy detection patterns: how critical is to know fakes?

Arxiv

0+阅读 · 2021年10月5日

Advances in Machine and Deep Learning for Modeling and Real-time Detection of Multi-Messenger Sources

Arxiv

0+阅读 · 2021年10月2日

Attention Bottlenecks for Multimodal Fusion

Arxiv

31+阅读 · 2021年6月30日

Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers

Arxiv

12+阅读 · 2020年6月23日

Data Augmentation using Pre-trained Transformer Models

Arxiv

17+阅读 · 2020年3月4日

Multi-Task Self-Supervised Learning for Disfluency Detection

Arxiv

5+阅读 · 2019年8月15日

Domain Specific Approximation for Object Detection

Arxiv

5+阅读 · 2018年10月4日

Object detection at 200 Frames Per Second

Arxiv

5+阅读 · 2018年5月16日

End-to-end learning of keypoint detector and descriptor for pose invariant 3D matching

Arxiv

8+阅读 · 2018年5月9日

When and Why are Pre-trained Word Embeddings Useful for Neural Machine Translation?

Arxiv

3+阅读 · 2018年4月18日

微信扫码咨询专知VIP会员