Marvolo:用于实际ML-Driven Malware探测的方案数据增强 (Marvolo: Programmatic Data Augmentation for Practical ML-Driven Malware Detection) - 专知论文

会员服务 ·

0

机器学习驱动的 · 数据增强 · Boosting（一种模型训练加速方式） · 模型评估 · 原点 ·

2022 年 6 月 7 日

Marvolo: Programmatic Data Augmentation for Practical ML-Driven Malware Detection

翻译：Marvolo:用于实际ML-Driven Malware探测的方案数据增强

Michael D. Wong,Edward Raff,James Holt,Ravi Netravali

from arxiv, 15 pages, 7 figures

Data augmentation has been rare in the cyber security domain due to technical difficulties in altering data in a manner that is semantically consistent with the original data. This shortfall is particularly onerous given the unique difficulty of acquiring benign and malicious training data that runs into copyright restrictions, and that institutions like banks and governments receive targeted malware that will never exist in large quantities. We present MARVOLO, a binary mutator that programmatically grows malware (and benign) datasets in a manner that boosts the accuracy of ML-driven malware detectors. MARVOLO employs semantics-preserving code transformations that mimic the alterations that malware authors and defensive benign developers routinely make in practice , allowing us to generate meaningful augmented data. Crucially, semantics-preserving transformations also enable MARVOLO to safely propagate labels from original to newly-generated data samples without mandating expensive reverse engineering of binaries. Further, MARVOLO embeds several key optimizations that keep costs low for practitioners by maximizing the density of diverse data samples generated within a given time (or resource) budget. Experiments using wide-ranging commercial malware datasets and a recent ML-driven malware detector show that MARVOLO boosts accuracies by up to 5%, while operating on only a small fraction (15%) of the potential input binaries.

翻译：在网络安全领域,由于技术困难,难以以与原始数据一致的方式改变数据,数据增强在网络安全领域是少有的。这一不足特别繁重,因为获取无害和恶意培训数据有特殊困难,难以获得进入版权限制的良性和恶意培训数据,银行和政府等机构接收目标恶意软件,而这些软件将大量存在。我们介绍了MARVOLO,这是一个二进制变异器,其程序上培养恶意软件(和良性)数据集的方式提高了由ML驱动的恶意软件探测器的准确性。MARVOLO采用语义保存代码转换,以模拟恶意软件作者和防御性良性良性开发者在实践中经常作出的改变,从而使我们能够产生有意义的强化数据。关键是,语义保存变换也使MARVOLO能够安全地传播原版到新生成的数据样本的标签,而不需要花费昂贵的反向工程。此外,MARVOLO将一些关键优化,通过最大限度地增加在特定时间(或资源)预算内生成的多种数据样本的密度,从而保持成本低廉。使用宽度的商用恶意数据样本实验,而只是用MLOVOLO 进行微的磁盘操作,同时显示最近的磁盘的磁盘的磁盘。

0

相关内容

机器学习驱动的

机器学习驱动的

【USC-Aaron Chan博士答辩Slides】可信自然语言处理机器解释的生成与利用, 242页ppt，Generating and Utilizing Machine Explanations for Trustworthy NLP

【USC-Aaron Chan博士答辩Slides】可信自然语言处理机器解释的生成与利用, 242页ppt，Generating and Utilizing Machine Explanations for Trustworthy NLP

专知会员服务

16+阅读 · 2022年3月13日

50+篇《神经架构搜索NAS》2020论文合集

专知会员服务

61+阅读 · 2020年3月19日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

ACM TOMM Call for Papers

ACM TOMM Call for Papers

CCF多媒体专委会

2+阅读 · 2022年3月23日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

Call for Nominations: 2022 Multimedia Prize Paper Award

Call for Nominations: 2022 Multimedia Prize Paper Award

CCF多媒体专委会

0+阅读 · 2022年2月12日

【ICIG2021】Latest News & Announcements of the Industry Talk1

【ICIG2021】Latest News & Announcements of the Industry Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年7月28日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

专知

13+阅读 · 2018年6月24日

P-选择素糖蛋白配体1（PSGL-1）介导中性粒细胞募集在新生儿脓毒症中的作用及机制研究

国家自然科学基金

0+阅读 · 2015年12月31日

非凸稀疏正则化模型与算法的研究

国家自然科学基金

3+阅读 · 2015年12月31日

IFN-γ促进中华鳖虹彩病毒（STIV）感染细胞凋亡的抗病毒机理研究

国家自然科学基金

0+阅读 · 2015年12月31日

采用pinball loss的MEE算法研究

国家自然科学基金

1+阅读 · 2013年12月31日

Ipr1基因介导巨噬细胞凋亡的作用机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

柑橘黄龙病亚洲种病原( Cadidatus Liberibacter assiaticus)重组抗体的研究

国家自然科学基金

0+阅读 · 2012年12月31日

UGT基因簇进化及调控研究

国家自然科学基金

0+阅读 · 2009年12月31日

SARI基因在肺癌侵袭转移中的作用及分子机制

国家自然科学基金

0+阅读 · 2009年12月31日

戊型肝炎病毒ORF2蛋白与宿主细胞蛋白的相互作用及其生物学意义

国家自然科学基金

0+阅读 · 2008年12月31日

对虾白斑综合症病毒（WSSV）蛋白对对虾免疫相关基因表达的调控

国家自然科学基金

0+阅读 · 2008年12月31日

SegPGD: An Effective and Efficient Adversarial Attack for Evaluating and Boosting Segmentation Robustness

Arxiv

0+阅读 · 2022年7月25日

A Continuous-Time Perspective on Optimal Methods for Monotone Equation Problems

Arxiv

0+阅读 · 2022年7月24日

Weakly-Supervised Temporal Action Detection for Fine-Grained Videos with Hierarchical Atomic Actions

Arxiv

0+阅读 · 2022年7月24日

Algorithmic Fairness in Business Analytics: Directions for Research and Practice

Arxiv

0+阅读 · 2022年7月22日

Bounding-box deep calibration for high performance face detection

Arxiv

0+阅读 · 2022年7月22日

Focused Decoding Enables 3D Anatomical Detection by Transformers

Arxiv

0+阅读 · 2022年7月21日

Auto Machine Learning for Medical Image Analysis by Unifying the Search on Data Augmentation and Neural Architecture

Arxiv

0+阅读 · 2022年7月21日

Data Augmentation for Graph Neural Networks

Arxiv

38+阅读 · 2020年12月2日

Prime Sample Attention in Object Detection

Arxiv

13+阅读 · 2019年4月9日

CommanderSong: A Systematic Approach for Practical Adversarial Voice Recognition

Arxiv

14+阅读 · 2018年1月24日

VIP会员

文章信息

相关主题

机器学习驱动的

Boosting（一种模型训练加速方式）

相关VIP内容

【USC-Aaron Chan博士答辩Slides】可信自然语言处理机器解释的生成与利用, 242页ppt，Generating and Utilizing Machine Explanations for Trustworthy NLP

【USC-Aaron Chan博士答辩Slides】可信自然语言处理机器解释的生成与利用, 242页ppt，Generating and Utilizing Machine Explanations for Trustworthy NLP

专知会员服务

16+阅读 · 2022年3月13日

50+篇《神经架构搜索NAS》2020论文合集

专知会员服务

61+阅读 · 2020年3月19日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

NeurIPS 2025 | 自动化所新作速览（一）

大型语言模型（LLM）赋能的知识图谱构建：综述

NeurIPS 2025 | 自动化所新作速览（二）

领域特定文本分类中的预训练语言模型新进展：系统综述

相关资讯

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

ACM TOMM Call for Papers

ACM TOMM Call for Papers

CCF多媒体专委会

2+阅读 · 2022年3月23日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

Call for Nominations: 2022 Multimedia Prize Paper Award

Call for Nominations: 2022 Multimedia Prize Paper Award

CCF多媒体专委会

0+阅读 · 2022年2月12日

【ICIG2021】Latest News & Announcements of the Industry Talk1

【ICIG2021】Latest News & Announcements of the Industry Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年7月28日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

专知

13+阅读 · 2018年6月24日

相关论文

SegPGD: An Effective and Efficient Adversarial Attack for Evaluating and Boosting Segmentation Robustness

Arxiv

0+阅读 · 2022年7月25日

A Continuous-Time Perspective on Optimal Methods for Monotone Equation Problems

Arxiv

0+阅读 · 2022年7月24日

Weakly-Supervised Temporal Action Detection for Fine-Grained Videos with Hierarchical Atomic Actions

Arxiv

0+阅读 · 2022年7月24日

Algorithmic Fairness in Business Analytics: Directions for Research and Practice

Arxiv

0+阅读 · 2022年7月22日

Bounding-box deep calibration for high performance face detection

Arxiv

0+阅读 · 2022年7月22日

Focused Decoding Enables 3D Anatomical Detection by Transformers

Arxiv

0+阅读 · 2022年7月21日

Auto Machine Learning for Medical Image Analysis by Unifying the Search on Data Augmentation and Neural Architecture

Arxiv

0+阅读 · 2022年7月21日

Data Augmentation for Graph Neural Networks

Arxiv

38+阅读 · 2020年12月2日

Prime Sample Attention in Object Detection

Arxiv

13+阅读 · 2019年4月9日

CommanderSong: A Systematic Approach for Practical Adversarial Voice Recognition

Arxiv

14+阅读 · 2018年1月24日

相关基金

P-选择素糖蛋白配体1（PSGL-1）介导中性粒细胞募集在新生儿脓毒症中的作用及机制研究

国家自然科学基金

0+阅读 · 2015年12月31日

非凸稀疏正则化模型与算法的研究

国家自然科学基金

3+阅读 · 2015年12月31日

IFN-γ促进中华鳖虹彩病毒（STIV）感染细胞凋亡的抗病毒机理研究

国家自然科学基金

0+阅读 · 2015年12月31日

采用pinball loss的MEE算法研究

国家自然科学基金

1+阅读 · 2013年12月31日

Ipr1基因介导巨噬细胞凋亡的作用机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

柑橘黄龙病亚洲种病原( Cadidatus Liberibacter assiaticus)重组抗体的研究

国家自然科学基金

0+阅读 · 2012年12月31日

UGT基因簇进化及调控研究

国家自然科学基金

0+阅读 · 2009年12月31日

SARI基因在肺癌侵袭转移中的作用及分子机制

国家自然科学基金

0+阅读 · 2009年12月31日

戊型肝炎病毒ORF2蛋白与宿主细胞蛋白的相互作用及其生物学意义

国家自然科学基金

0+阅读 · 2008年12月31日

对虾白斑综合症病毒（WSSV）蛋白对对虾免疫相关基因表达的调控

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员