Robust PDF Files Forensics Using Coding Style - Zhuanzhi Papers

March 3, 2021

Robust PDF Files Forensics Using Coding Style

Supriya Adhatarao, Cédric Lauradoux

Identifying how a file was created is often useful in security, for both attackers and defenders. Attackers can exploit this information to tune their attacks, and defenders can use it to understand how a malicious file was created after an incident. In this work, we want to identify how a PDF file was created. This problem is important because PDF files are extremely popular: many organizations publish PDF files online, and malicious PDF files are commonly used by attackers. Our approach to detecting which software produced a PDF file is based on coding style: patterns that are created only by certain PDF producers. We analyzed the coding style of 900 PDF files produced using 11 PDF producers on 3 different operating systems, and obtained a set of 192 rules that can be used to identify those 11 PDF producers. We tested our detection tool on 508,836 PDF files published on scientific preprint servers. The tool detects certain producers with an accuracy of 100%, and its overall detection rate is still high (74%). We were also able to apply the tool to identify how online PDF services work and to spot inconsistencies.
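To make the idea of coding-style rules concrete, here is a minimal sketch of rule-based producer detection. It is an illustration under stated assumptions, not the authors' tool: the producer names (`producer_a`, `producer_b`), the two byte patterns, and the "every pattern must match" policy are all hypothetical, and none of the 192 rules from the paper is reproduced here. The sketch only shows how a difference in how a dictionary is written (with or without spaces between tokens) could separate two producers.

```python
import re
from typing import Dict, List

# Hypothetical rule set: each producer name maps to byte-level regular
# expressions describing a way of writing PDF syntax. These patterns are
# invented for illustration and are not taken from the paper.
RULES: Dict[str, List[re.Pattern]] = {
    # Assumed style: tokens inside a dictionary are separated by spaces.
    "producer_a": [re.compile(rb"<< /Type /Catalog")],
    # Assumed style: tokens inside a dictionary are written without spaces.
    "producer_b": [re.compile(rb"<</Type/Catalog")],
}


def identify_producers(data: bytes, rules: Dict[str, List[re.Pattern]] = RULES) -> List[str]:
    """Return the sorted names of producers whose patterns all occur in data."""
    return sorted(
        name
        for name, patterns in rules.items()
        if all(pattern.search(data) for pattern in patterns)
    )


# Two tiny hand-written fragments that differ only in whitespace style.
sample_a = b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"
sample_b = b"%PDF-1.4\n1 0 obj\n<</Type/Catalog/Pages 2 0 R>>\nendobj\n"

print(identify_producers(sample_a))
print(identify_producers(sample_b))
```

In this sketch, a file that matches no rule set yields an empty list, which corresponds to the case where a producer cannot be identified. A file matching several rule sets would return several candidates, and how to resolve such ties is left open here.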
