File fragment classification (FFC) on small chunks of memory is essential in memory forensics and Internet security. Existing methods mainly treat file fragments as 1d byte signals and rely on the captured inter-byte features for classification, while the bit information within bytes, i.e., intra-byte information, is seldom considered. This is inherently ill-suited to classifying variable-length coding files, whose symbols are represented by a variable number of bits. To address this, we propose Byte2Image, a novel data augmentation technique that introduces the neglected intra-byte information into file fragments and re-treats them as 2d gray-scale images, allowing both inter-byte and intra-byte correlations to be captured simultaneously by powerful convolutional neural networks (CNNs). Specifically, to convert a file fragment into a 2d image, we employ a sliding byte window to expose the neglected intra-byte information and stack its n-gram features row by row. We further propose a byte sequence \& image fusion network as a classifier, which jointly models the raw 1d byte sequence and the converted 2d image to perform FFC. Experiments on the FFT-75 dataset validate that our method achieves notable accuracy improvements over state-of-the-art methods in nearly all scenarios. The code will be released at https://github.com/wenyang001/Byte2Image.
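The Byte2Image conversion can be illustrated with a minimal sketch of one plausible reading of the abstract: a stride-1 sliding window of n bytes forms an n-gram, each byte in the window is unpacked into its 8 bits to expose intra-byte information, and the resulting bit rows are stacked into a 2d gray-scale image. The function name byte2image, the window size n, the stride of 1, and the MSB-first bit order are illustrative assumptions, not the authors' reference implementation.

```python
import numpy as np

def byte2image(fragment: bytes, n: int = 8) -> np.ndarray:
    """Convert a 1d byte fragment into a 2d bit image.

    Each row holds the unpacked bits of one sliding n-byte window
    (an n-gram), so the image has shape (L - n + 1, 8 * n) with
    L = len(fragment). Illustrative sketch only, not the paper's
    reference implementation.
    """
    buf = np.frombuffer(fragment, dtype=np.uint8)
    assert len(buf) >= n, "fragment shorter than the byte window"
    # Stride-1 sliding windows of n consecutive bytes (n-grams).
    windows = np.lib.stride_tricks.sliding_window_view(buf, n)
    # Expose intra-byte information: unpack every byte into its 8 bits
    # (most-significant bit first) and scale to a gray-scale range.
    bits = np.unpackbits(windows, axis=1)      # (L - n + 1, 8 * n)
    return (bits * 255).astype(np.uint8)       # 2d gray-scale image

# Usage: a 512-byte fragment becomes a (505, 64) gray-scale image.
img = byte2image(np.random.default_rng(0).bytes(512), n=8)
print(img.shape)
```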