Correctly identifying the type of a file under examination is a critical part of a forensic investigation. The file type alone suggests the kind of content the file holds, such as a picture, video, document, or spreadsheet. Because a system owner may want to keep files inaccessible or conceal their types, we propose using an adversarially trained neural network to determine a file's true type even when its extension or file header has been obfuscated to hinder discovery. Our semi-supervised generative adversarial network (SGAN) achieved 97.6% accuracy in classifying files across 11 different types. We also compared our network against a traditional standalone neural network and three other machine learning algorithms. The adversarially trained network proved to be the most precise file classifier, especially when few labeled (supervised) samples are available. Our SGAN-based file classifier is available on GitHub (https://ksaintg.github.io/SGAN-File-Classier).
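The abstract does not describe the network's output layer or training loss. As a hedged illustration of what makes a GAN "semi-supervised", the sketch below follows one common SGAN formulation (Salimans et al., 2016) and is not taken from the authors' code. The discriminator emits K logits, one per real file type (K = 11 in this study), and the probability that an input is a real file rather than a generator sample is derived from those same logits as Z/(Z+1), with Z = Σ exp(l_k). Labeled files contribute a cross-entropy term over the K types, while unlabeled real files and generated samples contribute only the real/fake term. This is how unlabeled data can help when supervised samples are scarce. All function names are hypothetical, and the logits are assumed to come from some upstream network over the file's bytes.

```python
import math
from typing import List

# Hypothetical helpers illustrating one common SGAN discriminator head
# (Salimans et al., 2016); not the authors' published implementation.

def _log_sum_exp(xs: List[float]) -> float:
    # log(sum(exp(x))) computed with a max shift to avoid overflow
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def _softplus(x: float) -> float:
    # log(1 + exp(x)) computed without overflow
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def class_probabilities(logits: List[float]) -> List[float]:
    # Softmax over the K real file types (K = 11 in the study)
    lse = _log_sum_exp(logits)
    return [math.exp(l - lse) for l in logits]

def p_real(logits: List[float]) -> float:
    # P(real | x) = Z / (Z + 1) with Z = sum_k exp(l_k),
    # i.e. the logistic sigmoid of logsumexp(logits)
    a = _log_sum_exp(logits)
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)

def discriminator_loss(labeled: List[float], label: int,
                       unlabeled: List[float], generated: List[float]) -> float:
    # Supervised part: cross-entropy of the true file type, -log softmax[label]
    supervised = _log_sum_exp(labeled) - labeled[label]
    # Unsupervised part: -log P(real) for an unlabeled real file
    # plus -log(1 - P(real)) for a generator sample
    unsupervised = (_softplus(-_log_sum_exp(unlabeled))
                    + _softplus(_log_sum_exp(generated)))
    return supervised + unsupervised
```

For example, with all 11 logits equal to zero, each file type gets probability 1/11 and the real/fake head reports 11/12. Working in log space (log-sum-exp and softplus) keeps the losses finite even for very large or very small logits.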