Latent Semantic Analysis (LSA) is a matrix decomposition method for discovering topics and topic weights in natural language documents. This study applies LSA to the binaries of malicious programs in order to analyze their composition. Decomposing the term frequency vector representation of a program yields a set of topics, each of which is a composition of terms. The vectors and topics were evaluated quantitatively using a spatial representation. This semantic analysis provides a more abstract representation of the program than the term frequency analysis from which it is derived. We represent a program as a collection of vectors in a metric space and use a distance metric to evaluate their similarity within a topic. Segmenting the vectors in this dataset provides increased resolution into the program structure.
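The pipeline summarized in the abstract (term frequency vectors, a matrix decomposition into topics, and a distance between vectors in topic space) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy term list, the counts, the choice of truncated SVD via NumPy as the decomposition, the number of topics `k`, and Euclidean distance as the metric. None of these are taken from the study itself.

```python
import numpy as np

# Hypothetical toy data: each row is one segment of a program,
# each column is the count of one term (here, made-up opcode names).
terms = ["mov", "push", "call", "xor", "jmp"]
tf = np.array([
    [4, 2, 1, 0, 0],
    [3, 3, 1, 0, 1],
    [0, 0, 1, 5, 4],
    [0, 1, 0, 4, 5],
], dtype=float)


def lsa(tf_matrix, k):
    """Truncated SVD of a term frequency matrix.

    Returns (segment_topics, topic_terms):
      segment_topics[i] is the weight of each of the k topics in segment i,
      topic_terms[j] is the weight of each term in topic j.
    """
    u, s, vt = np.linalg.svd(tf_matrix, full_matrices=False)
    segment_topics = u[:, :k] * s[:k]  # scale by singular values
    topic_terms = vt[:k]
    return segment_topics, topic_terms


def euclidean_distance(a, b):
    """Euclidean distance between two vectors in topic space."""
    return float(np.linalg.norm(a - b))


segment_topics, topic_terms = lsa(tf, k=2)

# Segments 0 and 1 share a term profile; segment 2 has a different one.
d_similar = euclidean_distance(segment_topics[0], segment_topics[1])
d_different = euclidean_distance(segment_topics[0], segment_topics[2])
print(f"similar segments: {d_similar:.3f}, different segments: {d_different:.3f}")
```

In this sketch, segments with similar term counts end up close together in the reduced topic space, while segments with different counts end up farther apart. Representing a program as several segment vectors rather than one aggregate vector is what gives the finer view of program structure that the abstract refers to.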