We describe a workflow used to analyze the source code of the {\sc Android OS kernel} and rate it for a particular kind of bugginess that exposes a program to hacking. The workflow represents a novel approach to rating the vulnerability of components, inspired by recent work on embedding source-code functions. It combines deep learning with heuristics and machine learning. Deep learning is used to embed function/method labels into a Euclidean space. Because the corpus of Android kernel source code is rather limited (approximately 2 million C/C++ functions \& Java methods), a straightforward embedding of the labels is untenable. To overcome this dearth of data, we go through an intermediate step of \textit{Byte-Pair Encoding}, which splits labels into subword tokens. We then embed these tokens and assemble from them an embedding of the function/method labels. Long short-term memory networks (LSTMs) are used to embed tokens into vectors in $\mathbb{R}^d$, from which we form a \textit{cosine matrix} consisting of the cosine between every pair of vectors. The cosine matrix may be interpreted as a (combinatorial) `weighted' graph whose vertices represent functions/methods and whose `weighted' edges correspond to matrix entries. Features that include the function vectors, together with heuristically defined features, are then used to score the risk of bugginess.
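To make the Byte-Pair Encoding step concrete, here is a minimal Python sketch of the classic BPE merge-learning procedure applied to a toy list of identifier fragments. It is not the authors' implementation. The corpus, the helper names `learn_bpe`, `apply_bpe` and `_merge`, the lexicographic tie-breaking rule, and the omission of an end-of-word marker are all illustrative assumptions.

```python
from collections import Counter


def _merge(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with the joined symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` BPE merge rules from a (hypothetical) list of label fragments."""
    # Each distinct word starts as a tuple of characters, weighted by its frequency.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs across the corpus, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Most frequent pair wins; ties go to the lexicographically smallest pair (assumption).
        best = min(pairs, key=lambda p: (-pairs[p], p))
        merges.append(best)
        vocab = Counter({_merge(s, best): f for s, f in vocab.items()})
    return merges


def apply_bpe(word, merges):
    """Segment a possibly unseen word by replaying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = _merge(symbols, pair)
    return list(symbols)


merges = learn_bpe(["lock", "unlock", "relock", "lock"], num_merges=3)
print(merges)                      # [('c', 'k'), ('l', 'o'), ('lo', 'ck')]
print(apply_bpe("unlock", merges))  # ['u', 'n', 'lock']
print(apply_bpe("clock", merges))   # ['c', 'lock']
```

With only three merges learned from `lock`, `unlock` and `relock`, the unseen label `clock` still decomposes into a known subword (`lock`). This is the sense in which subword tokens ease a shortage of training data: rare or novel labels share tokens with frequent ones.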
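The cosine matrix and its reading as a weighted graph can be sketched in a few lines of plain Python. The toy 3-dimensional vectors, the function names used as vertex labels, the helpers `cosine_matrix` and `weighted_edges`, and the optional edge threshold are hypothetical. In the described workflow the vectors would instead come from the LSTM embedding in $\mathbb{R}^d$.

```python
import math


def cosine_matrix(vectors):
    """Return the n x n matrix whose (i, j) entry is the cosine between vectors i and j."""
    norms = [math.sqrt(sum(x * x for x in v)) for v in vectors]
    if any(n == 0.0 for n in norms):
        raise ValueError("cosine is undefined for a zero vector")
    n = len(vectors)
    return [
        [sum(a * b for a, b in zip(vectors[i], vectors[j])) / (norms[i] * norms[j])
         for j in range(n)]
        for i in range(n)
    ]


def weighted_edges(names, cos, threshold=0.0):
    """Read the cosine matrix as a weighted graph.

    Vertices are function/method names; each unordered pair whose cosine exceeds
    `threshold` (a hypothetical sparsification knob) becomes an edge carrying that weight.
    """
    edges = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if cos[i][j] > threshold:
                edges.append((names[i], names[j], cos[i][j]))
    return edges


# Toy vectors standing in for learned label embeddings (hypothetical values).
names = ["spin_lock", "spin_unlock", "kmalloc"]
vectors = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
cos = cosine_matrix(vectors)
print(weighted_edges(names, cos, threshold=0.5))
```

Because the cosine is symmetric, only the upper triangle is needed to list the edges. Diagonal entries are 1 up to floating-point rounding and are skipped, since they would correspond to self-loops.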