DualSC: Automatic Generation and Summarization of Shellcode via Transformer and Dual Learning

Shellcode is a small piece of code executed to exploit a software vulnerability; it allows an attacker to run arbitrary commands on the target computer through a code injection attack. Like automated vulnerability generation techniques, automated shellcode generation can produce attack instructions, which can be used to detect vulnerabilities and implement defensive measures. Automated shellcode summarization, in turn, can help users unfamiliar with shellcode and network information security understand the intent of a shellcode attack. In this study, we propose a novel approach, DualSC, to solve the automatic shellcode generation and summarization tasks. Specifically, we formalize automatic shellcode generation and summarization as dual tasks, use a shallow Transformer for model construction, and design a normalization method, Adjust_QKNorm, to adapt to these low-resource tasks (i.e., tasks with insufficient training data). Finally, to alleviate the out-of-vocabulary problem, we propose a rule-based repair component to improve the performance of automatic shellcode generation. In our empirical study, we select a high-quality corpus, Shellcode_IA32, as our empirical subject. This corpus was gathered from two real-world projects at line-by-line granularity. We first compare DualSC with six state-of-the-art baselines from the code generation and code summarization domains in terms of four performance measures. The comparison results show the competitiveness of DualSC. Then, we verify the effectiveness of the component settings in DualSC. Finally, we conduct a human study to further verify the effectiveness of DualSC.
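
To make the dual-task formulation concrete, the following is a minimal sketch of how one line-level (description, instruction) pair could yield a training example for each direction, so that a single model can serve both tasks. The task prefixes, the function name `make_dual_examples`, and the sample pairs are all assumptions made for illustration; the abstract does not specify how DualSC marks or encodes the two tasks.

```python
from typing import List, Tuple

# Hypothetical task markers: the abstract does not say how the two
# directions are distinguished, so these strings are assumptions.
GEN_PREFIX = "generate shellcode:"
SUM_PREFIX = "summarize shellcode:"


def make_dual_examples(pairs: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Turn each (description, instruction) pair into two (source, target)
    examples: one for generation and one for summarization."""
    examples = []
    for description, instruction in pairs:
        # Generation direction: natural language -> assembly line.
        examples.append((f"{GEN_PREFIX} {description}", instruction))
        # Summarization direction: assembly line -> natural language.
        examples.append((f"{SUM_PREFIX} {instruction}", description))
    return examples


# Illustrative IA-32 pairs written for this sketch, not taken from the corpus.
sample_pairs = [
    ("clear the eax register", "xor eax, eax"),
    ("push the contents of ebx onto the stack", "push ebx"),
]

for source, target in make_dual_examples(sample_pairs):
    print(f"{source!r} -> {target!r}")
```

Under this assumed scheme every pair is used twice, which is one plausible way a dual formulation can stretch a small corpus in a low-resource setting.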
