There are several approaches to encoding source code as input vectors for neural models. These approaches attempt to capture various syntactic and semantic features of input programs in their encoding. In this paper, we investigate Code2Snapshot, a novel representation of source code that is based on snapshots of input programs. We evaluate several variations of this representation and compare its performance with that of state-of-the-art representations that utilize the rich syntactic and semantic features of input programs. Our preliminary study on the utility of Code2Snapshot in the code summarization and code classification tasks suggests that simple snapshots of input programs perform comparably to state-of-the-art representations. Interestingly, obscuring the input programs has an insignificant impact on Code2Snapshot's performance, suggesting that, for some tasks, neural models may achieve high performance by relying merely on the structure of input programs.
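To make the idea of a snapshot and of obscuring concrete, the following is a minimal sketch and not the implementation evaluated in the paper. It assumes that obscuring means replacing every alphanumeric character with `x`, and it stands in for a rendered image with a coarse binary grid in which a cell is 1 wherever the program has a visible character. The names `obscure` and `snapshot`, and the grid dimensions, are hypothetical.

```python
from typing import List


def obscure(source: str) -> str:
    """Replace every alphanumeric character with 'x' (assumed obscuring scheme).

    Whitespace and punctuation are kept, so the layout of the program survives.
    """
    return "".join("x" if ch.isalnum() else ch for ch in source)


def snapshot(source: str, height: int = 8, width: int = 24) -> List[List[int]]:
    """Rasterize source text into a fixed-size binary grid (toy stand-in for an image).

    A cell is 1 where the program has a non-whitespace character and 0 elsewhere.
    Text beyond the grid is cropped; shorter text is padded with zeros.
    """
    grid = [[0] * width for _ in range(height)]
    lines = source.expandtabs(4).splitlines()
    for row, line in enumerate(lines[:height]):
        for col, ch in enumerate(line[:width]):
            if not ch.isspace():
                grid[row][col] = 1
    return grid


program = "def add(a, b):\n    return a + b\n"
original = snapshot(program)
obscured = snapshot(obscure(program))

# Print the grid as a crude picture of the program's layout.
for row in original[:3]:
    print("".join("#" if cell else "." for cell in row))
```

In this toy grid the original and obscured programs produce the same snapshot by construction, because only the positions of visible characters are recorded. A rendered image would also keep glyph shapes, which this sketch discards, so it illustrates only the structural signal that the abstract refers to.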