There are several approaches to encoding source code into the input vectors of neural models. These approaches attempt to include various syntactic and semantic features of input programs in their encoding. In this paper, we investigate Code2Snapshot, a novel representation of source code that is based on snapshots of input programs. We evaluate several variations of this representation and compare its performance with that of state-of-the-art representations that utilize the rich syntactic and semantic features of input programs. Our preliminary study on the utility of Code2Snapshot in the code summarization task suggests that simple snapshots of input programs perform comparably to the state-of-the-art representations. Interestingly, obscuring the input programs has an insignificant impact on Code2Snapshot's performance, suggesting that, for some tasks, neural models may achieve high performance by relying merely on the structure of input programs.
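To make the idea of a structure-only snapshot concrete, the following is a minimal toy sketch rather than the pipeline used in this work. The abstract does not specify how snapshots are rendered or how programs are obscured, so everything here is an assumption for illustration: the helper names `obscure` and `to_snapshot` are hypothetical, obscuring is modeled as replacing every letter and digit with `x`, and a small 0/1 character grid stands in for a real rendered image.

```python
from typing import List


def obscure(source: str) -> str:
    """Replace every letter and digit with 'x', keeping layout and punctuation.

    Assumption: this is only one plausible way to obscure a program.
    """
    return "".join("x" if ch.isalnum() else ch for ch in source)


def to_snapshot(source: str, height: int = 8, width: int = 24) -> List[List[int]]:
    """Rasterize text into a height x width grid: 1 = visible character, 0 = blank.

    Assumption: a character grid is a stand-in for a rendered image of the code.
    Lines and columns beyond the grid size are cropped.
    """
    grid = [[0] * width for _ in range(height)]
    lines = source.expandtabs(4).splitlines()
    for row, line in enumerate(lines[:height]):
        for col, ch in enumerate(line[:width]):
            if not ch.isspace():
                grid[row][col] = 1
    return grid


if __name__ == "__main__":
    program = "def add(a, b):\n    return a + b\n"
    print(obscure(program))
    for row in to_snapshot(program, height=2, width=16):
        print("".join("#" if cell else "." for cell in row))
```

In this toy setting the original and obscured programs produce the same grid, because only the positions of visible characters are recorded. A real image snapshot would also capture glyph shapes, so the two would differ in pixel content while sharing the same layout.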