In managed languages, objects are typically serialized in bespoke binary formats such as Protobuf, or in markup languages such as XML or JSON. The major limitation of these formats is readability: developers cannot read binary data, and in most cases they struggle with the syntax of XML or JSON. This is a major issue when serialized objects are meant to be embedded in and read as part of source code, such as in test cases. To address this problem, we propose plain-code serialization. Our core idea is to serialize objects observed at runtime in the native syntax of a programming language. We realize this vision in the context of Java, and demonstrate a prototype that serializes Java objects to Java source code. The resulting source code faithfully reconstructs the objects observed at runtime. Our prototype, called ProDJ, is publicly available. In our experiments, ProDJ successfully plain-code serializes 174,699 objects observed during the execution of 4 open-source Java applications. Our performance measurements show that the runtime overhead is not noticeable. Through a user study, we demonstrate that developers prefer plain-code serialized objects within automatically generated tests over their XML or JSON representations.
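To make the core idea concrete, the following is a minimal sketch of what plain-code serialization means for a single simple Java object. It is not ProDJ's implementation. It assumes a hypothetical class `Point` with public, non-final fields and a public no-argument constructor, and uses reflection to emit Java statements that rebuild the object. The names `PlainCodeDemo`, `serialize`, and `literal` are illustrative only. Objects with private fields, nested object graphs, collections, or cycles would need considerably more machinery.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Arrays;
import java.util.Comparator;

public class PlainCodeDemo {

    // Hypothetical example class: public mutable fields and a public no-arg constructor.
    public static class Point {
        public int x;
        public int y;
        public String label;

        public Point() {}
    }

    // Renders a supported field value as a Java source literal.
    static String literal(Object value) {
        if (value == null) {
            return "null";
        }
        if (value instanceof String) {
            String s = (String) value;
            // Escape backslashes first, then quotes and newlines.
            String escaped = s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
            return "\"" + escaped + "\"";
        }
        if (value instanceof Long) {
            return value + "L";
        }
        if (value instanceof Double && !Double.isFinite((Double) value)) {
            throw new UnsupportedOperationException("non-finite double: " + value);
        }
        if (value instanceof Integer || value instanceof Boolean || value instanceof Double) {
            return String.valueOf(value);
        }
        throw new UnsupportedOperationException("unsupported type: " + value.getClass());
    }

    // Emits Java statements that construct an object equal (field by field) to obj.
    static String serialize(Object obj, String varName) {
        Class<?> type = obj.getClass();
        String typeName = type.getCanonicalName();
        StringBuilder code = new StringBuilder();
        code.append(typeName).append(' ').append(varName)
            .append(" = new ").append(typeName).append("();\n");

        // getDeclaredFields() has no guaranteed order, so sort by name for stable output.
        Field[] fields = type.getDeclaredFields();
        Arrays.sort(fields, Comparator.comparing(Field::getName));

        for (Field field : fields) {
            int mods = field.getModifiers();
            if (Modifier.isStatic(mods) || field.isSynthetic()) {
                continue; // static and compiler-generated fields are not object state
            }
            if (!Modifier.isPublic(mods) || Modifier.isFinal(mods)) {
                throw new UnsupportedOperationException("field not assignable: " + field.getName());
            }
            Object value;
            try {
                value = field.get(obj);
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
            code.append(varName).append('.').append(field.getName())
                .append(" = ").append(literal(value)).append(";\n");
        }
        return code.toString();
    }

    public static void main(String[] args) {
        Point p = new Point();
        p.x = 3;
        p.y = 4;
        p.label = "origin \"A\"";
        System.out.print(serialize(p, "p"));
    }
}
```

Running `main` prints one declaration line followed by assignments such as `p.x = 3;` and `p.label = "origin \"A\"";`. Because the output is ordinary Java, it can be pasted directly into a test case to rebuild the object, with no separate deserializer and no XML or JSON syntax for the reader to parse.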