File reading is the basis for data sharing and scientific computing. However, writing file-reading programs by hand is labour-intensive and time-consuming because data formats are heterogeneous and complex. To address this issue, this study proposes a novel approach for automatically generating file-reading programs from structured, self-described data format information. The approach provides two modes, sequential reading and random reading, and proceeds in three steps. First, the file data format is described in Data Format Markup Language (DFML), producing DFML documents. Second, data type sequences are formed by parsing those DFML documents. Third, programs for sequential or random reading are generated from the data type sequences and the general programming rules of a specific programming language. A tool named DFML Editor was developed for generating and editing DFML documents. Case studies on binary files (ESRI point shapefiles) and plain text files (input files of the Storm Water Management Model) were conducted with the software developed for automatic program generation and file reading. Experimental results show that the proposed approach is effective for automatically generating programs that read files. The idea in this study can also be applied to automatically generating programs that write files.
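The pipeline described above (format description, then data type sequence, then sequential or random reading) can be illustrated with a minimal sketch. It rests on several assumptions: the XML element and attribute names are hypothetical and are not taken from the DFML specification, records are assumed to be fixed-size binary, and the sketch interprets the type sequence at run time instead of emitting source code as the proposed approach does.

```python
import io
import struct
import xml.etree.ElementTree as ET

# Hypothetical format description. The element and attribute names are
# illustrative only and do not follow the actual DFML schema.
FORMAT_XML = """
<format byteorder="little">
  <field name="id" type="int32"/>
  <field name="x" type="float64"/>
  <field name="y" type="float64"/>
</format>
"""

# Illustrative type names mapped to struct format codes.
STRUCT_CODES = {"int32": "i", "float64": "d"}


def parse_type_sequence(xml_text):
    """Parse the description into a byte-order prefix and a type sequence."""
    root = ET.fromstring(xml_text)
    prefix = "<" if root.get("byteorder") == "little" else ">"
    fields = [(f.get("name"), STRUCT_CODES[f.get("type")])
              for f in root.iter("field")]
    return prefix, fields


def build_reader(xml_text):
    """Return the field names and a struct.Struct for one record."""
    prefix, fields = parse_type_sequence(xml_text)
    names = [name for name, _ in fields]
    record = struct.Struct(prefix + "".join(code for _, code in fields))
    return names, record


def read_sequential(stream, xml_text):
    """Yield records in file order until the stream is exhausted."""
    names, record = build_reader(xml_text)
    while True:
        chunk = stream.read(record.size)
        if len(chunk) < record.size:
            break
        yield dict(zip(names, record.unpack(chunk)))


def read_random(stream, xml_text, index):
    """Seek directly to the record at `index` and read only that record."""
    names, record = build_reader(xml_text)
    stream.seek(index * record.size)
    chunk = stream.read(record.size)
    if len(chunk) < record.size:
        raise IndexError("record index out of range")
    return dict(zip(names, record.unpack(chunk)))


if __name__ == "__main__":
    data = struct.pack("<idd", 1, 1.5, 2.5) + struct.pack("<idd", 2, 3.0, 4.0)
    for rec in read_sequential(io.BytesIO(data), FORMAT_XML):
        print(rec)
    print(read_random(io.BytesIO(data), FORMAT_XML, 1))
```

Random access here depends on every record having the same size, so an offset can be computed from the record index. Variable-length formats would need an index or a preliminary scan, which this sketch does not attempt.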