Software has become a cornerstone of research in several scientific fields that employ computer-based methodologies to answer new research questions. For such experiments to be fully reproducible, research software should comply with the FAIR principles; however, its metadata can be represented following different data models and spread across different locations. To bring some cohesion to the field, CodeMeta was proposed as a vocabulary for representing research software metadata in a unified and standardised manner. While existing tools can help users generate CodeMeta files for some specific use cases, they fall short in flexibility and adaptability. Hence, in this work, I propose the use of declarative mapping rules to generate CodeMeta files, illustrated through the implementation of three crosswalks in ShExML, which are then expanded and merged to cover the generation of CodeMeta files for two existing research software artefacts. Moreover, the outputs are validated using SHACL and ShEx, and the whole generation workflow is automated, requiring minimal user intervention upon a new version release. This work can therefore serve as an example that other developers can follow to include a CodeMeta generation workflow in their repositories, facilitating the adoption of CodeMeta and, ultimately, increasing research software FAIRness.