The multi-label classification (MLC) task has been receiving increasing interest from the machine learning (ML) community, as evidenced by the growing number of papers and methods appearing in the literature. Proper, correct, robust, and trustworthy benchmarking is therefore of utmost importance for the further development of the field. We believe that this can be achieved by adhering to recently emerged data management standards, such as the FAIR (Findable, Accessible, Interoperable, and Reusable) and TRUST (Transparency, Responsibility, User focus, Sustainability, and Technology) principles. To FAIRify MLC datasets, we introduce an ontology-based online catalogue of MLC datasets that follows these principles. The catalogue describes many MLC datasets in detail, with comprehensible meta-features, MLC-specific semantic descriptions, and data provenance information. The MLC data catalogue is described at length in our recent publication in Nature Scientific Reports, Kostovska & Bogatinovski et al., and is available at: http://semantichub.ijs.si/MLCdatasets. In addition, we provide an ontology-based system for easy access to and querying of the performance/benchmark data obtained from a comprehensive MLC benchmark study. The system is available at: http://semantichub.ijs.si/MLCbenchmark.
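To make the notion of a dataset meta-feature concrete, the following minimal sketch computes three descriptors that are commonly used for MLC datasets: label cardinality, label density, and the number of distinct label sets. It is an illustration under stated assumptions, not the catalogue's implementation: the abstract does not list which meta-features the catalogue stores, and the function name `label_meta_features` and the binary indicator-matrix input format are hypothetical choices made for this example.

```python
from typing import Dict, Sequence


def label_meta_features(Y: Sequence[Sequence[int]]) -> Dict[str, float]:
    """Compute simple meta-features of a multi-label dataset.

    Y is a binary indicator matrix (hypothetical input format):
    Y[i][j] == 1 if example i carries label j, else 0.
    """
    n_examples = len(Y)
    if n_examples == 0:
        raise ValueError("Y must contain at least one example")
    n_labels = len(Y[0])
    if n_labels == 0 or any(len(row) != n_labels for row in Y):
        raise ValueError("all rows of Y must have the same non-zero length")

    # Label cardinality: average number of relevant labels per example.
    total_relevant = sum(sum(row) for row in Y)
    cardinality = total_relevant / n_examples

    # Label density: cardinality normalised by the number of labels.
    density = cardinality / n_labels

    # Number of distinct label combinations observed in the data.
    distinct_labelsets = len({tuple(row) for row in Y})

    return {
        "label_cardinality": cardinality,
        "label_density": density,
        "distinct_labelsets": float(distinct_labelsets),
    }


if __name__ == "__main__":
    # Toy dataset: 4 examples, 3 labels.
    toy = [
        [1, 0, 1],
        [0, 1, 0],
        [1, 1, 1],
        [1, 0, 1],
    ]
    print(label_meta_features(toy))
```

On the toy matrix, the eight relevant labels over four examples give a label cardinality of 2.0 and a label density of 2/3, with three distinct label sets. Descriptors of this kind allow datasets to be compared and filtered without inspecting the raw data.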