Machine learning (ML) practitioners and organizations are building model zoos of pre-trained models, which contain metadata describing properties of the ML models and datasets that are useful for reporting, auditing, reproducibility, and interpretability. This metadata is currently not standardized; its expressivity is limited; and there is no interoperable way to store and query it. Consequently, model search, reuse, comparison, and composition are hindered. In this paper, we advocate for standardized ML model metadata representation and management, and we propose a toolkit to help practitioners manage and query that metadata.