Dataset licensing is currently an issue in the development of machine learning systems. And in the development of machine learning systems, the most widely used are publicly available datasets. However, since the images in the publicly available dataset are mainly obtained from the Internet, some images are not commercially available. Furthermore, developers of machine learning systems do not often care about the license of the dataset when training machine learning models with it. In summary, the licensing of datasets for machine learning systems is in a state of incompleteness in all aspects at this stage. Our investigation of two collection datasets revealed that most of the current datasets lacked licenses, and the lack of licenses made it impossible to determine the commercial availability of the datasets. Therefore, we decided to take a more scientific and systematic approach to investigate the licensing of datasets and the licensing of machine learning systems that use the dataset to make it easier and more compliant for future developers of machine learning systems.
翻译:数据集授权目前是机器学习系统开发中的一个问题。在机器学习系统的开发中,最常用的是公开可用的数据集合。然而,由于这些公开可用的数据集合中的图像主要来自于互联网,一些图像并不商业可用。此外,在使用数据集合训练机器学习模型时,开发者往往不关注数据集合的授权情况。总而言之,机器学习系统数据集授权在各个方面都处于不完整的状态。我们对两个数据集合的调查表明,当前大多数数据集合缺乏授权,而缺乏授权使得难以确定数据集合的商业可用性。因此,我们决定采取更加科学和系统的方法研究数据集合的授权以及使用数据集合的机器学习系统的授权,以便未来机器学习系统开发者更加容易和合规化地开发机器学习系统。