Towards Employing Recommender Systems for Supporting Data and Algorithm Sharing

Data and algorithm sharing is an essential part of data- and AI-driven economies. Efficient sharing of data and algorithms relies on the active interplay between users, data providers, and algorithm providers. Although recommender systems are known to connect users and items effectively in e-commerce settings, there is a lack of research on their applicability to data and algorithm sharing. To fill this gap, we identify six recommendation scenarios for supporting data and algorithm sharing, four of which differ substantially from the traditional recommendation scenarios in e-commerce applications. We evaluate these scenarios using a novel dataset based on interaction data from the OpenML data and algorithm sharing platform, which we also make available to the scientific community. Specifically, we investigate three types of recommendation approaches: popularity-, collaboration-, and content-based recommendations. We find that collaboration-based recommendations are the most accurate in all scenarios. In addition, recommendation accuracy strongly depends on the specific scenario; for example, recommending algorithms to users is a more difficult problem than recommending algorithms for datasets. Finally, the content-based approach generates the least popularity-biased recommendations, and these cover the most datasets and algorithms.
