The quality of learning generally improves with the scale and diversity of data. Companies and institutions can therefore benefit from building models over shared data. Many cloud and blockchain platforms, as well as government initiatives, are interested in providing this type of service. These cooperative efforts face a challenge, which we call ``exclusivity attacks'': a firm can share distorted data so that it still learns the best model fit itself while misleading the other participants. We study protocols for long-term interactions and their vulnerability to these attacks, in particular for regression and clustering tasks. We conclude that both the choice of protocol and the number of Sybil identities an attacker may control are material to vulnerability.