As deep learning blooms with growing demand for computation and data resources, outsourcing model training to a powerful cloud server becomes an attractive alternative to training on a low-power, cost-effective end device. Traditional outsourcing requires uploading device data to the cloud server, which can be infeasible in many real-world applications due to the often sensitive nature of the collected data and the limited communication bandwidth. To tackle these challenges, we propose to leverage widely available open-source data: massive datasets collected from public, heterogeneous sources (e.g., Internet images). We develop a novel strategy called Efficient Collaborative Open-source Sampling (ECOS) to construct a proximal proxy dataset from open-source data for cloud training, in lieu of client data. ECOS probes the open-source data on the cloud server to sense the distribution of client data via a communication- and computation-efficient sampling process, which communicates only a few compressed public features and scalar client responses. Extensive empirical studies show that ECOS improves the quality of automated client labeling, model compression, and label outsourcing across various learning scenarios.
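To make the probing step concrete, the following is a minimal Python sketch of one plausible realization, assuming k-means centroids as the compressed public features and per-centroid sample counts as the scalar client responses; all function names and the specific clustering/voting scheme are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np
from sklearn.cluster import KMeans

# Illustrative sketch of the ECOS-style probing loop (assumptions, not the
# paper's exact algorithm): the cloud compresses open-source features into a
# few centroids, the client answers with one scalar per centroid, and the
# cloud samples a proxy dataset accordingly.

def cloud_compress_features(open_features: np.ndarray, n_centroids: int = 32):
    """Cloud side: cluster open-source features and keep only the centroids,
    so that just a few compressed vectors are communicated to the client."""
    km = KMeans(n_clusters=n_centroids, n_init=10).fit(open_features)
    return km.cluster_centers_, km.labels_

def client_respond(client_features: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Client side: return one scalar per received centroid, here the number
    of private samples whose nearest centroid it is."""
    dists = np.linalg.norm(client_features[:, None, :] - centroids[None, :, :], axis=-1)
    nearest = dists.argmin(axis=1)
    return np.bincount(nearest, minlength=len(centroids)).astype(float)

def cloud_select_proxy(labels: np.ndarray, responses: np.ndarray, budget: int = 1000):
    """Cloud side: sample open-source points from each cluster in proportion
    to the client's scalar responses, forming the proximal proxy dataset."""
    probs = responses / responses.sum()
    selected = []
    for c, p in enumerate(probs):
        idx = np.where(labels == c)[0]
        k = min(len(idx), int(round(p * budget)))
        selected.extend(np.random.choice(idx, size=k, replace=False))
    return np.array(selected)
```

Under these assumptions, the client uplink is a single vector of per-centroid counts and the downlink is the centroid matrix, which matches the abstract's claim that only a few compressed public features and scalar responses are exchanged.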