Mobile devices now generate high-volume, privacy-sensitive data that is better kept on the device and queried on demand. However, data analysts still lack a uniform way to harness such distributed on-device data. In this paper, we propose Deck, a data querying system that enables flexible device-centric federated analytics. The key idea of Deck is to bypass app developers and let data analysts submit their analytics code directly to run on devices through a centralized query coordinator service. Deck provides a set of standard APIs to data analysts and handles most device-specific tasks underneath. Deck further incorporates two key techniques: (i) a hybrid permission checking mechanism and mandatory cross-device aggregation to ensure data privacy; (ii) a zero-knowledge statistical model that judiciously trades off query delay against query resource expenditure on devices. We fully implement Deck and plug it into 20 popular Android apps. An in-the-wild deployment with 1,642 volunteers shows that Deck reduces query delay by up to 30x compared to baselines. Our microbenchmarks also demonstrate that the standalone overhead of Deck is negligible.
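To make the coordinator idea concrete, the following is a minimal sketch of analyst-submitted code running on devices with mandatory cross-device aggregation. It is not Deck's actual API: the names `Device`, `Coordinator`, `submit`, and `min_devices`, the boolean `granted` flag standing in for the permission check, and the choice of a mean as the aggregate are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical types and names; none of this is Deck's real interface.
Query = Callable[[List[float]], float]


@dataclass
class Device:
    device_id: str
    local_data: List[float]   # raw data never leaves the device
    granted: bool = True      # assumed stand-in for a permission check

    def run(self, query: Query) -> Optional[float]:
        """Run the analyst's code locally; return only its scalar result."""
        if not self.granted or not self.local_data:
            return None
        return query(self.local_data)


class Coordinator:
    """Dispatches a query to devices and releases only an aggregate."""

    def __init__(self, devices: List[Device], min_devices: int = 3):
        self.devices = devices
        self.min_devices = min_devices  # assumed aggregation threshold

    def submit(self, query: Query) -> float:
        partials = []
        for device in self.devices:
            result = device.run(query)
            if result is not None:
                partials.append(result)
        # Mandatory cross-device aggregation: refuse to answer when too
        # few devices contributed, so no single device's value is exposed.
        if len(partials) < self.min_devices:
            raise PermissionError("too few devices to aggregate")
        return sum(partials) / len(partials)


if __name__ == "__main__":
    fleet = [
        Device("a", [1.0, 2.0, 3.0]),
        Device("b", [3.0]),
        Device("c", [4.0, 5.0]),
        Device("d", [100.0], granted=False),  # denied, so it is skipped
    ]
    # The analyst's code here is simply the built-in sum over local data.
    print(Coordinator(fleet).submit(sum))
```

In this sketch the analyst sees only the mean of the per-device results, and a query that reaches fewer than `min_devices` devices is rejected rather than answered.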