Differentially private (DP) databases can enable privacy-preserving analytics over datasets or data streams containing sensitive personal records. In such systems, user privacy is a very limited resource that is consumed by every new query, and hence must be aggressively conserved. We propose Boost, the most effective caching component for linear query workloads over DP databases. Boost builds upon private multiplicative weights (PMW), a DP mechanism that is powerful in theory but very ineffective in practice, and transforms it into a highly effective caching object, PMW-Bypass, which uses prior-query results obtained through an external DP mechanism to train a PMW to answer arbitrary future linear queries accurately and "for free" from a privacy perspective. We show that Boost with PMW-Bypass conserves significantly more budget compared to vanilla PMW and simpler cache designs: at least 1.51 - 14.25x improvement in experiments on public Covid19 and CitiBike datasets. Moreover, Boost incorporates support for range-query workloads, such as timeseries or streaming workloads, where opportunities exist to further conserve privacy budget through DP parallel composition and warm-starting of PMW state. Our work thus establishes both a coherent system design and the theoretical underpinnings for effective caching in DP databases.
翻译:暂无翻译