Efficient Encrypted Inference on Ensembles of Decision Trees

Data privacy concerns often prevent the use of cloud-based machine learning services for sensitive personal data. While homomorphic encryption (HE) offers a potential solution by enabling computations on encrypted data, the challenge is to obtain accurate machine learning models that work within the multiplicative depth constraints of a leveled HE scheme. Existing approaches for encrypted inference either make ad-hoc simplifications to a pre-trained model (e.g., replacing hard comparisons in a decision tree with soft comparators), at the cost of accuracy, or directly train a new depth-constrained model using the original training set. In this work, we propose a framework to transfer knowledge extracted by complex decision tree ensembles to shallow neural networks (referred to as DTNets) that are highly conducive to encrypted inference. Our approach minimizes the accuracy loss by searching for the best DTNet architecture that operates within the given depth constraints and training this DTNet using only synthetic data sampled from the training data distribution. Extensive experiments on real-world datasets demonstrate that these characteristics are critical in ensuring that DTNet accuracy approaches that of the original tree ensemble. Our system is highly scalable and can perform efficient inference on batched encrypted data (134 bits of security) with amortized time on the order of milliseconds. This is approximately three orders of magnitude faster than the standard approach of applying soft comparison at the internal nodes of the ensemble trees.
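
To make the soft-comparison baseline mentioned above concrete, the following is a minimal plaintext sketch, not the paper's implementation. It evaluates a small tree as a sum of leaf values weighted by products of comparator outputs along each root-to-leaf path. The tree, its thresholds, the `steepness` parameter, and the use of a sigmoid as the soft comparator are all assumptions made for illustration; under a leveled HE scheme the sigmoid would itself have to be replaced by a polynomial approximation, and the path products would consume multiplicative depth on top of that.

```python
import math

# Hypothetical depth-2 tree over two features (illustration only, not from the paper).
# Each internal node is (feature_index, threshold); leaves are class scores.
ROOT = (0, 0.5)
LEFT = (1, 0.3)
RIGHT = (1, 0.7)
LEAVES = (0.0, 1.0, 1.0, 0.0)  # order: left-left, left-right, right-left, right-right


def hard_cmp(x, t):
    """Exact comparison: 1.0 if x > t, else 0.0."""
    return 1.0 if x > t else 0.0


def soft_cmp(x, t, steepness=20.0):
    """Sigmoid surrogate for x > t (assumed form; steepness is a made-up knob)."""
    return 1.0 / (1.0 + math.exp(-steepness * (x - t)))


def evaluate(x, cmp):
    """Evaluate the tree as a sum over leaves weighted by path products."""
    go_right = cmp(x[ROOT[0]], ROOT[1])
    left_child = cmp(x[LEFT[0]], LEFT[1])
    right_child = cmp(x[RIGHT[0]], RIGHT[1])
    ll_v, lr_v, rl_v, rr_v = LEAVES
    # Every branch is computed; the comparator outputs act as (soft) selectors.
    left_branch = (1.0 - left_child) * ll_v + left_child * lr_v
    right_branch = (1.0 - right_child) * rl_v + right_child * rr_v
    return (1.0 - go_right) * left_branch + go_right * right_branch


sample = (0.9, 0.1)
hard_out = evaluate(sample, hard_cmp)
soft_out = evaluate(sample, soft_cmp)
```

With a steep comparator the soft output stays close to the hard one for inputs far from every threshold, but inputs near a threshold blend several leaves. This is one way to picture the accuracy cost the abstract attributes to soft comparators.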
