Leaking Queries On Secure Stream Processing Systems

Stream processing systems are important in modern applications in which data arrive continuously and must be processed in real time. Because of their resource and scalability requirements, many of these systems run on the cloud, which is considered untrusted. Existing work on securing databases on the cloud focuses on protecting the data, and most systems leverage trusted hardware for high performance. However, in stream processing systems, queries are as sensitive as the data because they contain the application logic. We demonstrate that it is practical to extract the queries from stream processing systems that use Intel SGX to secure the execution engine. The attack, performed by a malicious cloud provider, is based on timing side channels and works in two phases. In the offline phase, the attacker profiles the execution time of individual stream operators on synthetic data. This phase outputs a model that identifies individual stream operators. In the online phase, the attacker isolates the operators that make up the query, monitors their execution, and recovers the operators using the model from the offline phase. We implement the attack based on popular data stream benchmarks using SecureStream and NEXMark, and demonstrate attack success rates of up to 92%. We further discuss approaches that can harden stream processing systems against our attacks without incurring high overhead.
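To make the two-phase structure concrete, the following is a minimal sketch in Python, not the implementation evaluated in the paper. It assumes the attacker can observe per-tuple execution times of an isolated operator. The operator names, the latency values in `SIMULATED_LATENCY_US`, the Gaussian noise model, and the nearest-signature classifier are all hypothetical choices made for illustration. Real timings would come from measurements of the enclave rather than from a simulator.

```python
import random
import statistics

# Hypothetical mean per-tuple latencies (microseconds) for three operators.
# These values are invented for illustration; they are not measurements.
SIMULATED_LATENCY_US = {"filter": 5.0, "map": 12.0, "join": 40.0}


def observe(operator, n_tuples, rng, noise=0.15):
    """Simulate the attacker's view: noisy execution times of one operator."""
    mean = SIMULATED_LATENCY_US[operator]
    return [max(0.0, rng.gauss(mean, noise * mean)) for _ in range(n_tuples)]


def profile_offline(operators, rng, n_tuples=500):
    """Offline phase: time each operator on synthetic data and store a
    (mean, standard deviation) signature per operator."""
    model = {}
    for op in operators:
        samples = observe(op, n_tuples, rng)
        model[op] = (statistics.mean(samples), statistics.stdev(samples))
    return model


def classify_online(trace, model):
    """Online phase: label one observed trace with the closest signature,
    measuring distance in units of the profiled standard deviation."""
    observed = statistics.mean(trace)
    return min(model, key=lambda op: abs(observed - model[op][0]) / model[op][1])


def recover_query(traces, model):
    """Recover the operator sequence from one timing trace per operator."""
    return [classify_online(trace, model) for trace in traces]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the simulation is repeatable
    model = profile_offline(list(SIMULATED_LATENCY_US), rng)

    # The victim's query, unknown to the attacker, as a chain of operators.
    secret_query = ["filter", "join", "map"]
    traces = [observe(op, 200, rng) for op in secret_query]
    print(recover_query(traces, model))
```

The sketch separates operators only because their simulated latencies are far apart. Operators with similar timing profiles would need richer features than a single mean, which is one reason a success rate below 100% is plausible in practice.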
