Trusted execution environments (TEEs) such as Intel's Software Guard Extensions (SGX) have been widely studied as a way to strengthen security and privacy protection when computing on sensitive data such as human genomic data. However, SGX often imposes a performance hurdle, largely because of the small enclave memory. In this paper, we propose a new Hybrid Secured Flow framework (called "HySec-Flow") for large-scale genomic data analysis on SGX platforms. Data-intensive computing tasks are partitioned into independent subtasks that are deployed into distinct secured and non-secured containers, allowing parallel execution while alleviating the limited size of the Enclave Page Cache (EPC) memory in each enclave. We illustrate our contributions with a workflow that supports indexing, alignment, dispatching, and merging in the execution of SGX-enabled containers. We provide details regarding the architecture of the trusted and untrusted components and the underlying SCONE and Graphene support as generic shielded execution frameworks for porting legacy code. We thoroughly evaluate the performance of our privacy-preserving reads mapping algorithm using real human genome sequencing data. The results demonstrate that partitioning the time-consuming genomic computation into subtasks improves performance compared to the conventional execution of the data-intensive reads mapping algorithm in an enclave. The proposed HySec-Flow framework is made available as open source and can be adapted to the data-parallel computation of other large-scale genomic tasks requiring security and scalable computational resources.
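The partition, dispatch, and merge pattern described above can be illustrated with a minimal sketch. This is not the HySec-Flow implementation: the function names (`partition`, `map_chunk`, `merge`, `run_workflow`) are hypothetical, a thread pool stands in for the secured and non-secured containers, and exact substring search stands in for the reads mapping algorithm. It assumes the reads are distinct strings.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(reads, chunk_size):
    """Split the reads into independent chunks of at most chunk_size items.

    Keeping each chunk small is what would let a subtask fit in the
    limited enclave memory (an assumption of this sketch).
    """
    return [reads[i:i + chunk_size] for i in range(0, len(reads), chunk_size)]


def map_chunk(reference, chunk):
    """Toy stand-in for one container's read-mapping subtask.

    Returns {read: position of first exact match in reference, or -1}.
    """
    return {read: reference.find(read) for read in chunk}


def merge(partial_results):
    """Combine the per-chunk results into a single mapping."""
    merged = {}
    for part in partial_results:
        merged.update(part)
    return merged


def run_workflow(reference, reads, chunk_size=2, workers=4):
    """Partition the reads, dispatch chunks to parallel workers, merge."""
    chunks = partition(reads, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda c: map_chunk(reference, c), chunks))
    return merge(partials)
```

Calling `run_workflow(reference, reads, chunk_size=k)` returns one dictionary covering all reads regardless of how they were chunked, since the subtasks are independent. In a real deployment each worker would be a separate container, and the choice of `chunk_size` would be driven by the EPC limit rather than picked by hand.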