Because of users' privacy concerns and the stricter enforcement of data security and privacy laws, it is increasingly difficult for organizations to share data. Data federation opens new opportunities for data-related cooperation among organizations by providing abstract data interfaces. With the development of cloud computing, organizations store data on the cloud to achieve elasticity and scalability in data processing. Existing data placement approaches generally consider only one objective, either execution time or monetary cost, and do not consider data partitioning under hard constraints. In this paper, we propose an approach that enables cloud-based processing of data from different organizations. The approach consists of a data federation platform named FedCube, which enables data processing on the cloud, and a Lyapunov-based data placement algorithm. The data placement algorithm generates a plan to partition and store data on the cloud so as to achieve multiple objectives, based on a multi-objective cost model, while satisfying the constraints. The cost model combines two objectives: reducing monetary cost and reducing execution time. Our experimental evaluation shows that the proposed algorithm significantly reduces the total cost (by up to 69.8%) compared with existing approaches.
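The abstract does not give the cost model's formula or the details of the Lyapunov-based algorithm, so the following is only a minimal sketch under stated assumptions. It assumes the two objectives are combined as a normalized weighted sum of monetary cost and execution time, models hard constraints as a set of allowed sites per data partition, and uses the standard Lyapunov drift-plus-penalty rule: a virtual queue `Q` tracks overspending against an assumed average monetary budget, and each placement minimizes `V * penalty + Q * (spend - budget)`. All names (`Site`, `weighted_cost`, `lyapunov_place`, the weights, `V`, the budget) are hypothetical and are not FedCube's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    """Hypothetical cloud storage/processing site."""
    name: str
    price_per_gb: float  # assumed monetary cost per GB
    secs_per_gb: float   # assumed execution time per GB


def weighted_cost(size_gb, site, w_money, w_time, money_ref, time_ref):
    """Assumed multi-objective cost: normalized weighted sum of money and time.

    Returns (penalty, money) so the caller can also charge the budget queue.
    """
    money = size_gb * site.price_per_gb
    time = size_gb * site.secs_per_gb
    return w_money * money / money_ref + w_time * time / time_ref, money


def lyapunov_place(partitions, sites, allowed, budget_per_step, V=10.0,
                   w_money=0.5, w_time=0.5, money_ref=1.0, time_ref=1.0):
    """Place one partition per step with a drift-plus-penalty rule.

    partitions: list of (partition_id, size_gb)
    allowed: partition_id -> set of site names (hypothetical hard constraint)
    budget_per_step: average monetary budget enforced through virtual queue Q
    """
    Q = 0.0  # virtual queue: accumulated spending above the budget
    plan = {}
    for pid, size in partitions:
        candidates = [s for s in sites if s.name in allowed[pid]]
        if not candidates:
            raise ValueError(f"no site satisfies the constraints for {pid}")
        best, best_score, best_money = None, float("inf"), 0.0
        for s in candidates:
            penalty, money = weighted_cost(size, s, w_money, w_time,
                                           money_ref, time_ref)
            # Drift-plus-penalty: weight the objective by V, the constraint by Q.
            score = V * penalty + Q * (money - budget_per_step)
            if score < best_score:
                best, best_score, best_money = s, score, money
        plan[pid] = best.name
        # Queue update: grows when spending exceeds the per-step budget.
        Q = max(Q + best_money - budget_per_step, 0.0)
    return plan, Q


if __name__ == "__main__":
    sites = [Site("A", 0.10, 2.0), Site("B", 0.02, 10.0)]
    partitions = [("p1", 10.0), ("p2", 10.0), ("p3", 5.0)]
    allowed = {"p1": {"A", "B"}, "p2": {"A", "B"}, "p3": {"B"}}
    plan, q = lyapunov_place(partitions, sites, allowed, budget_per_step=0.4,
                             V=1.0, time_ref=50.0)
    print(plan)  # {'p1': 'A', 'p2': 'B', 'p3': 'B'}
```

In this toy run, the first partition goes to the faster but more expensive site `A` because the queue is empty; the resulting overspend raises `Q`, which tips the second placement toward the cheaper site `B`, and `p3` is restricted to `B` by its hard constraint. Larger `V` favors the weighted objective over budget adherence, which is the usual trade-off knob in drift-plus-penalty methods.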