During the development of NoSQL-backed software, the data model evolves naturally alongside the application code. In agile development especially, new application releases are deployed frequently, causing schema changes. Eventually, decisions have to be made about migrating the versioned legacy data that is persisted in the cloud-hosted production database. We solve this schema evolution problem and present the results of near-exhaustive calculations that let software project stakeholders manage the operational costs of data model evolution and adapt their software release strategy accordingly, in order to comply with service-level agreements on the competing metrics of migration cost and latency. We clarify conclusively how data model evolution in NoSQL databases impacts these metrics, taking all relevant characteristics of migration scenarios into account. Because enumerating every combination in the search space of migration scenarios would far exceed available computational means, we used a probabilistic Monte Carlo method of repeated sampling, a well-established means of bringing the complexity of data model evolution under control. Our experiments show, both qualitatively and quantitatively, how the performance of migration strategies depends on the intensity and distribution of data entity accesses, the kinds of schema changes, and the characteristics of the underlying data model.
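To illustrate the repeated-sampling idea, the following is a minimal sketch and not the authors' simulator. It draws random migration scenarios instead of enumerating them and averages two toy metrics. Everything in it is an assumption made for illustration: the strategy names `eager` (rewrite every entity at each release) and `lazy` (migrate an entity only when it is read), the cost model of one write per entity per schema version step, the hot/cold access distribution, and all parameter ranges.

```python
import random
from statistics import mean


def run_scenario(rng, n_entities=500, n_releases=6):
    """Simulate one randomly drawn scenario (hypothetical cost model)."""
    # Scenario characteristics are sampled rather than enumerated.
    accesses = rng.randint(50, 400)       # reads between two releases
    hot_share = rng.uniform(0.05, 0.5)    # fraction of entities that are "hot"
    hot_prob = rng.uniform(0.5, 0.95)     # chance that a read targets a hot entity
    n_hot = max(1, int(hot_share * n_entities))

    version = [0] * n_entities            # stored schema version per entity (lazy)
    eager_writes = lazy_writes = slow_reads = total_reads = 0

    for release in range(1, n_releases + 1):
        # Eager: every entity is rewritten as soon as the release is deployed.
        eager_writes += n_entities
        for _ in range(accesses):
            if rng.random() < hot_prob:
                entity = rng.randrange(n_hot)
            else:
                entity = rng.randrange(n_entities)
            total_reads += 1
            # Lazy: a stale entity is migrated on read, one write per version step,
            # and that read pays the migration latency.
            if version[entity] < release:
                lazy_writes += release - version[entity]
                slow_reads += 1
                version[entity] = release

    return eager_writes, lazy_writes, slow_reads / total_reads


def monte_carlo(n_samples=200, seed=0):
    """Average the toy metrics over n_samples randomly drawn scenarios."""
    rng = random.Random(seed)
    results = [run_scenario(rng) for _ in range(n_samples)]
    return {
        "eager_writes": mean(r[0] for r in results),
        "lazy_writes": mean(r[1] for r in results),
        "lazy_slow_read_share": mean(r[2] for r in results),
    }


if __name__ == "__main__":
    print(monte_carlo())
```

In this toy model the eager strategy pays a fixed write cost per release and never delays a read, while the lazy strategy writes less but makes some reads slower. That is the kind of cost-versus-latency trade-off the abstract refers to, although the real study covers far more scenario characteristics.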