This report is part of the DataflowOpt project on the optimization of modern dataflows. It aims to introduce a data quality-aware cost model that jointly covers the following aspects: (1) heterogeneity in compute nodes, (2) geo-distribution, (3) massive parallelism, (4) complex DAGs, and (5) streaming applications. Such a cost model can then be leveraged to devise cost-based optimization solutions that address task placement and operator configuration.