Fault-tolerance techniques rely on replication to improve availability, at the price of higher infrastructure costs. This creates a fundamental trade-off: fault-tolerant services must satisfy given availability and performance constraints while minimising the number of replicated resources. Service operators therefore face a capacity planning challenge: minimising replication costs without compromising availability. To this end, we present PCRAFT, a system that enables capacity planning for dependable services. PCRAFT takes a hybrid approach that combines empirical performance measurements with probabilistic modelling of availability based on fault injection. In particular, we integrate traditional service-level availability mechanisms (active route anywhere and passive failover) and deployment schemes (cloud and on-premises) to quantify the number of nodes needed to satisfy the given availability and performance constraints. Our evaluation on real-world applications shows that cloud deployments require fewer nodes than on-premises deployments. For on-premises deployments, we further show that passive failover requires fewer nodes than active route anywhere. Finally, our evaluation quantifies the quality improvement provided by additional integrity mechanisms and how it affects the number of nodes needed.
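To make the trade-off concrete, the following is a minimal sketch of how a node count could be derived from an availability target. It is not PCRAFT's actual model: it assumes a simple k-out-of-n binomial model with independent node failures, and the names `availability_k_of_n`, `min_nodes`, and the parameters `k` (nodes needed to meet the performance constraint), `p` (per-node availability, for example as estimated from fault injection), and `target` are hypothetical and chosen for illustration only.

```python
from math import comb


def availability_k_of_n(n: int, k: int, p: float) -> float:
    """Probability that at least k of n nodes are up.

    Assumes nodes fail independently, each being up with probability p
    (an illustrative simplification, not the paper's model).
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def min_nodes(k: int, p: float, target: float, max_nodes: int = 1000) -> int:
    """Smallest n >= k such that at least k of n nodes are up with
    probability >= target. k stands in for the performance constraint,
    target for the availability constraint."""
    if k < 1 or not (0.0 < p <= 1.0) or not (0.0 < target < 1.0):
        raise ValueError("expected k >= 1, 0 < p <= 1, 0 < target < 1")
    for n in range(k, max_nodes + 1):
        if availability_k_of_n(n, k, p) >= target:
            return n
    raise ValueError("target not reachable within max_nodes")


if __name__ == "__main__":
    # Hypothetical inputs: 3 nodes carry the load, each node is up 90% of
    # the time, and the service must be available 99% of the time.
    print(min_nodes(k=3, p=0.9, target=0.99))
```

Under these assumptions, the gap between `k` and the returned value is the replication overhead paid for availability; a mechanism or deployment scheme that raises the effective per-node availability `p` lowers that overhead.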