The delivery of key services in domains ranging from finance and manufacturing to healthcare and transportation is underpinned by a rapidly growing number of mission-critical enterprise applications. Ensuring the continuity of these complex applications requires the use of software-managed infrastructures called high-availability clusters (HACs). HACs employ sophisticated techniques to monitor the health of key enterprise application layers and of the resources they use, and to seamlessly restart or relocate application components after failures. In this paper, we first describe the manifold uses of HACs to protect essential layers of a critical application and present the architecture of high-availability clusters. We then propose a taxonomy that covers all key aspects of HACs (deployment patterns, application areas, types of cluster, topology, cluster management, failure detection and recovery, consistency and integrity, and data synchronisation), and we use this taxonomy to provide a comprehensive survey of the end-to-end software solutions available for the HAC deployment of enterprise applications. Finally, we discuss the limitations and challenges of existing HAC solutions, and we identify opportunities for future research in the area.