Modern distributed systems rely on fault-tolerant algorithms, such as Reliable Broadcast and Consensus, that ensure the system operates correctly even when some of its nodes fail. However, developing distributed algorithms is a manual and complex process, and scientific papers usually present a single algorithm or variations of existing ones. To automate the development of such algorithms, this work presents an intelligent agent that uses Reinforcement Learning to generate correct and efficient fault-tolerant distributed algorithms. We show that our approach generates correct fault-tolerant Reliable Broadcast algorithms with the same performance as others available in the literature, in only 12,000 learning episodes.