Rollback recovery strategies are well-known in concurrent and distributed systems. In this context, recovering from unexpected failures is even more relevant given the non-deterministic nature of execution, which means that it is practically impossible to foresee all possible process interactions. In this work, we consider a message-passing concurrent programming language where processes interact through message sending and receiving, but shared memory is not allowed. In this context, we design a checkpoint-based rollback recovery strategy that does not need a central coordination. For this purpose, we extend the language with three new operators: check, commit, and rollback. Furthermore, our approach is purely asynchronous, which is an essential ingredient to developing a source-to-source program instrumentation implementing a rollback recovery strategy.
翻译:暂无翻译