Parallel server systems in transportation, manufacturing, and computing rely heavily on dynamic routing, which uses connected cyber components for computation and communication. Yet these components remain vulnerable to random malfunctions and malicious attacks, motivating the need for fault-tolerant dynamic routing schemes that are both traffic-stabilizing and cost-efficient. In this paper, we consider a parallel server system with dynamic routing subject to reliability and security failures. For the reliability setting, we consider an infinite-horizon Markov decision process in which the system operator strategically activates a protection mechanism upon each job arrival based on traffic state observations. We prove that an optimal deterministic threshold protection policy exists, based on the dynamic programming recursion of the Hamilton-Jacobi-Bellman (HJB) equation. For the security setting, we extend the model to an infinite-horizon stochastic game in which the attacker strategically manipulates routing assignments. We show that both players follow a threshold strategy at every Markov perfect equilibrium. For both failure settings, we also analyze the stability of the traffic queues under control. Finally, we develop approximate dynamic programming algorithms to compute the optimal and equilibrium policies, supplemented with numerical examples and experiments for validation and illustration.