In this paper, we present distributed fault-tolerant algorithms that approximate the centroid of a set of $n$ data points in $\mathbb{R}^d$. Our work falls into the broader area of approximate multidimensional Byzantine agreement. The standard approach used in existing algorithms is to agree on a vector inside the convex hull of all correct vectors. This strategy dismisses many possibly correct data points. As a result, the algorithm does not necessarily agree on a representative value. To find better convergence strategies for the algorithms, we use the novel concept of defining an approximation of the centroid in the presence of Byzantine adversaries. We show that the standard agreement algorithms do not allow us to compute a better approximation than $2d$ of the centroid in the synchronous case. We investigate the trade-off between the quality of the approximation, the resilience of the algorithm, and the validity of the solution in order to design better approximation algorithms. For the synchronous case, we show that it is possible to achieve an optimal approximation of the centroid with up to $t<n/(d+1)$ Byzantine data points. This approach however does not give any guarantee on the validity of the solution. Therefore, we develop a second approach that reaches a $2\sqrt{d}$-approximation of the centroid, while satisfying the standard validity condition for agreement protocols. We are even able to restrict the validity condition to agreement inside the box of correct data points, while achieving optimal resilience of $t< n/3$. For the asynchronous case, we can adapt all three algorithms to reach the same approximation results (up to a constant factor). Our results suggest that it is reasonable to study the trade-off between validity conditions and the quality of the solution.
翻译:暂无翻译