In this paper, we present distributed fault-tolerant algorithms that approximate the centroid of a set of n data points in $\mathbb{R}^d$. Our work falls into the broader area of approximate multidimensional Byzantine agreement. The standard approach used in existing algorithms is to agree on a vector inside the convex hull of all correct vectors. This strategy dismisses many possibly correct data points. As a result, the algorithm does not necessarily agree on a representative value. In fact, this does not allow us to compute a better approximation than $2d$ of the centroid in the synchronous case. To find better approximation algorithms for the centroid, we investigate the trade-off between the quality of the approximation, the resilience of the algorithm, and the validity of the solution. For the synchronous case, we show that it is possible to achieve a $1$-approximation of the centroid with up to $t<n/(d+1)$ Byzantine data points. This approach however does not give any guarantee on the validity of the solution. Therefore, we develop a second approach that reaches a $2\sqrt{d}$-approximation of the centroid, while satisfying the standard validity condition for agreement protocols. We are even able to restrict the validity condition to agreement inside the box of correct data points, while achieving optimal resilience of $t< n/3$. For the asynchronous case, we can adapt all three algorithms to reach the same approximation results (up to a constant factor). Our results suggest that it is reasonable to study the trade-off between validity conditions and the quality of the solution.
翻译:暂无翻译