We consider information-theoretic bounds on the expected generalization error for statistical learning problems in a networked setting. In this setting, there are $K$ nodes, each with its own independent dataset, and the models from the individual nodes must be aggregated into a final centralized model. We consider both simple averaging of the models and more complicated multi-round algorithms. We give upper bounds on the expected generalization error for a variety of problems, such as those with Bregman divergence or Lipschitz continuous losses, that demonstrate an improved $1/K$ dependence on the number of nodes. These "per node" bounds are in terms of the mutual information between the training dataset and the trained weights at each node, and are therefore useful in characterizing the generalization properties inherent to having communication or privacy constraints at each node.
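The one-shot aggregation scheme mentioned above can be illustrated with a minimal sketch: $K$ nodes each fit a model on their own independent dataset, and the centralized model is the simple average of the per-node weights. The toy linear model, learning rate, and data-generating process below are illustrative assumptions, not the paper's actual setup.

```python
import random

def train_node(data, lr=0.1, epochs=200):
    """Fit a 1-D linear model y = w*x on one node's local dataset via SGD."""
    w = 0.0
    for _ in range(epochs):
        for x, y in data:
            w -= lr * 2 * (w * x - y) * x  # gradient of squared loss (w*x - y)^2
    return w

def average_models(weights):
    """One-shot aggregation: simple averaging of the per-node trained weights."""
    return sum(weights) / len(weights)

random.seed(0)
K = 5
# Each node draws an independent dataset from the same source y = 3x + noise
# (hypothetical ground truth chosen for illustration).
datasets = [
    [(0.1 * i, 3 * (0.1 * i) + random.gauss(0, 0.1)) for i in range(1, 11)]
    for _ in range(K)
]
local_weights = [train_node(d) for d in datasets]
w_avg = average_models(local_weights)
```

Because each node sees an independent sample, the averaged model's deviation from the true parameter is reduced relative to any single node's, which is the intuition behind the $1/K$ improvement in the bounds.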