We study a fundamental problem in Bayesian learning, where the goal is to select a set of data sources with minimum cost while achieving a certain learning performance based on the data streams provided by the selected data sources. First, we show that the data source selection problem for Bayesian learning is NP-hard. We then show that the data source selection problem can be transformed into an instance of the submodular set covering problem studied in the literature, and provide a standard greedy algorithm to solve the data source selection problem with provable performance guarantees. Next, we propose a fast greedy algorithm that improves the running time of the standard greedy algorithm, while achieving performance guarantees comparable to those of the standard greedy algorithm. We provide insights into the performance guarantees of the greedy algorithms by analyzing special classes of the problem. Finally, we validate the theoretical results using numerical examples, and show that the greedy algorithms work well in practice.
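The standard greedy algorithm for submodular set covering mentioned above can be sketched as follows: repeatedly select the data source with the best marginal-coverage-gain-to-cost ratio until the coverage target is met. This is a minimal illustrative sketch (the function names, the toy coverage sets, and the costs are hypothetical, not taken from the paper):

```python
def greedy_cover(sources, costs, f, target):
    """Greedily select sources by marginal gain per unit cost
    until the monotone submodular function f reaches target."""
    selected = set()
    while f(selected) < target:
        remaining = [s for s in sources if s not in selected]
        if not remaining:
            break  # target unreachable with the given sources
        # pick the source with the best marginal-gain-to-cost ratio
        best = max(
            remaining,
            key=lambda s: (f(selected | {s}) - f(selected)) / costs[s],
        )
        if f(selected | {best}) == f(selected):
            break  # no remaining source improves coverage
        selected.add(best)
    return selected


# Toy example: f is set coverage, a monotone submodular function.
coverage = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6}}
costs = {"a": 3.0, "b": 1.0, "c": 2.0}
f = lambda S: len(set().union(*(coverage[s] for s in S)) if S else set())

chosen = greedy_cover(coverage.keys(), costs, f, target=6)
```

In this toy instance the greedy rule first picks "b" (ratio 2/1), then "c" (ratio 2/2), and finally "a" to reach full coverage; Wolsey's classical analysis gives such greedy selections a logarithmic approximation guarantee for submodular cover.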