In this work, we propose a multi-armed bandit based framework for identifying a compact set of informative data instances (i.e., the prototypes) that best represents a given target set. Prototypical examples of a given dataset offer interpretable insights into the underlying data distribution and assist in example-based reasoning, thereby influencing every sphere of human decision making. A key challenge is the large-scale setting, in which similarity comparison between pairs of data points needs to be done for almost all possible pairs. We propose to overcome this limitation by employing stochastic greedy search on the space of prototypical examples and multi-armed bandit approach for reducing the number of similarity comparisons. We analyze the total number of similarity comparisons needed by approach and provide an upper bound independent of the size of the target set.
翻译:在这项工作中,我们提出了一个基于多武装的土匪框架,用以确定一套最能代表特定目标集的信息性数据实例(即原型),某个特定数据集的原型实例提供了对基本数据分布的可解释的洞察力,有助于以实例为基础的推理,从而影响人类决策的每个领域。一个关键的挑战在于大规模的环境,在这种环境中,几乎所有可能的对口都需要对数据点的对口进行相似的比较。我们提议通过在模拟实例空间上采用审慎的贪婪搜索和多武装土匪方法来减少相似性比较的次数,从而克服这一限制。我们分析了方法所需的相似性比较的总数,提供了一套目标大小的上限。