The stream-based Max-Min diversification problem concerns the task of selecting a limited number of diverse instances from a data stream. The nature of the problem demands immediate and irrevocable decisions. The set-wise diversity to be maximized is the minimum distance among any pair of the selected instances. Standard algorithmic approaches for sequential selection disregard the possibility of selection failures, which is the situation where the last instances of the stream are picked by default to prevent having an incomplete selection. This defect can be catastrophic for the Max-Min diversification objective. In this paper we present the Failure Rate Minimization (FRM) algorithm that allows the selection of a set of disparate instances while reducing significantly the probability of having failures. This is achieved by means of both analytical and empirical techniques. FRM is put in comparison with relevant algorithms from the literature through simulations on real datasets, where we demonstrate its efficiency and low time complexity.
翻译:以串流为基础的 Max-Min 多样化问题涉及从数据流中选择数量有限的不同实例的任务。问题的性质要求立即作出不可撤销的决定。要尽量扩大的组合多样性是任何一对选定实例之间的最小距离。顺序选择的标准算法忽视了选择失败的可能性,即流的最后几例是默认选择,以防止选择不完全。这一缺陷可能对Max-Min多样化目标具有灾难性影响。在本文中,我们介绍了允许选择一组不同实例同时显著降低失败概率的低效率算法(FRM ) 。这是通过分析和实验技术实现的。 FRM通过模拟真实数据集,与文献中的相关算法进行比较,在那里我们展示其效率和低时间复杂性。