We propose a nonparametric approach to feature extraction from dendrograms. Minimax distance measures correspond to building a dendrogram with the single linkage criterion and defining specific forms of a level function and a distance function over it. We therefore extend this method to arbitrary dendrograms and develop a generalized framework in which different distance measures can be inferred from different types of dendrograms, level functions, and distance functions. Via an appropriate embedding, we compute a vector-based representation of the inferred distances so that many numerical machine learning algorithms can employ them. Then, to address the model selection problem, we study the aggregation of different dendrogram-based distances in solution space and in representation space, respectively, in the spirit of deep representations. In the first approach, taking clustering as an example, we build a graph with positive and negative edge weights according to how consistently the clustering labels of different objects agree across different solutions, in the context of ensemble methods. We then use an efficient variant of correlation clustering to produce the final clusters. In the second approach, we investigate the sequential combination of different distances and features, in the spirit of multi-layered architectures, to obtain the final features. Finally, we demonstrate the effectiveness of our approach via several numerical studies.
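As a rough illustration of the correspondence between Minimax distances and single linkage dendrograms, the following sketch reads pairwise distances off a dendrogram and then embeds them into vectors. It rests on several assumptions that are not taken from the paper: the level function is taken to be the linkage height at which two objects first merge (the cophenetic distance), the base dissimilarity is Euclidean, and the embedding is classical multidimensional scaling. The function names `dendrogram_distances` and `embed` are hypothetical. With `method="single"`, the resulting distance between two points equals the largest gap on the path between them that minimizes the largest gap, which is the Minimax distance.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet
from scipy.spatial.distance import pdist, squareform


def dendrogram_distances(X, method="single"):
    """Pairwise distances inferred from a dendrogram built on X.

    Assumption: the distance between two objects is the linkage height
    at which they first fall into the same cluster.
    """
    base = pdist(X)                    # condensed Euclidean dissimilarities
    Z = linkage(base, method=method)   # dendrogram as a linkage matrix
    return squareform(cophenet(Z))     # square matrix of merge heights


def embed(D, dim=2):
    """Classical MDS: turn a distance matrix into dim-dimensional vectors.

    Assumption: this stands in for the embedding mentioned in the text.
    """
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered Gram matrix
    w, V = np.linalg.eigh(B)                 # eigenvalues in ascending order
    top = np.argsort(w)[::-1][:dim]          # indices of the largest ones
    return V[:, top] * np.sqrt(np.clip(w[top], 0.0, None))


if __name__ == "__main__":
    # Four points on a line: three close together and one far away.
    X = np.array([[0.0], [1.0], [2.0], [10.0]])
    D = dendrogram_distances(X, method="single")
    print(D)
    print(embed(D, dim=2))
```

Passing another `method` value supported by SciPy (for example `"average"` or `"complete"`) yields a different dendrogram and therefore a different inferred distance, which is the kind of variation the generalized framework is meant to cover.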