Classifications organize entities into categories that identify similarities within a category and discern dissimilarities among categories, and they are powerful tools for structuring information in support of analysis. We propose a new classification scheme premised on the reality of imperfect data. Our computational model uses uncertain data envelopment analysis to define a classification's proximity to equitable efficiency, which is an aggregate measure of intra-similarity within a classification's categories. Our classification process faces two overriding computational challenges: a loss of convexity and a combinatorially explosive search space. We overcome the first by establishing lower and upper bounds on the proximity value and then searching this range with a first-order algorithm. We overcome the second by adapting the p-median problem to initiate our exploration and then employing an iterative neighborhood search to finalize a classification. We conclude by classifying the thirty stocks in the Dow Jones Industrial Average into performant tiers and by classifying prostate treatments into clinically effectual categories.
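The two-stage search described in the abstract (a p-median start followed by an iterative neighborhood search) can be illustrated with a generic sketch. This is not the authors' procedure: it assumes a plain pairwise distance matrix as the cost, whereas the paper's objective is the proximity to equitable efficiency computed with uncertain data envelopment analysis. The function names, the greedy initialization, and the single-swap neighborhood are all assumptions chosen for illustration.

```python
from itertools import product


def assignment_cost(dist, medians):
    """Total distance from every entity to its nearest median."""
    return sum(min(dist[i][m] for m in medians) for i in range(len(dist)))


def greedy_p_median(dist, p):
    """Pick p medians one at a time, each time adding the entity
    that gives the lowest total assignment cost (a heuristic start)."""
    medians = []
    for _ in range(p):
        candidates = [j for j in range(len(dist)) if j not in medians]
        best = min(candidates, key=lambda j: assignment_cost(dist, medians + [j]))
        medians.append(best)
    return medians


def swap_search(dist, medians):
    """Neighborhood search: replace one median with one non-median
    whenever the swap strictly lowers the cost; stop when no swap helps."""
    medians = list(medians)
    best_cost = assignment_cost(dist, medians)
    improved = True
    while improved:
        improved = False
        for pos, j in product(range(len(medians)), range(len(dist))):
            if j in medians:
                continue
            trial = medians[:pos] + [j] + medians[pos + 1:]
            cost = assignment_cost(dist, trial)
            if cost < best_cost:
                medians, best_cost, improved = trial, cost, True
    return medians, best_cost


def categories(dist, medians):
    """Group entity indices by their nearest median."""
    groups = {m: [] for m in medians}
    for i in range(len(dist)):
        groups[min(medians, key=lambda m: dist[i][m])].append(i)
    return groups


# Hypothetical data: six entities placed on a line, to be split into two categories.
points = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in points] for a in points]
start = greedy_p_median(dist, 2)
medians, cost = swap_search(dist, start)
print(categories(dist, medians))
```

In this toy case the greedy start is already close, and the swap step only moves one median to the center of its group. With a non-convex objective such as the one the abstract describes, the neighborhood search would be expected to do more of the work, and each cost evaluation would itself require the bounded first-order search.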