Existing biclustering algorithms for finding feature-relation-based biclusters often depend on assumptions such as monotonicity or linearity. Although a few algorithms overcome this problem by using density-based methods, they tend to miss many biclusters because they use global criteria for identifying dense regions. The proposed method, RelDenClu, uses the local variations in marginal and joint densities for each pair of features to find the subset of observations that forms the basis of the relation between them. It then finds the set of features connected by a common set of observations, resulting in a bicluster. To show the effectiveness of the proposed methodology, experiments have been carried out on fifteen types of simulated datasets. Further, it has been applied to six real-life datasets: for three of these, the proposed method is used for unsupervised learning, while for the other three it is used as an aid to supervised learning. For all the datasets, the performance of the proposed method is compared with that of seven different state-of-the-art algorithms, and the proposed algorithm is seen to produce better results. The efficacy of the proposed algorithm is also seen in its use on a COVID-19 dataset for identifying some features (genetic, demographic and others) that are likely to affect the spread of COVID-19.
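The abstract describes RelDenClu only at a high level, so the following is a minimal Python sketch of the general idea under our own assumptions, not the authors' algorithm. Here the marginal and joint densities of a feature pair are approximated by equal-frequency histograms; a grid cell is treated as locally dense when its joint count is at least `ratio` times the count expected from the product of the marginals; and a bicluster is grown greedily from the feature pair with the most dense observations, adding a feature only if it shares enough common observations with every feature already chosen. The functions `dense_observations` and `find_bicluster` and the parameters `bins`, `ratio` and `min_obs` are hypothetical names chosen for illustration.

```python
import numpy as np
from itertools import combinations


def dense_observations(x, y, bins=5, ratio=2.0):
    """Indices of observations lying in cells whose joint count is at least
    `ratio` times the count expected if x and y were independent."""
    n = len(x)
    # Equal-frequency bins from the empirical marginals: a hypothetical,
    # crude stand-in for a proper local density estimate.
    cuts = np.linspace(0, 1, bins + 1)[1:-1]
    bx = np.digitize(x, np.quantile(x, cuts))
    by = np.digitize(y, np.quantile(y, cuts))
    joint = np.zeros((bins, bins))
    np.add.at(joint, (bx, by), 1)
    # Expected cell count under independence = row total * column total / n.
    expected = np.outer(joint.sum(axis=1), joint.sum(axis=0)) / n
    # Empty cells may compare 0 >= 0, but no observation maps to them.
    dense = joint >= ratio * expected
    return set(np.flatnonzero(dense[bx, by]).tolist())


def find_bicluster(X, min_obs=20, **kw):
    """Greedily grow one bicluster (feature set, observation set)."""
    p = X.shape[1]
    pair_obs = {}
    for i, j in combinations(range(p), 2):
        obs = dense_observations(X[:, i], X[:, j], **kw)
        if len(obs) >= min_obs:
            pair_obs[(i, j)] = obs
    if not pair_obs:
        return set(), set()
    # Seed with the feature pair that has the most dense observations.
    (i, j), seed_rows = max(pair_obs.items(), key=lambda kv: len(kv[1]))
    cols, rows = {i, j}, set(seed_rows)
    for k in range(p):
        if k in cols:
            continue
        # Keep only observations dense for k paired with every chosen feature.
        common = set(rows)
        for c in cols:
            common &= pair_obs.get((min(c, k), max(c, k)), set())
        if len(common) >= min_obs:
            cols.add(k)
            rows = common
    return cols, rows


# Toy data: rows 0..299 share a hidden relation among features 0, 1 and 2.
rng = np.random.default_rng(0)
X = rng.uniform(size=(600, 5))      # features 3 and 4 stay pure noise
t = rng.uniform(size=300)
X[:300, 0] = t
X[:300, 1] = 1 - t                  # decreasing relation
X[:300, 2] = (t + 0.4) % 1.0        # non-monotone (wrap-around) relation
cols, rows = find_bicluster(X)
print(sorted(cols), len(rows))
```

In the toy data, neither relation is increasing and one is not monotone at all, yet both show up as cells with excess joint counts. This loosely mirrors the motivation stated above for avoiding monotonicity or linearity assumptions. A faithful implementation would replace the fixed global grid with genuinely local density estimates.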