Biclustering is a data mining technique that searches for local patterns in numeric tabular data, with its main applications in bioinformatics. The technique has shown promise in multiple areas, including the development of cancer biomarkers, disease subtype identification, and the analysis of gene-drug interactions. In this paper we introduce EBIC.JL, an implementation of one of the most accurate biclustering algorithms in Julia, a modern, highly parallelizable programming language for data science. We show that the new version maintains accuracy comparable to that of its predecessor, EBIC, while converging faster on the majority of problems. We hope that this open-source software, written in a high-level programming language, will foster research in this promising field of bioinformatics and expedite the development of new biclustering methods for big data.