In functional genomics, the analysis of gene expression profiles with machine learning and deep learning is increasingly providing meaningful insight into a number of diseases. This paper proposes a novel feature selection algorithm for genomic-scale data that exploits the reconstruction capabilities of autoencoders, together with an ad hoc Explainable Artificial Intelligence (XAI)-based score, to select the most informative genes for diagnosis, prognosis, and precision medicine. Applied to a Chronic Lymphocytic Leukemia dataset, the algorithm proves effective, identifying a set of meaningful genes and suggesting them for further medical investigation.
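The general idea of scoring input features through an autoencoder's reconstruction can be illustrated with a small sketch. This is not the algorithm proposed in the paper, whose architecture and score are not specified in this abstract. It assumes a one-unit, tied-weight linear autoencoder and a simple occlusion score (the increase in reconstruction error when one feature is replaced by its mean) as a stand-in for the XAI-based score. The toy data, the function names, and all hyperparameters are hypothetical.

```python
import random


def make_toy_data(n=200, seed=1):
    """Toy 'expression matrix' (hypothetical): features 0 and 1 share a latent
    factor, features 2-5 are independent low-variance noise. Columns are centered."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        row = [2.0 * z + rng.gauss(0.0, 0.1), -2.0 * z + rng.gauss(0.0, 0.1)]
        row += [rng.gauss(0.0, 0.3) for _ in range(4)]
        rows.append(row)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def train_linear_autoencoder(X, lr=0.01, epochs=300, seed=0):
    """Fit x_hat = (w . x) * w (one hidden unit, tied weights) by batch
    gradient descent on the mean squared reconstruction error."""
    rng = random.Random(seed)
    d = len(X[0])
    w = [rng.gauss(0.0, 0.1) for _ in range(d)]
    for _ in range(epochs):
        grad = [0.0] * d
        for x in X:
            h = sum(wi * xi for wi, xi in zip(w, x))    # encoder output
            r = [xi - h * wi for xi, wi in zip(x, w)]   # residual x - x_hat
            rw = sum(ri * wi for ri, wi in zip(r, w))
            for k in range(d):
                # gradient of ||r||^2 with respect to w_k
                grad[k] += -2.0 * (h * r[k] + rw * x[k])
        w = [wi - lr * g / len(X) for wi, g in zip(w, grad)]
    return w


def reconstruction_error(X, w, masked=None):
    """Squared reconstruction error summed over features, averaged over samples.
    If `masked` is given, that input feature is set to 0 (its mean, since the
    data are centered) before encoding; the error is still measured against
    the original, unmasked sample."""
    total = 0.0
    for x in X:
        x_in = list(x)
        if masked is not None:
            x_in[masked] = 0.0
        h = sum(wi * xi for wi, xi in zip(w, x_in))
        total += sum((xi - h * wi) ** 2 for xi, wi in zip(x, w))
    return total / len(X)


def occlusion_scores(X, w):
    """Assumed stand-in score: error increase caused by masking each feature."""
    base = reconstruction_error(X, w)
    return [reconstruction_error(X, w, masked=j) - base for j in range(len(w))]


X = make_toy_data()
w = train_linear_autoencoder(X)
scores = occlusion_scores(X, w)
# Indices of the two highest-scoring features (the selected "genes").
top_genes = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:2]
```

On this toy matrix the two correlated features (indices 0 and 1) receive the largest scores, since the autoencoder relies on them to reconstruct each other, while the noise features contribute almost nothing. A realistic pipeline would use a nonlinear autoencoder from a deep learning framework and an attribution method suited to it.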