Biomedical research has revealed the crucial role of miRNAs in the progression of many diseases, and computational prediction methods are increasingly proposed to help biological experiments verify miRNA-disease associations (MDAs). However, the generalizability and explainability of these methods remain underemphasized. It is important to generalize effective predictions to entities with few or no existing MDAs and to reveal how the prediction scores are derived. This study makes contributions in data, modeling, and result analysis. First, to better formulate the MDA prediction problem, we integrate multi-source data into a heterogeneous graph with a broader learning and prediction scope, and we split a large number of verified MDAs into independent training, validation, and test sets as a benchmark. Second, we construct an end-to-end, data-driven model that sequentially performs node feature encoding, graph structure learning, and binary prediction, with a heterogeneous graph transformer as the central module. Finally, computational experiments show that our method outperforms existing state-of-the-art methods, achieving better evaluation metrics and effectively alleviating the neglect of unknown miRNAs and diseases. Case studies further demonstrate that our method makes reliable MDA predictions for diseases without MDA records, and that the predictions can be explained both in general and case by case.