Despite experimental and curation efforts, the extent of enzyme promiscuity on substrates remains largely unexplored and underdocumented. Recommender systems (RS), which have not yet been applied to the enzyme-substrate interaction prediction problem, can be used to recommend enzymes for substrates, and vice versa. However, the performance of Collaborative-Filtering (CF) recommender systems hinges on the quality of the embedding vectors of users and items (enzymes and substrates in our case). Importantly, enhancing CF embeddings with heterogeneous auxiliary data, especially relational data (e.g., hierarchical, pairwise, or group relations), remains a challenge. We propose an innovative general RS framework, termed Boost-RS, that enhances RS performance by "boosting" embedding vectors through auxiliary data. Specifically, Boost-RS is trained and dynamically tuned on multiple relevant auxiliary learning tasks, and it uses contrastive learning tasks to exploit relational data. To show the efficacy of Boost-RS for the enzyme-substrate interaction prediction problem, we apply the Boost-RS framework to several baseline CF models. We show that each of our auxiliary tasks boosts learning of the embedding vectors, and that contrastive learning using Boost-RS outperforms attribute concatenation and multi-label learning. We also show that Boost-RS outperforms similarity-based models. Ablation studies and visualization of learned representations highlight the importance of using contrastive learning on some of the auxiliary data in boosting the embedding vectors.
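The central idea, a CF interaction objective augmented with a contrastive auxiliary task over relational data, can be illustrated with a short sketch. This is a minimal example under stated assumptions, not the Boost-RS implementation: it assumes a dot-product (matrix-factorization-style) CF scorer, a binary cross-entropy interaction loss, a triplet margin loss as the contrastive task over hypothetical (anchor, positive, negative) enzyme index triples (for instance, enzymes sharing a class as positives), and a single fixed auxiliary weight in place of the dynamic tuning described above. All function names are hypothetical.

```python
import numpy as np


def cf_scores(enzyme_emb, substrate_emb):
    # Assumed MF-style CF: interaction score is the dot product of embeddings.
    return enzyme_emb @ substrate_emb.T


def bce_loss(scores, labels):
    # Binary cross-entropy on sigmoid-transformed scores (assumed main task loss).
    probs = 1.0 / (1.0 + np.exp(-scores))
    eps = 1e-12
    return float(-np.mean(labels * np.log(probs + eps)
                          + (1 - labels) * np.log(1 - probs + eps)))


def triplet_loss(emb, triplets, margin=1.0):
    # Contrastive auxiliary task (assumed triplet form): pull each anchor toward
    # a related entity (positive) and push it away from an unrelated one (negative).
    anchors, positives, negatives = (emb[list(idx)] for idx in zip(*triplets))
    d_pos = np.sum((anchors - positives) ** 2, axis=1)
    d_neg = np.sum((anchors - negatives) ** 2, axis=1)
    return float(np.mean(np.maximum(0.0, d_pos - d_neg + margin)))


def boosted_loss(enzyme_emb, substrate_emb, labels, enzyme_triplets, weight=0.5):
    # Total loss = main CF loss + weighted auxiliary loss (fixed weight, by assumption).
    main = bce_loss(cf_scores(enzyme_emb, substrate_emb), labels)
    aux = triplet_loss(enzyme_emb, enzyme_triplets)
    return main + weight * aux


rng = np.random.default_rng(0)
E = rng.normal(size=(4, 8))               # 4 hypothetical enzyme embeddings
S = rng.normal(size=(3, 8))               # 3 hypothetical substrate embeddings
Y = rng.integers(0, 2, size=(4, 3))       # hypothetical observed interactions
total = boosted_loss(E, S, Y, [(0, 1, 2), (1, 0, 3)])
```

In a trained model, gradients of the auxiliary term flow into the same enzyme embeddings used by the CF scorer, which is how relational structure can shape (boost) the representations.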