The recent emergence of high-throughput drug screening assays has sparked intensive development of machine learning methods, including models that predict the sensitivity of cancer cell lines to anti-cancer drugs and methods that generate potential drug candidates. However, generating compounds with specific properties while simultaneously modeling their efficacy against cancer cell lines has not been comprehensively explored. To address this need, we present VADEERS, a Variational Autoencoder-based Drug Efficacy Estimation Recommender System. Compounds are generated by a novel variational autoencoder with a semi-supervised Gaussian Mixture Model (GMM) prior. The prior defines a clustering in the latent space, where the clusters are associated with specific drug properties. In addition, VADEERS is equipped with a cell line autoencoder and a sensitivity prediction network. The model combines SMILES string representations of anti-cancer drugs, their inhibition profiles against a panel of protein kinases, biological features of cell lines, and measurements of the sensitivity of the cell lines to the drugs. The evaluated variants of VADEERS achieve a high Pearson correlation of r = 0.87 between true and predicted drug sensitivity estimates. We train the GMM prior so that the clusters in the latent space correspond to a pre-computed clustering of the drugs by their inhibitory profiles. We show that the learned latent representations and newly generated data points accurately reflect the given clustering. In summary, VADEERS offers a comprehensive model of the properties of drugs and cell lines and of the relationships between them, as well as guided generation of novel compounds.
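To make the idea of a semi-supervised GMM prior more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes diagonal-covariance components and assumes that a drug with a known cluster label is scored only against its assigned component, while an unlabeled drug is scored against the full mixture; the abstract does not specify these details. All function and parameter names (`gmm_prior_log_prob`, `labels`, `pearson_r`, and so on) are hypothetical. A small helper for the Pearson correlation used as the evaluation metric is included as well.

```python
import numpy as np


def log_gaussian_diag(z, means, log_vars):
    """Log density of each diagonal Gaussian component at each point.

    z: (n, d) latent codes; means, log_vars: (k, d). Returns an (n, k) array.
    """
    diff = z[:, None, :] - means[None, :, :]
    per_dim = (
        np.log(2.0 * np.pi)
        + log_vars[None, :, :]
        + diff ** 2 / np.exp(log_vars)[None, :, :]
    )
    return -0.5 * per_dim.sum(axis=-1)


def gmm_prior_log_prob(z, weights, means, log_vars, labels=None):
    """Prior log-probability of latent codes under a GMM.

    labels: optional integer array of shape (n,), where -1 means the cluster
    is unknown. Labeled points are scored with the joint log p(z, c) for their
    assigned component c; unlabeled points are scored with the mixture
    marginal log p(z). This split is an assumption made for illustration.
    """
    # Joint log p(z, c) for every point and component, shape (n, k).
    joint = log_gaussian_diag(z, means, log_vars) + np.log(weights)[None, :]
    # Numerically stable log-sum-exp over components gives log p(z).
    peak = joint.max(axis=1, keepdims=True)
    marginal = (peak + np.log(np.exp(joint - peak).sum(axis=1, keepdims=True)))[:, 0]
    if labels is None:
        return marginal
    labels = np.asarray(labels)
    known = labels >= 0
    out = marginal.copy()
    out[known] = joint[known, labels[known]]
    return out


def pearson_r(y_true, y_pred):
    """Pearson correlation between true and predicted sensitivities."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.normal(size=(4, 2))                 # four latent codes, 2-D
    weights = np.array([0.5, 0.5])              # mixture weights
    means = np.array([[-2.0, 0.0], [2.0, 0.0]])
    log_vars = np.zeros((2, 2))                 # unit variances
    labels = np.array([0, 1, -1, -1])           # last two drugs unlabeled
    print(gmm_prior_log_prob(z, weights, means, log_vars, labels))
```

In a full model, a term of this kind would typically enter the training objective alongside the reconstruction loss, so that labeled drugs are pulled toward the component matching their pre-computed cluster.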