Drug discovery and development is an extremely complex process, with high attrition contributing to the cost of delivering new medicines to patients. Recently, various machine learning approaches have been proposed and investigated to help improve the effectiveness and speed of multiple stages of the drug discovery pipeline. Among these techniques, those using knowledge graphs are showing considerable promise across a range of tasks, including drug repurposing, drug toxicity prediction and target gene-disease prioritisation. In a knowledge graph-based representation of the drug discovery domain, crucial elements such as genes, diseases and drugs are represented as entities (vertices), whilst relationships (edges) between them indicate some level of interaction. For example, an edge between a disease entity and a drug entity might represent a successful clinical trial, and an edge between two drug entities could indicate a potentially harmful interaction. Constructing high-quality and ultimately informative knowledge graphs, however, requires suitable data. In this review, we detail publicly available primary data sources containing information suitable for constructing drug discovery focused knowledge graphs. We aim to guide machine learning and knowledge graph practitioners who are interested in applying new techniques to the drug discovery field but who may be unfamiliar with the relevant data sources. Overall, we hope this review will motivate more machine learning researchers to explore combining knowledge graphs and machine learning to help solve key and emerging questions in the drug discovery domain.
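As a rough illustration of the entity and edge representation described above, a knowledge graph can be held as a list of (head, relation, tail) triples. The sketch below is a minimal example under stated assumptions: the entity identifiers (`drug:A`, `disease:X` and so on), the relation names and the helper functions are hypothetical placeholders, not taken from any of the data sources in the review.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    """One directed edge in the graph: head entity, relation type, tail entity."""
    head: str
    relation: str
    tail: str


# Hypothetical placeholder entities and relation names, for illustration only.
triples = [
    Triple("drug:A", "successful_trial_for", "disease:X"),
    Triple("drug:A", "harmful_interaction_with", "drug:B"),
    Triple("gene:G", "associated_with", "disease:X"),
]


def build_adjacency(edges):
    """Index outgoing edges by their head entity."""
    adjacency = defaultdict(list)
    for edge in edges:
        adjacency[edge.head].append((edge.relation, edge.tail))
    return dict(adjacency)


def neighbours(adjacency, entity, relation=None):
    """Return tail entities reachable from `entity`, optionally filtered by relation."""
    return [
        tail
        for rel, tail in adjacency.get(entity, [])
        if relation is None or rel == relation
    ]


graph = build_adjacency(triples)
```

Calling `neighbours(graph, "drug:A", "harmful_interaction_with")` would then list the drug entities linked to `drug:A` by that edge type. A flat triple list is only one possible design; it is used here because it keeps the vertex and edge roles explicit.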