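Representation learning uses training data to learn vector representations, which overcomes the limitations of hand-engineered approaches. It generally falls into two categories: unsupervised and supervised representation learning. Most unsupervised methods use the latent variables of an autoencoder (such as a denoising autoencoder or a sparse autoencoder) as the representation. The more recent variational autoencoder tolerates noise and outliers better. However, exactly inferring the latent structure of given data is almost impossible, so approximate inference strategies are used instead. In addition, some unsupervised methods aim to approximate a specific similarity measure. One proposed unsupervised, similarity-preserving representation learning framework uses matrix factorization to preserve pairwise dynamic time warping (DTW) similarities. Another approach learns DTW-preserving shapelets, so that the Euclidean distance in the transformed space approximates the true DTW distance between the original series. Supervised representation learning methods can exploit the label information of the data to better capture its semantic structure. Siamese networks and triplet networks are two currently popular models; their objective is to maximize the distance between classes while minimizing the distance within each class.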
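Two of the quantities mentioned above can be sketched in a few lines of plain Python: the DTW distance that similarity-preserving methods try to approximate, and a margin-based triplet loss of the kind a triplet network minimizes. This is only an illustrative sketch; the function names, the absolute-difference local cost, and the default margin are assumptions, not details taken from any of the cited methods.

```python
import math


def dtw_distance(x, y):
    """Classic dynamic-programming DTW between two 1-D sequences.

    Assumption: local cost is the absolute difference |x_i - y_j|.
    """
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] = best alignment cost of x[:i] against y[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # advance in x only
                                 cost[i][j - 1],      # advance in y only
                                 cost[i - 1][j - 1])  # advance in both
    return cost[n][m]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss on embedding vectors (margin is hypothetical).

    The loss is zero once the negative is farther from the anchor than the
    positive by at least `margin`.
    """
    return max(0.0, euclidean(anchor, positive)
               - euclidean(anchor, negative) + margin)
```

For example, `dtw_distance([1, 2, 3], [1, 2, 2, 3])` is zero because warping absorbs the repeated value, whereas a point-by-point Euclidean comparison is not even defined for sequences of different lengths.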

VIP Content

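Recent contrastive representation learning methods rely on estimating the mutual information (MI) between multiple views of an underlying context. For example, we can obtain multiple views of a given image by applying data augmentation, or we can split a sequence into views comprising the past and the future of some step in the sequence. Contrastive lower bounds on MI are easy to optimize, but they have a strong underestimation bias when estimating large amounts of MI. We propose decomposing the full MI estimation problem into a sum of smaller estimation problems. The resulting expression contains a sum of unconditional and conditional MI terms, each measuring a modest chunk of the total MI, which makes approximation via contrastive bounds easier. To maximize the sum, we formulate a contrastive lower bound on the conditional MI that can be approximated efficiently. We refer to our general approach as Decomposed Estimation of Mutual Information (DEMI). We show that DEMI can capture a larger amount of MI than standard non-decomposed contrastive bounds in a synthetic setting, and that it learns better representations in the vision domain and for dialogue generation.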
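The underestimation bias mentioned above can be illustrated with a standard InfoNCE-style contrastive bound. The sketch below is not the DEMI estimator itself; it assumes a generic critic whose scores are given as a K-by-K matrix with the positive pairs on the diagonal, and the function name `info_nce` is hypothetical.

```python
import math


def info_nce(scores):
    """InfoNCE-style lower bound on MI from a K x K score matrix.

    Assumption: scores[i][j] is a critic score for the pair (x_i, y_j),
    and (x_i, y_i) are the positive pairs.
    """
    k = len(scores)
    total = 0.0
    for i in range(k):
        row = scores[i]
        # Numerically stable log-sum-exp over the K candidates for x_i.
        m = max(row)
        log_sum = m + math.log(sum(math.exp(s - m) for s in row))
        # Log-probability of picking the positive among the K candidates.
        total += row[i] - log_sum
    # Each log-probability is <= 0, so the estimate is at most log(K).
    return math.log(k) + total / k


# A critic that separates positives from negatives almost perfectly
# still cannot report more than log(K) nats.
sharp = [[100.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
estimate = info_nce(sharp)
```

Whatever the critic, the estimate cannot exceed log K for K candidates per positive, which is one way to see why a single contrastive bound struggles when the true MI is large. Splitting the problem into several smaller terms, each with its own bound, is the motivation the abstract describes.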

https://www.zhuanzhi.ai/paper/8843e06299bf34535700e85e6c684c37

Latest Papers

The drug discovery and development process is long and expensive, costing over 1 billion USD per drug on average and taking 10-15 years. To reduce the high levels of attrition throughout the process, there has been growing interest over the past decade in applying machine learning methodologies to various stages of the drug discovery process, including the earliest stage: the identification of druggable disease genes. In this paper, we develop a new tensor factorisation model to predict potential drug targets (i.e., genes or proteins) for diseases. We created a three-dimensional tensor consisting of 1,048 targets, 860 diseases, and 230,011 evidence attributes and clinical outcomes connecting them, using data extracted from the Open Targets and PharmaProjects databases. We enriched the data with gene representations learned from a drug discovery-oriented knowledge graph and applied our proposed method to predict the clinical outcomes for unseen target and disease pairs. We designed three evaluation strategies to measure prediction performance and benchmarked several commonly used machine learning classifiers together with matrix and tensor factorisation methods. The results show that incorporating knowledge graph embeddings significantly improves prediction accuracy and that training tensor factorisation alongside a dense neural network outperforms the other methods. In summary, our framework combines two actively studied machine learning approaches to disease target identification, tensor factorisation and knowledge graph representation learning, which could be a promising avenue for further exploration in data-driven drug discovery.
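As a rough illustration of what tensor factorisation over a (target, disease, attribute) tensor involves, the sketch below scores one tensor cell from three latent vectors and takes a single gradient step on the squared error. The abstract does not state which factorisation the authors use; the CP (CANDECOMP/PARAFAC) form, the function names, and the learning rate here are all assumptions made for illustration.

```python
def cp_score(target_vec, disease_vec, attr_vec):
    """CP-style score for one tensor cell: sum_r t_r * d_r * a_r.

    Assumption: each target, disease, and attribute has a latent vector
    of the same rank.
    """
    return sum(t * d * a for t, d, a in zip(target_vec, disease_vec, attr_vec))


def sgd_step(target_vec, disease_vec, attr_vec, observed, lr=0.1):
    """One gradient step on 0.5 * (score - observed)^2 for a single cell.

    Returns updated copies of the three latent vectors.
    """
    err = cp_score(target_vec, disease_vec, attr_vec) - observed
    triples = list(zip(target_vec, disease_vec, attr_vec))
    new_t = [t - lr * err * d * a for t, d, a in triples]
    new_d = [d - lr * err * t * a for t, d, a in triples]
    new_a = [a - lr * err * t * d for t, d, a in triples]
    return new_t, new_d, new_a


# Toy rank-2 example: one observed cell with value 1.0.
t0, d0, a0 = [0.5, 0.5], [0.5, 0.5], [0.5, 0.5]
before = cp_score(t0, d0, a0)
t1, d1, a1 = sgd_step(t0, d0, a0, observed=1.0)
after = cp_score(t1, d1, a1)
```

In a setting like the one described, pretrained knowledge graph embeddings could plausibly initialise or augment the target vectors, but how the paper combines them is not specified in the abstract.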
