Learned Indexing in Proteins: Substituting Complex Distance Calculations with Embedding and Clustering Techniques - 专知论文


Learning · Clustering · Vectorization · Similarity · Curse of dimensionality

August 18, 2022

Learned Indexing in Proteins: Substituting Complex Distance Calculations with Embedding and Clustering Techniques


Jaroslav Oľha, Terézia Slanináková, Martin Gendiar, Matej Antol, Vlastislav Dohnal

from arxiv, 14 pages, 7 figures, to be published in SISAP 2022

Despite the constant evolution of similarity searching research, it continues to face the same challenges stemming from the complexity of the data, such as the curse of dimensionality and computationally expensive distance functions. Various machine learning techniques have proven capable of replacing elaborate mathematical models with combinations of simple linear functions, often gaining speed and simplicity at the cost of formal guarantees of accuracy and correctness of querying. The authors explore the potential of this research trend by presenting a lightweight solution for the complex problem of 3D protein structure search. The solution consists of three steps: (i) transformation of 3D protein structural information into very compact vectors, (ii) use of a probabilistic model to group these vectors and respond to queries by returning a given number of similar objects, and (iii) a final filtering step that applies basic vector distance functions to refine the result.
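The three steps in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the embedding (pairwise distances between centroids of equal chain sections), the grouping model (k-means centers with a softmax over negative distances standing in for the probabilistic model), the synthetic random-walk "chains", and all function names and parameters.

```python
import math
import random


def embed(coords, sections=4):
    """Step (i), hypothetical embedding: cut the chain into equal sections
    and keep the pairwise distances between the section centroids."""
    n = len(coords)
    centroids = []
    for s in range(sections):
        part = coords[s * n // sections:(s + 1) * n // sections]
        centroids.append([sum(p[d] for p in part) / len(part) for d in range(3)])
    return [math.dist(centroids[i], centroids[j])
            for i in range(sections) for j in range(i + 1, sections)]


def kmeans(vectors, k, iters=20, seed=0):
    """Plain k-means; returns k centers (stand-in for the trained model)."""
    centers = random.Random(seed).sample(vectors, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda c: math.dist(v, centers[c]))
            groups[nearest].append(v)
        # Recompute each center as the mean of its group; keep empty groups as-is.
        centers = [[sum(col) / len(g) for col in zip(*g)] if g else centers[c]
                   for c, g in enumerate(groups)]
    return centers


def bucket_probabilities(vector, centers):
    """Step (ii): softmax over negative distances to the centers."""
    scores = [-math.dist(vector, c) for c in centers]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def build_index(vectors, k):
    """Group vector ids into buckets by their most probable center."""
    centers = kmeans(vectors, k)
    buckets = {}
    for idx, v in enumerate(vectors):
        probs = bucket_probabilities(v, centers)
        buckets.setdefault(probs.index(max(probs)), []).append(idx)
    return centers, buckets


def search(query, vectors, centers, buckets, k_results=3, candidates=10):
    """Visit buckets from most to least probable until enough candidates
    are collected, then apply step (iii): rank them by Euclidean distance."""
    probs = bucket_probabilities(query, centers)
    pool = []
    for c in sorted(range(len(centers)), key=lambda c: -probs[c]):
        pool.extend(buckets.get(c, []))
        if len(pool) >= candidates:
            break
    pool.sort(key=lambda i: math.dist(query, vectors[i]))
    return pool[:k_results]


# Synthetic data: random walks in 3D play the role of protein chains.
rng = random.Random(1)


def random_chain(length):
    point, chain = [0.0, 0.0, 0.0], []
    for _ in range(length):
        point = [x + rng.uniform(-1, 1) for x in point]
        chain.append(point)
    return chain


chains = [random_chain(rng.randint(40, 80)) for _ in range(50)]
vectors = [embed(c) for c in chains]
centers, buckets = build_index(vectors, k=5)
print(search(vectors[7], vectors, centers, buckets))
```

In this sketch the expensive structural comparison is never computed at query time: only the candidates gathered from the most probable buckets are compared, and only with a cheap vector distance. The price is the one the abstract names, since a true neighbor that falls into an unvisited bucket is missed.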


