ViralVectors: Compact and Scalable Alignment-free Virome Feature Generation


Virus · Feature generation · Sequence · Alignment · Region of interest

April 6, 2023


Sarwan Ali, Prakash Chourasia, Zahra Tayebi, Babatunde Bello, Murray Patterson

From arXiv; 24 pages, 5 figures; accepted to Springer Medical & Biological Engineering & Computing

The amount of sequencing data for SARS-CoV-2 is several orders of magnitude larger than for any other virus. This will continue to grow geometrically for SARS-CoV-2, and for other viruses, as many countries heavily finance genomic surveillance efforts. Hence, we need methods for processing large amounts of sequence data to allow for effective yet timely decision-making. Such data will come from heterogeneous sources: aligned, unaligned, or even unassembled raw nucleotide or amino acid sequencing reads pertaining to the whole genome or to regions of interest (e.g., spike). In this work, we propose ViralVectors, a method for generating compact feature vectors from virome sequencing data that allows effective downstream analysis. The generation is based on minimizers, a type of lightweight "signature" of a sequence used traditionally in assembly and read mapping; to our knowledge, this is the first use of minimizers in this way. We validate our approach on different types of sequencing data: (a) 2.5M SARS-CoV-2 spike sequences (to show scalability); (b) 3K Coronaviridae spike sequences (to show robustness to more genomic variability); and (c) 4K raw WGS read sets taken from nasal-swab PCR tests (to show the ability to process unassembled reads). Our results show that ViralVectors outperforms current benchmarks in most classification and clustering tasks.
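The abstract does not spell out how the vectors are built, so the following is only a minimal sketch of one way a minimizer-based, alignment-free feature vector could be computed. It assumes that the minimizer of a k-mer is its lexicographically smallest m-mer and that the feature vector counts how often each possible m-mer occurs as a minimizer. The function names, the values `k=9` and `m=3`, and the nucleotide alphabet are illustrative assumptions, not the authors' settings.

```python
from collections import Counter
from itertools import product


def minimizers(seq, k=9, m=3):
    """Yield, for every k-mer window of seq, its lexicographically smallest m-mer.

    Assumption: this is the plain textbook definition of a minimizer; the
    paper may use a different ordering or also consider reversed k-mers.
    """
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        yield min(window[j:j + m] for j in range(k - m + 1))


def feature_vector(seq, alphabet="ACGT", k=9, m=3):
    """Return a fixed-length count vector over all len(alphabet)**m m-mers.

    The vector length depends only on the alphabet and m, not on the
    sequence length, so sequences need no alignment before comparison.
    """
    counts = Counter(minimizers(seq, k, m))
    return [counts["".join(p)] for p in product(alphabet, repeat=m)]


# Hypothetical toy sequence, not taken from the paper's data.
vec = feature_vector("ACGTTGCAACGTAGCT")
```

Because every sequence maps to a vector of the same length (64 entries for nucleotides with `m=3`), the output can be passed directly to standard classifiers or clustering algorithms; an amino acid alphabet would be supplied through the `alphabet` argument.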

