KmerCo: A lightweight K-mer counting technique with a tiny memory footprint

April 28, 2023

Sabuzima Nayak, Ripon Patgiri

From arXiv. Submitted to the conference for possible publication.

K-mer counting is a requisite process for DNA assembly because it speeds up the overall assembly process. The frequency of K-mers is used for estimating the parameters of DNA assembly, error correction, etc. The process also provides a list of distinct K-mers, which assists in searching large databases and reducing the size of de Bruijn graphs. Nonetheless, K-mer counting is a data- and compute-intensive process. Hence, it is crucial to implement a lightweight data structure that occupies little memory but processes K-mers quickly. We propose a lightweight K-mer counting technique, called KmerCo, that implements a potent counting Bloom Filter variant, called countBF. KmerCo has two phases: insertion and classification. The insertion phase inserts all K-mers into countBF and determines the distinct K-mers. The classification phase classifies the distinct K-mers into trustworthy and erroneous K-mers based on a user-provided threshold value. We also propose a novel benchmark performance metric. We used a Hadoop MapReduce program to determine the frequency of K-mers. We conducted rigorous experiments to demonstrate the advantage of KmerCo over state-of-the-art K-mer counting techniques. The experiments were conducted using DNA sequences of four organisms. The datasets were pruned to generate four datasets of different sizes. KmerCo was compared with Squeakr, BFCounter, and Jellyfish. KmerCo used the least memory, achieved the highest number of insertions per second, and had a positive trustworthy rate compared with the three above-mentioned methods.
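The two phases described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify countBF's internals, so a generic counting Bloom filter with saturating 8-bit counters stands in for it. The class and function names, the use of BLAKE2b as the hash, and the rule that a K-mer is trustworthy when its estimated count is at least the threshold are all assumptions made for illustration.

```python
import hashlib


class CountingBloomFilter:
    """Generic counting Bloom filter (a stand-in for countBF, not the authors' design)."""

    def __init__(self, num_counters: int, num_hashes: int):
        self.counters = bytearray(num_counters)  # one saturating 8-bit counter per slot
        self.num_hashes = num_hashes

    def _slots(self, item: str):
        # Derive num_hashes slot indices by salting one hash function (assumed choice).
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % len(self.counters)

    def count(self, item: str) -> int:
        # The minimum over the item's slots is an upper-biased frequency estimate.
        return min(self.counters[s] for s in self._slots(item))

    def insert(self, item: str) -> None:
        for s in set(self._slots(item)):  # touch each distinct slot once
            if self.counters[s] < 255:    # saturate instead of overflowing
                self.counters[s] += 1


def kmers(sequence: str, k: int):
    """Yield every length-k substring of the sequence."""
    for i in range(len(sequence) - k + 1):
        yield sequence[i:i + k]


def insertion_phase(sequences, k, cbf):
    """Insert all K-mers and collect those not seen before.

    A Bloom filter false positive can make a new K-mer look already seen,
    so the distinct list may miss a few K-mers.
    """
    distinct = []
    for seq in sequences:
        for kmer in kmers(seq, k):
            if cbf.count(kmer) == 0:
                distinct.append(kmer)
            cbf.insert(kmer)
    return distinct


def classification_phase(distinct, cbf, threshold):
    """Split distinct K-mers by estimated frequency (assumed rule: count >= threshold)."""
    trustworthy, erroneous = [], []
    for kmer in distinct:
        (trustworthy if cbf.count(kmer) >= threshold else erroneous).append(kmer)
    return trustworthy, erroneous


if __name__ == "__main__":
    cbf = CountingBloomFilter(num_counters=1 << 16, num_hashes=3)
    distinct = insertion_phase(["ACGTACGT"], k=4, cbf=cbf)
    trustworthy, erroneous = classification_phase(distinct, cbf, threshold=2)
    print("distinct:", distinct)
    print("trustworthy:", trustworthy)
    print("erroneous:", erroneous)
```

Because counters can only be inflated by collisions, the estimate never undercounts until a counter saturates at 255; a production tool would size the filter and counter width to the dataset.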
