DNA sequencing is revolutionising the field of medicine. DNA sequencers, the machines that perform DNA sequencing, have shrunk from the size of a fridge to that of a mobile phone over the last two decades. The cost of sequencing a human genome has also fallen, from billions of dollars to hundreds of dollars. Despite these improvements, DNA sequencers output hundreds or thousands of gigabytes of data that must be analysed on computers to discover meaningful information with biological implications. Unfortunately, analysis techniques have not kept pace with rapidly improving sequencing technologies. Consequently, even today, DNA analysis is performed on high-performance computers, just as it was a couple of decades ago. Such high-performance computers are not portable. As a result, the full utility of an ultra-portable sequencer for sequencing in the field or at the point of care is limited by the lack of portable, lightweight analytic techniques. This thesis proposes computer-architecture-aware optimisation of DNA analysis software. DNA analysis software is inevitably convoluted due to the complexity of biological data. Modern computer architectures are also complex. Performing architecture-aware optimisations requires the synergistic use of knowledge from both domains, i.e., DNA sequence analysis and computer architecture. This thesis aims to draw the two domains together. In this thesis, gold-standard DNA sequence analysis workflows are systematically examined for algorithmic components that cause performance bottlenecks. Identified bottlenecks are resolved through architecture-aware optimisations at different levels, i.e., memory, cache, register and processor. The optimised software tools are used in complete end-to-end analysis workflows, and their efficacy is demonstrated by running them on prototypical embedded systems.