Mass spectrometry, commonly used for protein identification, generates a massive number of spectra that must be matched against a large database. In practice, most spectra remain unidentified or are mismatched because of unexpected post-translational modifications. Open modification search (OMS) has been proposed as a strategy to improve the identification rate by considering every possible modification of a spectrum, but it expands the search space exponentially. In this work, we propose HyperOMS, which redesigns OMS around hyperdimensional computing to cope with this challenge. Unlike existing algorithms that represent spectral data with floating-point numbers, HyperOMS encodes spectra as high-dimensional binary vectors and performs OMS efficiently in high-dimensional space. Because it relies on massive parallelism and simple Boolean operations, HyperOMS maps efficiently onto parallel computing platforms. Experimental results show that HyperOMS on GPU is up to $17\times$ faster and $6.4\times$ more energy efficient than the state-of-the-art GPU-based OMS tool while providing search quality comparable to that of competing search tools.
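To make the idea of encoding spectra as binary hypervectors concrete, the following is a minimal sketch of a generic hyperdimensional encoding, not the HyperOMS implementation itself. Everything in it is an illustrative assumption: the dimensionality `D`, the m/z range and bin width, the number of intensity levels, the use of independent random level vectors, and all function names. Each peak is bound (XOR) from an m/z-bin vector and an intensity-level vector, the peaks are bundled by bitwise majority, and spectra are compared by Hamming distance.

```python
import random

# All constants below are illustrative assumptions, not values from the paper.
D = 2048            # hypervector width in bits
MZ_MIN = 100.0      # lower bound of the m/z range
MZ_MAX = 1500.0     # upper bound of the m/z range
BIN_WIDTH = 1.0     # width of one m/z bin
N_LEVELS = 8        # number of quantized intensity levels
N_BINS = int((MZ_MAX - MZ_MIN) / BIN_WIDTH)


def make_item_memory(n_items, seed):
    """Return n_items random D-bit hypervectors, stored as Python ints."""
    rng = random.Random(seed)
    return [rng.getrandbits(D) for _ in range(n_items)]


ID_HVS = make_item_memory(N_BINS, seed=1)       # one vector per m/z bin
LEVEL_HVS = make_item_memory(N_LEVELS, seed=2)  # one vector per intensity level


def majority(hvs):
    """Bundle hypervectors by bitwise majority vote (ties resolve to 0)."""
    out = 0
    for bit in range(D):
        ones = sum((hv >> bit) & 1 for hv in hvs)
        if 2 * ones > len(hvs):
            out |= 1 << bit
    return out


def encode_spectrum(peaks):
    """Encode (m/z, intensity) peaks, with intensity normalized to [0, 1]."""
    bound = []
    for mz, intensity in peaks:
        b = min(max(int((mz - MZ_MIN) / BIN_WIDTH), 0), N_BINS - 1)
        level = min(max(int(intensity * N_LEVELS), 0), N_LEVELS - 1)
        bound.append(ID_HVS[b] ^ LEVEL_HVS[level])  # XOR binds position and level
    return majority(bound)


def hamming(a, b):
    """Number of differing bits between two hypervectors."""
    return bin(a ^ b).count("1")


# Hypothetical reference spectrum, a "modified" copy in which the last three
# peaks are shifted by a mass delta, and an unrelated spectrum.
reference = [(175.1, 0.4), (262.2, 1.0), (375.3, 0.7), (488.4, 0.3),
             (601.5, 0.9), (714.6, 0.5), (827.7, 0.6)]
DELTA = 79.97
modified = reference[:4] + [(mz + DELTA, i) for mz, i in reference[4:]]
unrelated = [(130.0, 0.8), (245.5, 0.2), (333.3, 0.6), (456.7, 1.0),
             (570.2, 0.4), (690.9, 0.7), (801.1, 0.3)]

ref_hv = encode_spectrum(reference)
d_mod = hamming(encode_spectrum(modified), ref_hv)
d_other = hamming(encode_spectrum(unrelated), ref_hv)
print("modified vs reference:", d_mod)
print("unrelated vs reference:", d_other)
```

In this sketch the modified spectrum still shares four of its seven bound peak vectors with the reference, so its Hamming distance stays well below the roughly `D / 2` expected for an unrelated spectrum. The comparison itself needs only XOR and a population count, which is the kind of simple Boolean workload that parallelizes well.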