In this work, we explore recurrent neural network architectures for tuberculosis (TB) cough classification. In contrast to previous unsuccessful attempts to apply deep architectures in this domain, we show that a basic bidirectional long short-term memory (BiLSTM) network can achieve improved performance. In addition, we show that combining greedy feature selection with a newly proposed attention-based architecture that learns patient-invariant features yields substantially better generalisation than a baseline and the other architectures considered. Furthermore, the attention mechanism makes it possible to inspect the temporal regions of the audio signal that are considered important for classification. Finally, we develop a neural style transfer technique to infer idealised inputs, which can subsequently be analysed. We find distinct differences between the idealised power spectra of TB and non-TB coughs, which provide clues about the origin of the features in the audio signal.