DeepTriage: Exploring the Effectiveness of Deep Learning for Bug Triaging

The primary task of bug triaging is to identify, for a given software bug report, an appropriate developer who could potentially fix the bug. Most bug tracking systems record a bug title (summary) and a detailed description. Automatic bug triaging can be formulated as a classification problem that takes the bug title and description as input and maps them to one of the available developers (classes). The major challenge is that the bug description usually contains a combination of free unstructured text, code snippets, and stack traces, which makes the input data noisy. Existing bag-of-words (BOW) feature models do not consider the syntactic and sequential word information available in the unstructured text. We propose a novel bug report representation algorithm using an attention-based deep bidirectional recurrent neural network (DBRNN-A) model that learns syntactic and semantic features from long word sequences in an unsupervised manner. The DBRNN-A based bug representation, instead of BOW features, is then used to train the classifier. The attention mechanism enables the model to learn a context representation over a long word sequence, such as a bug report. To provide a large amount of data for training the feature learning model, we leverage the unfixed bug reports (roughly 70% of the bugs in an open source bug tracking system), which were completely ignored in previous studies. Another contribution is to make this research reproducible by releasing the source code and creating a public benchmark dataset of bug reports from three open source bug tracking systems: Google Chromium (383,104 bug reports), Mozilla Core (314,388 bug reports), and Mozilla Firefox (162,307 bug reports). Experimentally, we compare our approach with BOW models and machine learning approaches and observe that DBRNN-A provides a higher rank-10 average accuracy.
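To make the classification framing and the rank-k metric concrete, the following is a minimal sketch of a bag-of-words baseline of the kind the abstract compares against. It is not the DBRNN-A model and not the authors' released code: the nearest-centroid cosine classifier is a stand-in chosen for brevity, the function names (`tokenize`, `train_bow_centroids`, `rank_developers`, `rank_k_accuracy`) are hypothetical, and rank-k accuracy is assumed here to mean the fraction of reports whose true developer appears among the top k predictions.

```python
from collections import Counter, defaultdict
import math
import re


def tokenize(text):
    """Lowercase and keep alphanumeric tokens (a simplifying assumption)."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def train_bow_centroids(reports):
    """Sum word counts per developer from (title, description, developer) triples."""
    centroids = defaultdict(Counter)
    for title, description, developer in reports:
        centroids[developer].update(tokenize(title + " " + description))
    return centroids


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_developers(centroids, title, description):
    """Return developers ordered from most to least similar to the report."""
    query = Counter(tokenize(title + " " + description))
    return sorted(centroids, key=lambda dev: cosine(query, centroids[dev]), reverse=True)


def rank_k_accuracy(ranked_lists, true_developers, k):
    """Fraction of reports whose true developer is within the top-k ranking."""
    hits = sum(true in ranked[:k] for ranked, true in zip(ranked_lists, true_developers))
    return hits / len(true_developers)


# Toy, made-up training reports: (title, description, developer who fixed it).
train = [
    ("Crash in renderer", "stack trace shows null pointer in renderer process", "alice"),
    ("Bookmark sync fails", "sync server returns error when saving bookmark", "bob"),
    ("Tab crash on startup", "renderer crash with null pointer", "alice"),
]
centroids = train_bow_centroids(train)
ranking = rank_developers(centroids, "Renderer null pointer crash", "crash in renderer")
```

Because the word counts discard order, the two reports "renderer crashes tab" and "tab crashes renderer" produce identical vectors in this sketch, which is the limitation of BOW features that the abstract points to.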
