Background. In information theory, surprisal measures how unexpected an event is: the less probable the event, the higher its surprisal. Statistical language models provide a probabilistic approximation of natural language, and because surprisal is defined in terms of an event's probability, such a model can be used to assign a surprisal to English sentences. The issues and pull requests in a software repository's issue tracker give insight into the development process and likely contain its surprising events. Objective. Prior work has found that unusual events in software repositories are of interest to developers, and has used simple methods based on code metrics to detect them. In this study we propose a new method for detecting unusual events in software repositories using surprisal. Having identified surprising issues and pull requests, we intend to analyse them further to determine whether they are actually important to the repository or pose a significant challenge to address. If bad surprises can be found early, before they cause additional trouble, it is plausible that effort, cost and time will be saved. Method. After extracting the issues and pull requests from 5000 of the most popular software repositories on GitHub, we will train a language model to represent them. For each issue and pull request we will measure its perceived importance in the repository, its resolution difficulty using several analogues, and its surprisal, and then compute inferential statistics to describe any correlations.
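To make the notion of surprisal concrete, the following is a minimal sketch, not the study's method. It assumes a toy unigram model with add-one smoothing and whitespace tokenisation, whereas the abstract does not specify which language model will be trained. The function names (`train_unigram`, `token_surprisal`, `mean_surprisal`) and the example issue titles are hypothetical. Surprisal is computed as the negative base-2 logarithm of a token's probability and averaged over the tokens of a text.

```python
import math
from collections import Counter


def train_unigram(corpus):
    """Count tokens over a list of texts (lower-cased, whitespace-tokenised)."""
    counts = Counter()
    for text in corpus:
        counts.update(text.lower().split())
    return counts


def token_surprisal(token, counts, vocab_size):
    """Surprisal in bits, -log2 p(token), using add-one smoothing."""
    total = sum(counts.values())
    p = (counts[token] + 1) / (total + vocab_size)
    return -math.log2(p)


def mean_surprisal(text, counts):
    """Average per-token surprisal of a text; 0.0 for an empty text."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    # One extra vocabulary slot reserves probability mass for unseen tokens.
    vocab_size = len(counts) + 1
    return sum(token_surprisal(t, counts, vocab_size) for t in tokens) / len(tokens)


# Hypothetical issue titles standing in for a real training corpus.
corpus = ["fix typo in readme", "fix typo in docs", "update readme"]
counts = train_unigram(corpus)

common = mean_surprisal("fix typo in readme", counts)
rare = mean_surprisal("kernel panic on boot", counts)
print(f"common: {common:.2f} bits/token, rare: {rare:.2f} bits/token")
```

Under these assumptions, a title made of words the model has seen often scores lower than one made of unseen words, which is the property the proposed detection method relies on. A realistic model would condition each token on its context rather than treat tokens independently.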