As software developers or researchers, we very often encounter stacktrace error messages while writing code or installing packages. These error messages are frequently obscure and verbose, and they often make little sense to us. There is a good chance that someone else has faced a similar issue and has shared a similar stacktrace on one of the many online developer forums. However, Google and other traditional search engines are not very helpful for finding web pages that contain similar stacktraces. To address this problem, we have developed, as an outcome of this research project, a web interface that acts as a better search engine: users can find relevant Stack Overflow posts by submitting an entire stacktrace error message. The current solution can serve parallel user queries in real time, returning the top-matched Stack Overflow posts within 50 seconds on a server with 300 GB of RAM. This study provides a comprehensive overview of the NLP techniques it uses and an extensive description of the research pipeline. The results, limitations, and computational overhead reported here can help future researchers and software developers build better solutions for this problem or for similar large-scale text-matching tasks.