Automated Self-Admitted Technical Debt Tracking at Commit-Level: A Language-independent Approach

April 16, 2023

Mohammad Sadegh Sheikhaei, Yuan Tian

From arXiv. This study has been accepted at the 6th International Conference on Technical Debt (TechDebt 2023).

Software and systems traceability is essential for downstream tasks such as data-driven software analysis and intelligent tool development. However, despite increasing attention to mining and understanding technical debt in software systems, tools that support tracking technical debt are rarely available. In this work, we propose the first programming-language-independent tracking tool for self-admitted technical debt (SATD), a sub-optimal solution that is explicitly annotated by developers in software systems. Our approach takes a Git repository as input and returns a list of SATDs with their evolution actions (created, deleted, updated) at the commit level. It also returns a line number indicating the latest starting position of each SATD in the system. The approach first identifies an initial set of raw SATDs (which have only created and deleted actions) by detecting and tracking SATDs in commits' hunks, leveraging a state-of-the-art language-independent SATD detection approach. It then calculates a context-based matching score between pairs of deleted and created raw SATDs in the same commit to identify SATD update actions. The results of our preliminary study on Apache Tomcat and Apache Ant show that our tracking tool achieves F1 scores of 92.8% and 96.7%, respectively.
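
The two-stage idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' tool, and everything in it is an assumption made for illustration: the keyword regex `SATD_PATTERN` stands in for the state-of-the-art SATD detector, plain text similarity from `difflib.SequenceMatcher` stands in for the context-based matching score, and the names `raw_satds`, `classify`, and the `0.6` threshold are hypothetical. The input is the unified diff text of one commit, such as the output of `git show`.

```python
import re
from difflib import SequenceMatcher

# Assumption: a keyword heuristic replaces the paper's SATD detector.
SATD_PATTERN = re.compile(r"\b(TODO|FIXME|HACK|XXX)\b", re.IGNORECASE)
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def raw_satds(diff_text):
    """Stage 1: collect raw SATDs as (line_number, text) from one commit's hunks."""
    deleted, created = [], []
    old_no = new_no = 0
    in_hunk = False
    for line in diff_text.splitlines():
        header = HUNK_HEADER.match(line)
        if header:
            old_no, new_no = int(header.group(1)), int(header.group(2))
            in_hunk = True
        elif line.startswith("diff "):
            in_hunk = False  # a new file starts; skip its header lines
        elif not in_hunk or line.startswith("\\"):
            continue  # file headers and "\ No newline at end of file"
        elif line.startswith("-"):
            if SATD_PATTERN.search(line):
                deleted.append((old_no, line[1:].strip()))
            old_no += 1
        elif line.startswith("+"):
            if SATD_PATTERN.search(line):
                created.append((new_no, line[1:].strip()))
            new_no += 1
        else:  # context line: present in both versions
            old_no += 1
            new_no += 1
    return deleted, created


def similarity(a, b):
    # Assumption: text similarity replaces the context-based matching score.
    return SequenceMatcher(None, a, b).ratio()


def classify(deleted, created, threshold=0.6):
    """Stage 2: pair deleted and created raw SATDs of one commit into updates."""
    actions = []
    unmatched = list(created)
    for old_no, old_text in deleted:
        best = max(unmatched, key=lambda c: similarity(old_text, c[1]), default=None)
        if best is not None and similarity(old_text, best[1]) >= threshold:
            unmatched.remove(best)
            actions.append(("updated", best[0], best[1]))  # latest position
        else:
            actions.append(("deleted", old_no, old_text))
    actions.extend(("created", no, text) for no, text in unmatched)
    return actions


EXAMPLE_DIFF = """\
--- a/Foo.java
+++ b/Foo.java
@@ -10,4 +10,3 @@
 int x = 0;
-// TODO: handle null input here
+// TODO: handle null and empty input here
 return x;
-// FIXME: remove this workaround
@@ -40,2 +39,3 @@
 void run() {
+// HACK: retry twice because the server is flaky
 }
"""

if __name__ == "__main__":
    for action in classify(*raw_satds(EXAMPLE_DIFF)):
        print(action)
```

In this sketch the reworded TODO comment is reported as one update rather than a deletion plus a creation, which is the purpose of the second stage. A real tracker would also have to follow SATDs across commits, file renames, and multi-line comments, none of which is attempted here.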

