Given the democratic nature of open source development, code review and issue discussions may be uncivil. Incivility, defined as features of discussion that convey an unnecessarily disrespectful tone, can have negative consequences for open source communities. To prevent or minimize these consequences, open source platforms have introduced mechanisms for removing uncivil language from discussions. However, such approaches require manual inspection, which can be overwhelming given the large number of discussions. To help open source communities deal with this problem, in this paper we compare six classical machine learning models with BERT for detecting incivility in open source code review and issue discussions. Furthermore, we assess whether adding contextual information improves the models' performance and how well the models perform in a cross-platform setting. We found that BERT outperforms the classical machine learning models, achieving a best F1-score of 0.95, and that the classical models tend to underperform in detecting non-technical and civil discussions. Our results also show that adding contextual information to BERT did not improve its performance and that none of the analyzed classifiers performed outstandingly in a cross-platform setting. Finally, we provide insights into the tones that the classifiers misclassify.
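To make the experimental setup concrete, the following is a minimal sketch of one of the classical baselines the paper compares against BERT: a TF-IDF representation fed to a logistic-regression classifier, evaluated with the F1-score. This is not the paper's actual pipeline; the toy sentences, labels, features, and hyperparameters below are illustrative assumptions.

```python
# Hypothetical classical-ML baseline for incivility detection
# (1 = uncivil, 0 = civil). All data and settings are illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Toy stand-ins for labeled code review / issue discussion sentences.
texts = [
    "This patch is garbage, did you even test it?",
    "Thanks, the refactoring looks clean to me.",
    "Stop wasting everyone's time with broken commits.",
    "Could you add a unit test for the edge case?",
] * 25  # repeated only so the toy train/test split has enough samples
labels = [1, 0, 1, 0] * 25

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, stratify=labels, random_state=42
)

# TF-IDF unigrams/bigrams + logistic regression, one of many
# possible classical-model configurations.
clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
clf.fit(X_train, y_train)
print("F1:", f1_score(y_test, clf.predict(X_test)))
```

In the same spirit, the BERT classifier would replace the TF-IDF features with a fine-tuned transformer encoder; the comparison in the paper reports the resulting F1-scores on held-out code review and issue discussions.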