Data Augmentation for Improving Emotion Recognition in Software Engineering Communication

Mia Mohammad Imran, Yashasvi Jain, Preetha Chatterjee, Kostadin Damevski

Emotions (e.g., Joy, Anger) are prevalent in daily software engineering (SE) activities and are known to be significant indicators of work productivity (e.g., bug-fixing efficiency). Recent studies have shown that directly applying general-purpose emotion classification tools to SE corpora is not effective. Even within the SE domain, tool performance degrades significantly when a tool is trained on one communication channel and evaluated on another (e.g., StackOverflow vs. GitHub comments). Retraining a tool with channel-specific data takes significant effort, because manually annotating large ground-truth datasets is expensive. In this paper, we address this data scarcity problem by automatically creating new training data with a data augmentation technique. We first analyze the types of errors made by popular SE-specific emotion recognition tools, then target our data augmentation strategy at those errors to improve emotion recognition performance. Our results show an average improvement of 9.3% in micro F1-score for three existing emotion classification tools (ESEM-E, EMTk, SEntiMoji) when trained with our best augmentation strategy.
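
For readers unfamiliar with the reported metric, the following is a minimal sketch of how a micro-averaged F1-score can be computed. It is not the authors' evaluation code. The function name `micro_f1`, the toy data, and the label "Surprise" are illustrative assumptions, as is the choice to represent each message's emotions as a set of labels.

```python
def micro_f1(gold, predicted):
    """Micro-averaged F1 over a list of examples.

    gold and predicted are lists of label sets, one set per message.
    Micro-averaging pools true positives, false positives, and false
    negatives across all labels before computing precision and recall.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, predicted):
        tp += len(g & p)   # labels predicted and present in the gold set
        fp += len(p - g)   # labels predicted but absent from the gold set
        fn += len(g - p)   # gold labels the prediction missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: four messages with gold and predicted emotions.
gold = [{"Joy"}, {"Anger"}, {"Joy", "Surprise"}, set()]
pred = [{"Joy"}, {"Joy"}, {"Joy"}, {"Anger"}]
score = micro_f1(gold, pred)
print(f"micro F1 = {score:.2f}")  # micro F1 = 0.50
```

Because the counts are pooled before averaging, frequent emotion labels weigh more heavily in this score than rare ones.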
