CONDA:用于在游戏中了解和检测毒性的双重附加说明的数据集 (CONDA: a CONtextual Dual-Annotated dataset for in-game toxicity understanding and detection) - 专知论文

会员服务 ·

0

可理解性 · NLU · 数据集 · MoDELS · 稳健性 ·

2021 年 7 月 23 日

CONDA: a CONtextual Dual-Annotated dataset for in-game toxicity understanding and detection

翻译：CONDA:用于在游戏中了解和检测毒性的双重附加说明的数据集

Henry Weld,Guanghao Huang,Jean Lee,Tongshu Zhang,Kunze Wang,Xinghong Guo,Siqu Long,Josiah Poon,Soyeon Caren Han

from arxiv, Accepted by ACL-IJCNLP 2021

Traditional toxicity detection models have focused on the single utterance level without deeper understanding of context. We introduce CONDA, a new dataset for in-game toxic language detection enabling joint intent classification and slot filling analysis, which is the core task of Natural Language Understanding (NLU). The dataset consists of 45K utterances from 12K conversations from the chat logs of 1.9K completed Dota 2 matches. We propose a robust dual semantic-level toxicity framework, which handles utterance and token-level patterns, and rich contextual chatting history. Accompanying the dataset is a thorough in-game toxicity analysis, which provides comprehensive understanding of context at utterance, token, and dual levels. Inspired by NLU, we also apply its metrics to the toxicity detection tasks for assessing toxicity and game-specific aspects. We evaluate strong NLU models on CONDA, providing fine-grained results for different intent classes and slot classes. Furthermore, we examine the coverage of toxicity nature in our dataset by comparing it with other toxicity datasets.

翻译：传统毒性检测模型侧重于单一语句水平,而没有更深入地理解上下文。我们引入了CONDA,这是一个用于游戏中毒性语言检测的新的数据集,可以进行联合意图分类和空缺填充分析,这是自然语言理解的核心任务。该数据集由来自1.9K已完成的Dota 2匹配聊天日志12K对话的45K语句组成。我们提出了一个强有力的双语级毒性框架,处理语句和象征性模式,以及丰富的背景聊天历史。该数据集是一场彻底的游戏中的毒性分析,全面了解语句、符号和双层的背景。在NLU的启发下,我们还运用其测量毒性的度量来评估毒性和游戏特有方面。我们评估了CONDA的强力NLU模型,为不同的意向类别和档级提供了精确的结果。此外,我们通过将这些数据与其他毒性数据集进行比较,审视了我们数据集中的毒性特性的覆盖范围。

0

相关内容

可理解性

【深度学习社区检测】Deep Learning for Community Detection: Progress, Challenges and Opportunities

【深度学习社区检测】Deep Learning for Community Detection: Progress, Challenges and Opportunities

专知会员服务

28+阅读 · 2020年6月13日

Python分布式计算，171页pdf，Distributed Computing with Python

Python分布式计算，171页pdf，Distributed Computing with Python

专知会员服务

108+阅读 · 2020年5月3日

【深度伪造综述论文】The Creation and Detection of Deepfakes: A Survey

【深度伪造综述论文】The Creation and Detection of Deepfakes: A Survey

专知会员服务

55+阅读 · 2020年4月26日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

【东大-UCSB】虚假新闻检测的自然语言处理研究综述，A Survey on Natural Language Processing for Fake News Detection

【东大-UCSB】虚假新闻检测的自然语言处理研究综述，A Survey on Natural Language Processing for Fake News Detection

专知会员服务

79+阅读 · 2020年2月12日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

【深度学习视频分析/多模态学习资源大列表】

【深度学习视频分析/多模态学习资源大列表】

专知会员服务

92+阅读 · 2019年10月16日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

视频分析/多模态学习论文、代码、数据集大列表

视频分析/多模态学习论文、代码、数据集大列表

专知

57+阅读 · 2019年7月13日

AutoML与轻量模型大列表

AutoML与轻量模型大列表

专知

8+阅读 · 2019年4月29日

Call for Participation: Shared Tasks in NLPCC 2019

Call for Participation: Shared Tasks in NLPCC 2019

中国计算机学会

5+阅读 · 2019年3月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

Facebook PyText 在 Github 上开源了

Facebook PyText 在 Github 上开源了

AINLP

7+阅读 · 2018年12月14日

跨越注意力：Cross-Attention

跨越注意力：Cross-Attention

我爱读PAMI

172+阅读 · 2018年6月2日

【推荐】用Python/OpenCV实现增强现实

【推荐】用Python/OpenCV实现增强现实

机器学习研究会

15+阅读 · 2017年11月16日

【推荐】YOLO实时目标检测(6fps)

【推荐】YOLO实时目标检测(6fps)

机器学习研究会

20+阅读 · 2017年11月5日

【推荐】图像分类必读开创性论文汇总

【推荐】图像分类必读开创性论文汇总

机器学习研究会

14+阅读 · 2017年8月15日

MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding

Arxiv

6+阅读 · 2021年4月26日

Mining Dual Emotion for Fake News Detection

Arxiv

13+阅读 · 2020年10月19日

Combination of Multiple Global Descriptors for Image Retrieval

Combination of Multiple Global Descriptors for Image Retrieval

Arxiv

3+阅读 · 2019年4月18日

Context-aware Neural-based Dialog Act Classification on Automatically Generated Transcriptions

Context-aware Neural-based Dialog Act Classification on Automatically Generated Transcriptions

Arxiv

3+阅读 · 2019年2月28日

Deep Learning for Generic Object Detection: A Survey

Deep Learning for Generic Object Detection: A Survey

Arxiv

14+阅读 · 2018年9月6日

Jointly Localizing and Describing Events for Dense Video Captioning

Arxiv

5+阅读 · 2018年4月23日

LSTD: A Low-Shot Transfer Detector for Object Detection

Arxiv

4+阅读 · 2018年3月5日

Spatial-Temporal Memory Networks for Video Object Detection

Arxiv

4+阅读 · 2017年12月18日

Integrating both Visual and Audio Cues for Enhanced Video Caption

Arxiv

4+阅读 · 2017年12月9日

Visual Relationship Detection with Internal and External Linguistic Knowledge Distillation

Arxiv

3+阅读 · 2017年8月3日

VIP会员

文章信息

相关主题

相关VIP内容

【深度学习社区检测】Deep Learning for Community Detection: Progress, Challenges and Opportunities

【深度学习社区检测】Deep Learning for Community Detection: Progress, Challenges and Opportunities

专知会员服务

28+阅读 · 2020年6月13日

Python分布式计算，171页pdf，Distributed Computing with Python

Python分布式计算，171页pdf，Distributed Computing with Python

专知会员服务

108+阅读 · 2020年5月3日

【深度伪造综述论文】The Creation and Detection of Deepfakes: A Survey

【深度伪造综述论文】The Creation and Detection of Deepfakes: A Survey

专知会员服务

55+阅读 · 2020年4月26日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

【东大-UCSB】虚假新闻检测的自然语言处理研究综述，A Survey on Natural Language Processing for Fake News Detection

【东大-UCSB】虚假新闻检测的自然语言处理研究综述，A Survey on Natural Language Processing for Fake News Detection

专知会员服务

79+阅读 · 2020年2月12日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

【深度学习视频分析/多模态学习资源大列表】

【深度学习视频分析/多模态学习资源大列表】

专知会员服务

92+阅读 · 2019年10月16日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

面向性能、成本效益、云边隐私与可信性的大小语言模型协作综述

乌克兰太空研究（2022-2024年） | 176页

【CMU博士论文】大型语言模型的隐性特性

国防领域人工智能走向何方？

相关资讯

视频分析/多模态学习论文、代码、数据集大列表

视频分析/多模态学习论文、代码、数据集大列表

专知

57+阅读 · 2019年7月13日

AutoML与轻量模型大列表

AutoML与轻量模型大列表

专知

8+阅读 · 2019年4月29日

Call for Participation: Shared Tasks in NLPCC 2019

Call for Participation: Shared Tasks in NLPCC 2019

中国计算机学会

5+阅读 · 2019年3月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

Facebook PyText 在 Github 上开源了

Facebook PyText 在 Github 上开源了

AINLP

7+阅读 · 2018年12月14日

跨越注意力：Cross-Attention

跨越注意力：Cross-Attention

我爱读PAMI

172+阅读 · 2018年6月2日

【推荐】用Python/OpenCV实现增强现实

【推荐】用Python/OpenCV实现增强现实

机器学习研究会

15+阅读 · 2017年11月16日

【推荐】YOLO实时目标检测(6fps)

【推荐】YOLO实时目标检测(6fps)

机器学习研究会

20+阅读 · 2017年11月5日

【推荐】图像分类必读开创性论文汇总

【推荐】图像分类必读开创性论文汇总

机器学习研究会

14+阅读 · 2017年8月15日

相关论文

MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding

Arxiv

6+阅读 · 2021年4月26日

Mining Dual Emotion for Fake News Detection

Arxiv

13+阅读 · 2020年10月19日

Combination of Multiple Global Descriptors for Image Retrieval

Combination of Multiple Global Descriptors for Image Retrieval

Arxiv

3+阅读 · 2019年4月18日

Context-aware Neural-based Dialog Act Classification on Automatically Generated Transcriptions

Context-aware Neural-based Dialog Act Classification on Automatically Generated Transcriptions

Arxiv

3+阅读 · 2019年2月28日

Deep Learning for Generic Object Detection: A Survey

Deep Learning for Generic Object Detection: A Survey

Arxiv

14+阅读 · 2018年9月6日

Jointly Localizing and Describing Events for Dense Video Captioning

Arxiv

5+阅读 · 2018年4月23日

LSTD: A Low-Shot Transfer Detector for Object Detection

Arxiv

4+阅读 · 2018年3月5日

Spatial-Temporal Memory Networks for Video Object Detection

Arxiv

4+阅读 · 2017年12月18日

Integrating both Visual and Audio Cues for Enhanced Video Caption

Arxiv

4+阅读 · 2017年12月9日

Visual Relationship Detection with Internal and External Linguistic Knowledge Distillation

Arxiv

3+阅读 · 2017年8月3日

微信扫码咨询专知VIP会员