Personalised Language Modelling of Screen Characters Using Rich Metadata Annotations - Zhuanzhi Papers


Language modelling · Metadata · Perplexity · Annotation (programming) · Corpus

29 March 2023


Sebastian Vincent, Rowanne Sumner, Alice Dowek, Charlotte Blundell, Emily Preston, Chris Bayliss, Chris Oakley, Carolina Scarton

From arXiv. 9 pages, 4 figures, 6 tables. Preprint.

Personalisation of language models for dialogue sensitises them to better capture the speaking patterns of people of specific characteristics, and/or in specific environments. However, rich character annotations are difficult to come by and to successfully leverage. In this work, we release and describe a novel set of manual annotations for 863 speakers from the popular Cornell Movie Dialog Corpus, including features like characteristic quotes and character descriptions, and a set of six automatically extracted metadata for over 95% of the featured films. We perform extensive experiments on two corpora and show that such annotations can be effectively used to personalise language models, reducing perplexity by up to 8.5%. Our method can be applied even zero-shot for speakers for whom no prior training data is available, by relying on combinations of characters' demographic characteristics. Since collecting such metadata is costly, we also contribute a cost-benefit analysis to highlight which annotations were most cost-effective relative to the reduction in perplexity.
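The abstract reports its gains as a relative reduction in perplexity. As a minimal sketch of how such a figure is computed, the Python below derives perplexity from per-token log-probabilities and compares a baseline against a personalised model. The function names, the tag format in `with_metadata`, and all numbers are illustrative assumptions; none of them come from the paper, and the tag-prefix scheme is only one common way to condition a language model on speaker attributes, not necessarily the authors' method.

```python
import math


def perplexity(token_log_probs):
    """Perplexity: exp of the mean negative natural-log probability per token."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def relative_reduction(baseline_ppl, personalised_ppl):
    """Percentage by which the personalised perplexity is below the baseline."""
    return 100.0 * (baseline_ppl - personalised_ppl) / baseline_ppl


def with_metadata(utterance, metadata):
    """Hypothetical conditioning scheme: prepend metadata as sorted text tags."""
    prefix = " ".join(f"<{key}={value}>" for key, value in sorted(metadata.items()))
    return f"{prefix} {utterance}" if prefix else utterance


if __name__ == "__main__":
    # Made-up perplexities, chosen only to illustrate the arithmetic.
    baseline, personalised = 40.0, 36.6
    print(f"reduction: {relative_reduction(baseline, personalised):.1f}%")
    # Made-up speaker attributes; combinations like this need no prior
    # training data for the specific speaker, hence the zero-shot setting.
    print(with_metadata("Hello.", {"gender": "female", "age": "adult"}))
```

Because the metadata is expressed as attribute values rather than a speaker identity, the same prefix can be built for a character the model has never seen, which is the intuition behind the zero-shot use described in the abstract.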

