调至或调至或调至比率:制作共同声音手势的调查评价方法 (To Rate or Not To Rate: Investigating Evaluation Methods for Generated Co-Speech Gestures) - 专知论文

会员服务 ·

0

成对型 · 秩 · Machine Learning · 会话智能体 · Performer ·

2021 年 8 月 13 日

To Rate or Not To Rate: Investigating Evaluation Methods for Generated Co-Speech Gestures

翻译：调至或调至或调至比率:制作共同声音手势的调查评价方法

Pieter Wolfert,Jeffrey M. Girard,Taras Kucherenko,Tony Belpaeme

from arxiv, accepted for publication at International Conference for Multimodal Interaction (ICMI'21)

While automatic performance metrics are crucial for machine learning of artificial human-like behaviour, the gold standard for evaluation remains human judgement. The subjective evaluation of artificial human-like behaviour in embodied conversational agents is however expensive and little is known about the quality of the data it returns. Two approaches to subjective evaluation can be largely distinguished, one relying on ratings, the other on pairwise comparisons. In this study we use co-speech gestures to compare the two against each other and answer questions about their appropriateness for evaluation of artificial behaviour. We consider their ability to rate quality, but also aspects pertaining to the effort of use and the time required to collect subjective data. We use crowd sourcing to rate the quality of co-speech gestures in avatars, assessing which method picks up more detail in subjective assessments. We compared gestures generated by three different machine learning models with various level of behavioural quality. We found that both approaches were able to rank the videos according to quality and that the ranking significantly correlated, showing that in terms of quality there is no preference of one method over the other. We also found that pairwise comparisons were slightly faster and came with improved inter-rater reliability, suggesting that for small-scale studies pairwise comparisons are to be favoured over ratings.

翻译：虽然自动业绩衡量标准对于机器学习人造类似行为至关重要,但用于评价的金质标准仍然是人类判断。对体现的谈话代理人中人造类似行为的主观评价虽然费用昂贵,但对数据质量的主观评价却鲜为人知。主观评价的两种方法可以大为区分,一种是依靠评级,另一种是双向比较。在这项研究中,我们使用共同说话的手势来比较两者,并回答关于两者是否适合评价人造行为的问题。我们考虑它们是否有能力评定质量,但也考虑与使用努力和收集主观数据所需时间有关的方面。我们利用众包来评分阿凡达人共同说话手势的质量,评估哪一种方法在主观评估中能更详细地反映细节。我们比较了三种不同机器学习模式和不同程度行为质量的手势。我们发现,这两种方法都能够按照质量对录像进行分级,而排列得相当相近,表明在质量方面,一种方法并不优于另一种方法。我们还发现,对口比较稍快,而且与改进了同级之间的比较是比较。

0

相关内容

成对型

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

专知会员服务

44+阅读 · 2020年11月2日

【InterSpeech2020】混合语音识别系统中的词汇扩展技术，Techniques for Vocabulary Expansion in Hybrid Speech Recognition Systems

【InterSpeech2020】混合语音识别系统中的词汇扩展技术，Techniques for Vocabulary Expansion in Hybrid Speech Recognition Systems

专知会员服务

17+阅读 · 2020年3月23日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【综述】文献级机器翻译研究:方法与评价（A Survey on Document-level Machine Translation: Methods and Evaluation）

【综述】文献级机器翻译研究:方法与评价（A Survey on Document-level Machine Translation: Methods and Evaluation）

专知会员服务

7+阅读 · 2019年12月19日

斯坦福&谷歌Jeff Dean最新Nature论文：医疗深度学习技术指南

斯坦福&谷歌Jeff Dean最新Nature论文：医疗深度学习技术指南

专知会员服务

58+阅读 · 2019年10月20日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

53+阅读 · 2019年9月29日

LibRec 精选：AutoML for Contextual Bandits

LibRec 精选：AutoML for Contextual Bandits

LibRec智能推荐

7+阅读 · 2019年9月19日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Call for Participation: Shared Tasks in NLPCC 2019

Call for Participation: Shared Tasks in NLPCC 2019

中国计算机学会

5+阅读 · 2019年3月22日

人工智能 | SCI期刊专刊信息3条

人工智能 | SCI期刊专刊信息3条

Call4Papers

5+阅读 · 2019年1月10日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

LibRec 精选：推荐的可解释性[综述]

LibRec 精选：推荐的可解释性[综述]

LibRec智能推荐

10+阅读 · 2018年5月4日

【论文推荐】最新五篇视频分类相关论文—细粒度行人识别、群组归一化、MLtuner、时序特征

【论文推荐】最新五篇视频分类相关论文—细粒度行人识别、群组归一化、MLtuner、时序特征

专知

22+阅读 · 2018年4月21日

计算机视觉近一年进展综述

计算机视觉近一年进展综述

机器学习研究会

9+阅读 · 2017年11月25日

推荐｜Andrew Ng计算机视觉教程总结

推荐｜Andrew Ng计算机视觉教程总结

全球人工智能

3+阅读 · 2017年11月23日

【论文】图上的表示学习综述

【论文】图上的表示学习综述

机器学习研究会

15+阅读 · 2017年9月24日

A k-mer Based Approach for SARS-CoV-2 Variant Identification

A k-mer Based Approach for SARS-CoV-2 Variant Identification

Arxiv

0+阅读 · 2021年10月12日

WenetSpeech: A 10000+ Hours Multi-domain Mandarin Corpus for Speech Recognition

WenetSpeech: A 10000+ Hours Multi-domain Mandarin Corpus for Speech Recognition

Arxiv

0+阅读 · 2021年10月12日

AskMe: Joint Individual-level and Community-level Behavior Interaction for Question Recommendation

Arxiv

0+阅读 · 2021年10月11日

已删除

Arxiv

1+阅读 · 2021年10月11日

An evaluation of data augmentation methods for sound scene geotagging

Arxiv

0+阅读 · 2021年10月9日

Data Augmentation with Locally-time Reversed Speech for Automatic Speech Recognition

Arxiv

0+阅读 · 2021年10月9日

Layer-wise Analysis of a Self-supervised Speech Representation Model

Arxiv

0+阅读 · 2021年10月9日

AECMOS: A speech quality assessment metric for echo impairment

Arxiv

0+阅读 · 2021年10月8日

Multilingual Sentiment Analysis: An RNN-Based Framework for Limited Data

Arxiv

12+阅读 · 2018年6月8日

Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation

Arxiv

5+阅读 · 2017年12月12日

VIP会员

文章信息

相关主题

Machine Learning

会话智能体

相关VIP内容

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

专知会员服务

44+阅读 · 2020年11月2日

【InterSpeech2020】混合语音识别系统中的词汇扩展技术，Techniques for Vocabulary Expansion in Hybrid Speech Recognition Systems

【InterSpeech2020】混合语音识别系统中的词汇扩展技术，Techniques for Vocabulary Expansion in Hybrid Speech Recognition Systems

专知会员服务

17+阅读 · 2020年3月23日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【综述】文献级机器翻译研究:方法与评价（A Survey on Document-level Machine Translation: Methods and Evaluation）

【综述】文献级机器翻译研究:方法与评价（A Survey on Document-level Machine Translation: Methods and Evaluation）

专知会员服务

7+阅读 · 2019年12月19日

斯坦福&谷歌Jeff Dean最新Nature论文：医疗深度学习技术指南

斯坦福&谷歌Jeff Dean最新Nature论文：医疗深度学习技术指南

专知会员服务

58+阅读 · 2019年10月20日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

53+阅读 · 2019年9月29日

热门VIP内容

开通专知VIP会员享更多权益服务

【博士论文】低维与高维空间中潜在表征的分析、建模与变换

《生态建模密码破译：建模与编程实践》美陆军最新报告

大模型解决方案白皮书：社交陪伴场景全流程落地指南

面向具身操作的视觉-语言-动作模型综述

相关资讯

LibRec 精选：AutoML for Contextual Bandits

LibRec 精选：AutoML for Contextual Bandits

LibRec智能推荐

7+阅读 · 2019年9月19日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Call for Participation: Shared Tasks in NLPCC 2019

Call for Participation: Shared Tasks in NLPCC 2019

中国计算机学会

5+阅读 · 2019年3月22日

人工智能 | SCI期刊专刊信息3条

人工智能 | SCI期刊专刊信息3条

Call4Papers

5+阅读 · 2019年1月10日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

LibRec 精选：推荐的可解释性[综述]

LibRec 精选：推荐的可解释性[综述]

LibRec智能推荐

10+阅读 · 2018年5月4日

【论文推荐】最新五篇视频分类相关论文—细粒度行人识别、群组归一化、MLtuner、时序特征

【论文推荐】最新五篇视频分类相关论文—细粒度行人识别、群组归一化、MLtuner、时序特征

专知

22+阅读 · 2018年4月21日

计算机视觉近一年进展综述

计算机视觉近一年进展综述

机器学习研究会

9+阅读 · 2017年11月25日

推荐｜Andrew Ng计算机视觉教程总结

推荐｜Andrew Ng计算机视觉教程总结

全球人工智能

3+阅读 · 2017年11月23日

【论文】图上的表示学习综述

【论文】图上的表示学习综述

机器学习研究会

15+阅读 · 2017年9月24日

相关论文

A k-mer Based Approach for SARS-CoV-2 Variant Identification

A k-mer Based Approach for SARS-CoV-2 Variant Identification

Arxiv

0+阅读 · 2021年10月12日

WenetSpeech: A 10000+ Hours Multi-domain Mandarin Corpus for Speech Recognition

WenetSpeech: A 10000+ Hours Multi-domain Mandarin Corpus for Speech Recognition

Arxiv

0+阅读 · 2021年10月12日

AskMe: Joint Individual-level and Community-level Behavior Interaction for Question Recommendation

Arxiv

0+阅读 · 2021年10月11日

已删除

Arxiv

1+阅读 · 2021年10月11日

An evaluation of data augmentation methods for sound scene geotagging

Arxiv

0+阅读 · 2021年10月9日

Data Augmentation with Locally-time Reversed Speech for Automatic Speech Recognition

Arxiv

0+阅读 · 2021年10月9日

Layer-wise Analysis of a Self-supervised Speech Representation Model

Arxiv

0+阅读 · 2021年10月9日

AECMOS: A speech quality assessment metric for echo impairment

Arxiv

0+阅读 · 2021年10月8日

Multilingual Sentiment Analysis: An RNN-Based Framework for Limited Data

Arxiv

12+阅读 · 2018年6月8日

Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation

Arxiv

5+阅读 · 2017年12月12日

微信扫码咨询专知VIP会员