What makes a talk successful? Is it the content or the presentation? We try to estimate the contribution of the speaker's oratory skills to a talk's success while ignoring the content of the talk. By oratory skills we mean facial expressions, motions and gestures, and vocal features. We use TED Talks as our dataset and measure the success of each talk by its view count. Using this dataset, we train a neural network to assess the oratory skills in a talk through three factors: body pose, facial expressions, and acoustic features. Most previous work on the automatic evaluation of oratory skills relies on hand-crafted expert annotations, both for the quality of the talk and for the identification of predefined actions. Unlike prior work, we equate a talk's quality with its view count as reported by TED, and let the network automatically learn which actions, expressions, and sounds are relevant to a talk's success. We find that oratory skills alone contribute substantially to a talk's chances of success.
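The abstract does not specify the network architecture. Purely as an illustration, assuming a late-fusion design and a log-view-count regression target, the forward pass and squared-error loss for one talk might look like the sketch below. In this design, each modality (body pose, facial expressions, acoustic features) gets a small branch, and the branch outputs are concatenated and fed to a linear head. The names `FusionModel` and `success_target`, the feature dimensions, and the log transform are hypothetical choices, not the authors' model, and the training loop is omitted.

```python
import math
import random

# Hypothetical sketch: late fusion of three modalities, predicting log(view count).
# Architecture, feature sizes, and target transform are illustrative assumptions.

def linear(x, w, b):
    """Dense layer: W x + b, with W given as a list of rows."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]

def relu(v):
    return [max(0.0, t) for t in v]

def init_layer(n_in, n_out, rng):
    """Random Gaussian weights scaled by 1/sqrt(n_in); zero biases."""
    w = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out

class FusionModel:
    """One ReLU branch per modality; concatenated embeddings feed a linear head."""

    def __init__(self, dims, hidden=8, seed=0):
        rng = random.Random(seed)
        self.order = sorted(dims)  # fixed modality order for concatenation
        self.branches = {name: init_layer(dims[name], hidden, rng) for name in self.order}
        self.head = init_layer(hidden * len(dims), 1, rng)

    def forward(self, feats):
        fused = []
        for name in self.order:
            w, b = self.branches[name]
            fused.extend(relu(linear(feats[name], w, b)))
        w, b = self.head
        return linear(fused, w, b)[0]  # scalar prediction of log(view count)

def success_target(view_count):
    """Assumed regression target: log of the view count (compresses the heavy tail)."""
    return math.log(view_count)

# Usage with arbitrary, made-up per-talk feature vectors.
dims = {"pose": 34, "face": 17, "audio": 13}
model = FusionModel(dims)
rng = random.Random(1)
feats = {k: [rng.random() for _ in range(d)] for k, d in dims.items()}
pred = model.forward(feats)
target = success_target(1_500_000)
loss = (pred - target) ** 2  # squared error for this single talk
```

Predicting the logarithm rather than the raw count is one common way to keep a few extremely popular talks from dominating a squared-error loss; whether the paper does this is not stated in the abstract.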