Tragic Talkers: A Shakespearean Sound- and Light-Field Dataset for Audio-Visual Machine Learning Research

3D audio-visual production aims to deliver immersive and interactive experiences to the consumer. Yet faithfully reproducing real-world 3D scenes remains a challenging task, partly because few available datasets enable audio-visual research in this direction. Most existing multi-view datasets neglect the accompanying audio. Similarly, datasets for spatial audio research primarily offer unimodal content, and when visual data is included, its quality falls far short of standard production needs. We present "Tragic Talkers", an audio-visual dataset consisting of excerpts from the drama "Romeo and Juliet", captured with microphone arrays and multiple co-located cameras for light-field video. Tragic Talkers provides ideal content for object-based media (OBM) production. It is designed to cover various conventional talking scenarios, such as monologues, two-person conversations, and interactions with considerable movement and occlusion, yielding 30 sequences captured from a total of 22 different points of view and two 16-element microphone arrays. Additionally, we provide voice activity labels, 2D face bounding boxes for each camera view, 2D pose detection keypoints, 3D tracking data of the actors' mouths, and dialogue transcriptions. We believe the community will benefit from this dataset, as it can assist multidisciplinary research. Possible uses of the dataset are discussed.


