We present ConchShell, a multimodal generative adversarial framework that takes an image as input and generates piano music samples that match the image's context. Inspired by I3D, we introduce a novel image feature representation method, the time-convolutional neural network (TCNN), which constructs features for images along the temporal dimension. Although our image data covers only six categories, we believe the proposed framework is innovative and commercially meaningful. The project offers technical ideas for applications such as 3D game voice-overs, short-video soundtracks, and real-time generation of metaverse background music. We have also released a new dataset, the Beach-Ocean-Piano Dataset (BOPD), which contains more than 3,000 images and more than 1,500 piano pieces. This dataset will support multimodal image-to-music research.
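The abstract does not spell out how TCNN builds temporal features, so the following is only a minimal sketch of the general I3D-style idea it cites: a static image is repeated along a new time axis to form a pseudo-clip, and a convolution is then applied along that axis. The function names `inflate_image` and `temporal_conv`, the frame count, and the fixed smoothing kernel are all illustrative assumptions, not the authors' architecture.

```python
import numpy as np


def inflate_image(image, num_frames):
    """Repeat an (H, W, C) image along a new leading time axis -> (T, H, W, C).

    Hypothetical helper: mirrors the I3D trick of turning a still image
    into a static video by copying it across frames.
    """
    return np.repeat(image[np.newaxis, ...], num_frames, axis=0)


def temporal_conv(clip, kernel):
    """Valid 1D cross-correlation along the time axis of a (T, H, W, C) clip.

    Hypothetical helper: a real model would learn the kernel weights;
    here they are fixed for illustration.
    """
    k = len(kernel)
    t_out = clip.shape[0] - k + 1
    out = np.zeros((t_out,) + clip.shape[1:], dtype=np.float64)
    for i, w in enumerate(kernel):
        # Each kernel tap weights a time-shifted slice of the clip.
        out += w * clip[i:i + t_out]
    return out


# Toy 2x2 RGB "image" (assumed values, for demonstration only).
image = np.arange(2 * 2 * 3, dtype=np.float64).reshape(2, 2, 3)
clip = inflate_image(image, num_frames=8)      # shape (8, 2, 2, 3)
kernel = np.array([0.25, 0.5, 0.25])           # assumed temporal kernel
features = temporal_conv(clip, kernel)         # shape (6, 2, 2, 3)
```

Because every frame of the pseudo-clip is identical and this kernel sums to 1, each output frame equals the input image. That is the property I3D relies on when it initializes 3D filters from 2D ones; any temporal variation in a trained model would have to come from learned weights or later layers.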