Multimodal learning has attracted the interest of the machine learning community due to its great potential in a variety of applications. To help achieve this potential, we propose a multimodal benchmark, MuG, with eight datasets that allow researchers to test the multimodal perception capabilities of their models. These datasets are collected from four different genres of games and cover tabular, textual, and visual modalities. We conduct multi-aspect data analysis to provide insights into the benchmark, including label balance ratios, percentages of missing features, distributions of data within each modality, and correlations between labels and input modalities. We further present experimental results obtained by several state-of-the-art unimodal and multimodal classifiers, which demonstrate the challenging and modality-dependent properties of the benchmark. MuG is released at https://github.com/lujiaying/MUG-Bench with data, documentation, tutorials, and implemented baselines. Extensions of MuG are welcome to facilitate progress in research on multimodal learning problems.
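To make the reported analysis aspects concrete, the minimal sketch below shows how two of them, the label balance ratio and the percentage of missing features, could be computed with pandas over one of the tabular splits. The file path and the `label` column name are assumptions for illustration and may not match the repository's actual layout.

```python
import pandas as pd

# Hypothetical path; the actual file layout in the MUG-Bench repo may differ.
df = pd.read_csv("MUG-Bench/data/pokemon/train.csv")

# Label balance ratio: size of the smallest class over the largest class.
counts = df["label"].value_counts()
balance_ratio = counts.min() / counts.max()

# Percentage of missing values across all input feature columns.
feature_cols = [c for c in df.columns if c != "label"]
missing_pct = df[feature_cols].isna().mean().mean() * 100

print(f"label balance ratio: {balance_ratio:.3f}")
print(f"missing features: {missing_pct:.2f}%")
```

A ratio near 1.0 indicates a balanced label distribution, while values close to 0 flag the class imbalance that the benchmark analysis is meant to surface.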