Machine learning and deep learning have been used extensively to classify physical surfaces through images and time-series contact data. However, these methods rely on human expertise and entail time-consuming data processing and parameter tuning. To overcome these challenges, we propose an easily implemented framework that can directly handle heterogeneous data sources for classification tasks. Our data-versus-data approach automatically quantifies distinctive differences between distributions in a high-dimensional space via kernel two-sample testing between two sets extracted from multimodal data (e.g., images, sounds, haptic signals). We demonstrate the effectiveness of our technique by benchmarking against expertly engineered classifiers for visual-audio-haptic surface recognition, an application chosen for its industrial relevance, difficulty, and competitive baselines; ablation studies confirm the utility of key components of our pipeline. As shown in our open-source code, we achieve 97.2% accuracy on a standard multi-user dataset with 108 surface classes, outperforming the state-of-the-art machine-learning algorithm by 6% on a more difficult version of the task. That our classifier obtains this performance with minimal data processing in the standard algorithm setting underscores the power of kernel methods for learning to recognize complex patterns.
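To make the data-versus-data idea concrete, the following is a minimal sketch of classification by a kernel two-sample statistic. It is not the authors' released implementation: the Gaussian kernel, the biased estimator of the squared maximum mean discrepancy (MMD), the fixed bandwidth, the synthetic five-dimensional features, and the names `rbf_kernel`, `mmd2`, and `classify` are all assumptions made for illustration. A query set is assigned to the class whose reference set it differs from least.

```python
import numpy as np


def rbf_kernel(a, b, bandwidth):
    """Gaussian kernel matrix between rows of a (n, d) and rows of b (m, d)."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def mmd2(x, y, bandwidth=1.0):
    """Biased estimate of the squared MMD between sample sets x and y."""
    k_xx = rbf_kernel(x, x, bandwidth)
    k_yy = rbf_kernel(y, y, bandwidth)
    k_xy = rbf_kernel(x, y, bandwidth)
    return k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean()


def classify(query, references, bandwidth=1.0):
    """Return the label whose reference set is closest to `query` in MMD."""
    return min(
        references,
        key=lambda label: mmd2(query, references[label], bandwidth),
    )


# Synthetic stand-ins for per-class feature sets (assumption: real inputs
# would be features extracted from images, sounds, or haptic signals).
rng = np.random.default_rng(0)
references = {
    "surface_a": rng.normal(0.0, 1.0, size=(200, 5)),
    "surface_b": rng.normal(1.5, 1.0, size=(200, 5)),
}
query = rng.normal(1.5, 1.0, size=(100, 5))

predicted = classify(query, references, bandwidth=2.0)
print(predicted)
```

In this sketch the bandwidth is fixed by hand. A practical pipeline would more likely select it from the data, for example with a median-distance heuristic, and could combine kernels across modalities.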