We propose two frameworks for problem settings in which both structured and unstructured data are available. Structured data problems are best solved by traditional machine learning models such as boosting and other tree-based algorithms, whereas deep learning has been widely applied to problems involving images, text, audio, and other unstructured data sources. However, when both structured and unstructured data are available, it is not obvious which modeling approach best improves performance on both data sources simultaneously. Our proposed frameworks enable joint learning on both kinds of data by integrating the paradigms of boosting models and deep neural networks. The first framework, the boosted-feature-vector deep learning network, learns features from the structured data using gradient boosting and combines them with embeddings from unstructured data via a two-branch deep neural network. The second, the two-weak-learner boosting framework, extends the boosting paradigm to the setting with two input data sources. We present and compare first- and second-order methods for this framework. Experimental results on both public and real-world datasets show that the frameworks improve on selected baselines by 0.1% to 4.7%.
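To make the idea of boosting over two input sources concrete, the following is a minimal first-order sketch under stated assumptions, not the method as implemented in the paper. It assumes squared-error loss, so the negative gradient is simply the residual, and in each round it fits one weak learner on the structured features and one on the unstructured embeddings. For brevity, the weak learner on the unstructured side is a regression stump over a precomputed embedding rather than a deep network. The names `fit_stump` and `fit_two_source_boosting`, the learning rate, and the toy data are all hypothetical.

```python
from statistics import mean


def fit_stump(rows, residuals):
    """Fit a one-split regression stump to the residuals by least squares."""
    best = None
    for j in range(len(rows[0])):
        for threshold in sorted({r[j] for r in rows}):
            left = [res for r, res in zip(rows, residuals) if r[j] <= threshold]
            right = [res for r, res in zip(rows, residuals) if r[j] > threshold]
            if not left or not right:
                continue  # a split must leave samples on both sides
            lv, rv = mean(left), mean(right)
            sse = (sum((res - lv) ** 2 for res in left)
                   + sum((res - rv) ** 2 for res in right))
            if best is None or sse < best[0]:
                best = (sse, j, threshold, lv, rv)
    if best is None:
        # No valid split exists: fall back to a constant prediction.
        constant = mean(residuals)
        return lambda row: constant
    _, j, threshold, lv, rv = best
    return lambda row: lv if row[j] <= threshold else rv


def fit_two_source_boosting(structured, embeddings, y, rounds=20, lr=0.5):
    """Each round adds two weak learners, one per data source (assumed scheme)."""
    base = mean(y)
    pred = [base] * len(y)
    learners = []
    for _ in range(rounds):
        # Weak learner 1: fit the current residuals using structured features.
        residuals = [t - p for t, p in zip(y, pred)]
        h_s = fit_stump(structured, residuals)
        pred = [p + lr * h_s(s) for p, s in zip(pred, structured)]
        # Weak learner 2: fit the updated residuals using the embeddings.
        residuals = [t - p for t, p in zip(y, pred)]
        h_u = fit_stump(embeddings, residuals)
        pred = [p + lr * h_u(e) for p, e in zip(pred, embeddings)]
        learners.append((h_s, h_u))

    def predict(s, e):
        return base + lr * sum(h_s(s) + h_u(e) for h_s, h_u in learners)

    return predict


# Toy data (hypothetical): the target depends on both sources additively.
structured = [[0], [0], [1], [1]]
embeddings = [[0], [1], [0], [1]]
y = [0, 1, 2, 3]
predict = fit_two_source_boosting(structured, embeddings, y)
```

On this toy problem neither source alone determines the target, so a model restricted to one source cannot fit it, while the alternating scheme reduces the residual from both sides in every round. A second-order variant would also use the curvature of the loss when fitting each weak learner; for squared error the second derivative is constant, so the two variants coincide in this sketch.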