Automatically understanding the contents of an image is a highly relevant problem in practice. In e-commerce and social media settings, for example, a common task is to automatically categorize user-provided pictures. Nowadays, a standard approach is to fine-tune pre-trained image models with application-specific data. Besides images, however, organizations often also collect collaborative signals in the context of their application, in particular how users interacted with the provided online content, e.g., in the form of viewing, rating, or tagging. Such signals are commonly used for item recommendation, typically by deriving latent user and item representations from the data. In this work, we show that such collaborative information can be leveraged to improve the classification of new images. Specifically, we propose a multitask learning framework in which the auxiliary task is to reconstruct collaborative latent item representations. A series of experiments on datasets from e-commerce and social media demonstrates that considering collaborative signals helps to significantly improve the performance of the main task of image classification, by up to 9.1%.
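The core idea of the proposed framework — a shared encoder with a main classification head and an auxiliary head that reconstructs collaborative latent item vectors, trained on a joint loss — can be illustrated with a minimal sketch. This is not the paper's implementation: all sizes, data, and the single-hidden-layer architecture are illustrative assumptions, with image features standing in for a pre-trained backbone's embeddings and random vectors standing in for item representations learned by, e.g., matrix factorization.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data (all shapes illustrative): X plays the role of
# image features from a pre-trained backbone, y are class labels, and V
# are collaborative latent item vectors (e.g., from matrix factorization).
n, d_in, d_h, n_cls, d_emb = 64, 32, 16, 4, 8
X = rng.normal(size=(n, d_in))
y = rng.integers(0, n_cls, size=n)
V = rng.normal(size=(n, d_emb))

# Parameters: one shared encoder layer plus two task-specific heads.
W_s = rng.normal(scale=0.1, size=(d_in, d_h))
W_c = rng.normal(scale=0.1, size=(d_h, n_cls))
W_r = rng.normal(scale=0.1, size=(d_h, d_emb))

def forward(X):
    H = np.maximum(X @ W_s, 0.0)   # shared ReLU representation
    return H, H @ W_c, H @ W_r     # classification logits, reconstruction

def task_losses(logits, V_hat):
    z = logits - logits.max(axis=1, keepdims=True)
    P = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    ce = -np.log(P[np.arange(n), y] + 1e-12).mean()   # main task loss
    mse = ((V_hat - V) ** 2).mean()                   # auxiliary loss
    return ce, mse, P

lam, lr = 0.5, 0.1                 # auxiliary-task weight and step size
_, logits, V_hat = forward(X)
ce_init, mse_init, _ = task_losses(logits, V_hat)

for _ in range(200):
    H, logits, V_hat = forward(X)
    ce, mse, P = task_losses(logits, V_hat)
    # Backprop of L = ce + lam * mse through both heads into the encoder.
    d_logits = P.copy()
    d_logits[np.arange(n), y] -= 1.0
    d_logits /= n
    d_Vhat = lam * 2.0 * (V_hat - V) / V.size
    dH = d_logits @ W_c.T + d_Vhat @ W_r.T
    dH[H <= 0] = 0.0               # ReLU gradient mask
    W_c -= lr * (H.T @ d_logits)
    W_r -= lr * (H.T @ d_Vhat)
    W_s -= lr * (X.T @ dH)

_, logits, V_hat = forward(X)
ce_final, mse_final, _ = task_losses(logits, V_hat)
```

Because the encoder receives gradients from both heads, the representation it learns is shaped by the collaborative signal as well as by the labels, which is the mechanism the abstract credits for the classification gains.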