Estimating the preferences of consumers is of utmost importance for the fashion industry, as appropriately leveraging this information can be beneficial in terms of profit. Trend detection in fashion is a challenging task due to the fast pace of change in the fashion industry. Moreover, forecasting the visual popularity of new garment designs is even more demanding due to the lack of historical data. To this end, we propose MuQAR, a Multimodal Quasi-AutoRegressive deep learning architecture that combines two modules: (1) a multimodal multi-layer perceptron processing categorical, visual and textual features of the product and (2) a quasi-autoregressive neural network modelling the "target" time series of the product's attributes along with the "exogenous" time series of all other attributes. We utilize computer vision techniques, namely image classification and image captioning, to automatically extract visual features and textual descriptions from the images of new products. Product design in fashion is initially expressed visually, and these features represent a product's unique characteristics without interfering with the creative process of its designers by requiring additional inputs (e.g., manually written texts). We employ the target attributes' time series as a proxy of temporal popularity patterns, mitigating the lack of historical data, while the exogenous time series help capture trends among interrelated attributes. We perform an extensive ablation analysis on two large-scale fashion image datasets, Mallzee and SHIFT15m, to assess the adequacy of MuQAR, and also use the Amazon Reviews: Home and Kitchen dataset to assess its generalisability to other domains. A comparative study on the VISUELLE dataset shows that MuQAR is capable of competing with and surpassing the domain's current state of the art by 4.65% and 4.8% in terms of WAPE and MAE, respectively.
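To make the two-module design concrete, the following is a minimal PyTorch-style sketch of a MuQAR-like architecture. The feature dimensions, the choice of an LSTM as the quasi-autoregressive backbone, and the concatenation-based fusion are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of a MuQAR-style model (assumptions: PyTorch, illustrative
# dimensions, LSTM as the quasi-autoregressive module, concatenation fusion).
import torch
import torch.nn as nn


class MuQARSketch(nn.Module):
    def __init__(self, cat_dim, vis_dim, txt_dim, n_attributes, hidden=128, horizon=1):
        super().__init__()
        # (1) Multimodal MLP over categorical, visual and textual product features.
        self.mlp = nn.Sequential(
            nn.Linear(cat_dim + vis_dim + txt_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
        )
        # (2) Quasi-autoregressive module over the target attribute series plus
        # the exogenous series of all other attributes (here: a single LSTM).
        self.qar = nn.LSTM(input_size=n_attributes, hidden_size=hidden, batch_first=True)
        # Fusion head producing the popularity forecast for the given horizon.
        self.head = nn.Sequential(
            nn.Linear(2 * hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, horizon),
        )

    def forward(self, cat_feat, vis_feat, txt_feat, attr_series):
        # attr_series: (batch, time, n_attributes) -- target + exogenous time series.
        static = self.mlp(torch.cat([cat_feat, vis_feat, txt_feat], dim=-1))
        _, (h_n, _) = self.qar(attr_series)
        temporal = h_n[-1]  # last hidden state summarises the attribute series
        return self.head(torch.cat([static, temporal], dim=-1))


# Toy usage with random tensors (shapes are illustrative only).
model = MuQARSketch(cat_dim=16, vis_dim=512, txt_dim=300, n_attributes=20)
y_hat = model(torch.randn(4, 16), torch.randn(4, 512), torch.randn(4, 300),
              torch.randn(4, 52, 20))
print(y_hat.shape)  # torch.Size([4, 1])
```

In this sketch the static product representation (from image-derived visual and textual features plus categorical metadata) is fused with a temporal summary of the attribute time series, mirroring how the attribute series stands in for the missing history of a new product.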