Deep neural network models have recently achieved promising results on the image captioning task. Yet the "vanilla" sentences generated by current approaches describe only shallow appearances (e.g., types, colors) and do not match netizen style, so they lack engagement, context, and user intent. To tackle this problem, we propose Netizen Style Commenting (NSC), which automatically generates characteristic comments for a user-contributed fashion photo. We aim to modulate the comments in a vivid "netizen" style that reflects the culture of a designated social community, in the hope of facilitating more engagement with users. In this work, we design a novel framework that consists of three major components: (1) We construct a large-scale clothing dataset named NetiLook, which contains 300K posts (photos) with 5M comments, to discover netizen-style comments. (2) We propose three unique measures to estimate the diversity of comments. (3) We bring diversity by marrying topic models with neural networks to make up for the insufficiency of conventional image captioning works. Experimenting on Flickr30k and our NetiLook dataset, we demonstrate that our proposed approaches benefit fashion photo commenting and improve image captioning in both accuracy and diversity.