Historic dress artifacts are a valuable source for human studies. In particular, they can provide important insights into the social aspects of their era. These insights are commonly drawn from garment pictures and their accompanying descriptions, and are usually recorded using the Costume Core Vocabulary, a standardized, controlled vocabulary that accurately describes garments and costume items. Building an accurate Costume Core record from garment descriptions can be challenging because historic garments are often donated, and the accompanying descriptions may be written by untrained individuals in language common to the period of the items. In this paper, we present an approach that uses Natural Language Processing (NLP) to map the free-form text descriptions of historic items to the controlled vocabulary provided by the Costume Core. Despite the limited dataset, we were able to train an NLP model based on the Universal Sentence Encoder to perform this mapping with more than 90% test accuracy for a subset of the Costume Core vocabulary. We describe our methodology, design choices, and the development of our approach, and show the feasibility of predicting Costume Core terms for unseen descriptions. As more garment descriptions are curated for use in training, we expect accuracy and generalizability to improve.
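As a rough illustration of the mapping idea only, the sketch below assigns a free-form description to the controlled term whose training descriptions it most resembles. It is not the model described above: the Universal Sentence Encoder is replaced here by a simple bag-of-words embedding, the trained model by a nearest-centroid rule, and the descriptions and the terms "silk" and "cotton" are hypothetical examples rather than entries taken from the Costume Core.

```python
import math
import re


def tokenize(text):
    """Lower-case the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocab(texts):
    """Assign one vector index to each distinct training token."""
    vocab = {}
    for text in texts:
        for token in tokenize(text):
            vocab.setdefault(token, len(vocab))
    return vocab


def normalize(vec):
    """Scale a vector to unit length (all-zero vectors are returned as-is)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def embed(text, vocab):
    """Stand-in for a sentence encoder: normalised bag-of-words counts."""
    vec = [0.0] * len(vocab)
    for token in tokenize(text):
        if token in vocab:
            vec[vocab[token]] += 1.0
    return normalize(vec)


def fit_centroids(examples, vocab):
    """Average the embeddings of the descriptions labelled with each term."""
    grouped = {}
    for text, term in examples:
        grouped.setdefault(term, []).append(embed(text, vocab))
    return {
        term: normalize([sum(col) / len(vecs) for col in zip(*vecs)])
        for term, vecs in grouped.items()
    }


def predict(text, vocab, centroids):
    """Return the term whose centroid has the highest cosine similarity."""
    query = embed(text, vocab)

    def score(term):
        return sum(q * c for q, c in zip(query, centroids[term]))

    return max(centroids, key=score)


# Hypothetical descriptions and terms, invented for this sketch.
EXAMPLES = [
    ("evening gown of ivory silk satin with train", "silk"),
    ("silk taffeta bodice with lace trim", "silk"),
    ("day dress of printed cotton calico", "cotton"),
    ("white cotton muslin petticoat", "cotton"),
]

VOCAB = build_vocab(text for text, _ in EXAMPLES)
CENTROIDS = fit_centroids(EXAMPLES, VOCAB)

print(predict("printed cotton skirt", VOCAB, CENTROIDS))
```

In a closer approximation of the approach, `embed` would be swapped for a pretrained sentence encoder and the centroid rule for a classifier trained on the labelled descriptions; the surrounding structure (embed, fit, predict) would stay the same.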