Quantifying Gender Stereotypes in Japan between 1900 and 1999 with Word Embeddings

We quantify the evolution of gender stereotypes in Japan from 1900 to 1999 using a series of 100 word embeddings, each trained on a corpus from a specific year. We define a gender stereotype value that measures the strength of a word's gender association as the difference between its cosine similarity to female-related attribute words and its cosine similarity to male-related attribute words. We examine trajectories of gender stereotypes across three traditionally gendered domains (Home, Work, and Politics) as well as occupations. The results indicate that language-based gender stereotypes partially evolved to reflect women's increasing participation in the workplace and politics: the Work and Politics domains became more strongly female-stereotyped over the years. Yet Home also became more female-stereotyped, suggesting that women were increasingly viewed as fulfilling multiple roles, such as homemakers, workers, and politicians, rather than having one role replace another. Furthermore, the strength of the female stereotype for occupations positively correlates with the proportion of women in each occupation, indicating that word-embedding-based measures of gender stereotype mirrored demographic shifts to a considerable extent.
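The gender stereotype value described in the abstract can be illustrated with a minimal sketch. The abstract does not say how similarities to several attribute words are combined or which sign denotes which gender, so the averaging step and the convention "positive means female-associated" are assumptions here, as are the function names and the hand-made two-dimensional toy vectors.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_similarity(word_vec, attribute_vecs):
    """Average cosine similarity of a word to a set of attribute words.

    Averaging is an assumption; the abstract only mentions a difference
    in cosine similarity.
    """
    total = sum(cosine(word_vec, a) for a in attribute_vecs)
    return total / len(attribute_vecs)


def gender_stereotype_value(word_vec, female_vecs, male_vecs):
    """Female-minus-male similarity (assumed sign: positive = female)."""
    return mean_similarity(word_vec, female_vecs) - mean_similarity(word_vec, male_vecs)


# Hand-made toy vectors, not taken from any trained embedding.
female_vecs = [[1.0, 0.0]]
male_vecs = [[0.0, 1.0]]
female_leaning_word = [1.0, 0.0]
neutral_word = [1.0, 1.0]

female_leaning_value = gender_stereotype_value(female_leaning_word, female_vecs, male_vecs)
neutral_value = gender_stereotype_value(neutral_word, female_vecs, male_vecs)
```

Under these assumptions, repeating the computation for the same word in each of the 100 yearly embeddings would yield one value per year, which is the kind of trajectory the abstract refers to.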


Related content

Word vector representations

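A distributed representation encodes language as dense, low-dimensional, continuous vectors. Researchers found early on that learned word embeddings exhibit analogy relations, for example apple − apples ≈ car − cars and man − woman ≈ king − queen. Methods for learning such embeddings can be trained directly on large-scale unlabeled corpora. The quality of the embeddings also depends heavily on the choice of context window size: larger context windows usually yield embeddings that reflect topical information, while smaller windows yield embeddings that reflect a word's function and its contextual semantics.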
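The analogy relation man − woman ≈ king − queen can be checked with simple vector arithmetic. The sketch below uses hand-made three-dimensional toy vectors chosen so that the relation holds exactly; they are illustrative assumptions, not values from a trained model, and the helper names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def subtract(u, v):
    """Element-wise difference u - v."""
    return [a - b for a, b in zip(u, v)]


def add(u, v):
    """Element-wise sum u + v."""
    return [a + b for a, b in zip(u, v)]


# Toy embedding table (assumed values, for illustration only).
toy = {
    "man": [1.0, 0.0, 0.2],
    "woman": [1.0, 1.0, 0.2],
    "king": [3.0, 0.0, 1.0],
    "queen": [3.0, 1.0, 1.0],
    "apple": [0.0, 0.0, 1.0],
}

# The two offsets should point in nearly the same direction.
offset_similarity = cosine(
    subtract(toy["man"], toy["woman"]),
    subtract(toy["king"], toy["queen"]),
)

# Equivalent query form: king - man + woman should land nearest to queen.
query = add(subtract(toy["king"], toy["man"]), toy["woman"])
candidates = [w for w in toy if w not in ("king", "man", "woman")]
nearest = max(candidates, key=lambda w: cosine(query, toy[w]))
```

In a trained embedding the offsets are only approximately parallel, so the comparison is usually made by ranking candidates by cosine similarity, as in the last step, rather than by testing for equality.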
