Conversation corpora are essential for building interactive AI applications. However, the demographic makeup of the participants in such corpora remains largely unexplored, mainly because many corpora lack participant-level data. In this work, we analyze a Korean nationwide daily conversation corpus constructed by the National Institute of Korean Language (NIKL) to characterize how different demographic groups (by age and sex) participate in the corpus.