Public opinion surveys are a powerful tool for studying people's attitudes and behaviors in comparative perspective. However, even worldwide surveys provide only partial geographic and temporal coverage, which hinders comprehensive knowledge production. To broaden the scope of comparison, social scientists turn to ex-post harmonization of variables from datasets that cover similar topics but different populations and/or years. The resulting datasets can be analyzed as a single source and accessed through many data portals. However, such portals offer little guidance for exploring the data in depth or for querying it according to user-specific needs. As a result, it remains challenging for social scientists to efficiently identify data relevant to their studies and to evaluate their theoretical models on the resulting data slices. To overcome these challenges, within the Survey Data Recycling (SDR) international cooperation research project, we propose SDRQuerier and apply it to the harmonized SDR database, which features over two million respondents interviewed in a total of 1,721 national surveys that are part of 22 well-known international projects. We design SDRQuerier to solve three practical challenges that social scientists routinely face. First, a BERT-based model supports customized data queries expressed as research questions or keywords. Second, we propose a new visual design that shows the availability of the harmonized data at different levels, helping users decide whether empirical data exist to address a given research question. Lastly, SDRQuerier discloses the underlying relational patterns among substantive and methodological variables in the database, helping social scientists rigorously evaluate or even improve their regression models. Through case studies in which multiple social scientists used SDRQuerier to address their day-to-day challenges, we demonstrate its novelty and effectiveness.