We have developed a set of Python applications that use large language models to identify and analyze social media data relevant to a population of interest. Our pipeline begins by using OpenAI's GPT-3 to generate candidate keywords for finding relevant text content produced by the target population. The keywords are then validated, and the matching content is downloaded and analyzed using GPT-3 embeddings and manifold reduction. Corpora created from this content are then used to fine-tune GPT-2 models, which can be probed with prompt-based queries to explore latent information. These tools allow researchers and practitioners to gain valuable insights into population subgroups online. Source code is available at https://github.com/pgfeldman/KeywordExplorer.
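The embedding-and-reduction stage can be illustrated with a small sketch. It assumes the GPT-3 embeddings have already been retrieved as lists of floats. The toy 4-dimensional vectors below stand in for them. It also substitutes plain PCA, a linear projection computed here by power iteration, for whichever manifold-reduction method the tools actually use. The function names are hypothetical and are not part of KeywordExplorer.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def center(vectors: Sequence[Sequence[float]]) -> List[Vector]:
    """Subtract the per-dimension mean so the point cloud is centered at the origin."""
    n, dim = len(vectors), len(vectors[0])
    mean = [sum(v[i] for v in vectors) / n for i in range(dim)]
    return [[v[i] - mean[i] for i in range(dim)] for v in vectors]


def top_component(data: Sequence[Vector], iterations: int = 200, seed: int = 0) -> Vector:
    """Power iteration on X^T X; returns an approximate unit-length leading principal axis."""
    rng = random.Random(seed)
    dim = len(data[0])
    w = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    start_norm = math.sqrt(dot(w, w))
    w = [c / start_norm for c in w]
    for _ in range(iterations):
        scores = [dot(x, w) for x in data]                                       # X w
        nxt = [sum(s * x[i] for s, x in zip(scores, data)) for i in range(dim)]  # X^T (X w)
        norm = math.sqrt(dot(nxt, nxt))
        if norm == 0.0:  # no variance left in this subspace
            break
        w = [c / norm for c in nxt]
    return w


def reduce_to_2d(embeddings: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
    """Project embeddings onto their first two principal axes.

    Assumption: linear PCA is used here as a stand-in for manifold reduction.
    """
    data = center(embeddings)
    pc1 = top_component(data, seed=0)
    # Deflate: remove each row's pc1 component so the second axis is orthogonal to pc1.
    deflated = []
    for x in data:
        s = dot(x, pc1)
        deflated.append([xi - s * pi for xi, pi in zip(x, pc1)])
    pc2 = top_component(deflated, seed=1)
    return [(dot(x, pc1), dot(x, pc2)) for x in data]


# Toy stand-ins for embedding vectors: two loose clusters, "A" and "B".
toy = [
    [1.0, 1.1, 0.0, 0.1],
    [1.1, 0.9, 0.1, 0.0],
    [0.9, 1.0, -0.1, 0.0],
    [-1.0, -1.1, 0.0, -0.1],
    [-1.1, -0.9, -0.1, 0.0],
    [-0.9, -1.0, 0.1, 0.0],
]

for label, (a, b) in zip("AAABBB", reduce_to_2d(toy)):
    print(label, round(a, 2), round(b, 2))
```

In a 2-D projection like this, content from related subgroups tends to fall into visibly separate clusters. A nonlinear method would preserve local neighborhoods better than PCA on real embedding data, but would need a third-party library.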