In Natural Language Processing, extracting information from text has been a goal of many researchers for years. Many techniques have been applied to reveal the opinion a tweet expresses, and thus to understand the sentiment of a short text of up to 280 characters. Beyond identifying the sentiment of a tweet, a study can also focus on finding the correlation of tweets with a certain area of interest, which is the purpose of this study. To reveal whether an area of interest is trending in ongoing tweets, we propose an easily applicable automated methodology in which Daily Mean Similarity Scores, which measure the similarity between the daily tweet corpus and the target words representing our area of interest, are calculated using a naive correlation-based technique without training any Machine Learning model. The Daily Mean Similarity Scores are based mainly on cosine similarity and on word/sentence embeddings computed by the Multilingual Universal Sentence Encoder. They show the main opinion stream of the tweets with respect to a certain area of interest, which demonstrates that an ongoing trend on a specific subject on Twitter can easily be captured in almost real time using the proposed methodology. We also compared word and sentence embeddings within our methodology and found that both give almost the same results, whereas word embeddings require less computation time than sentence embeddings and are therefore more efficient. The paper starts with an introduction, followed by background information on the basics, then explains the proposed methodology, and finishes by interpreting the results and summarizing the findings.
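The scoring idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `cosine_similarity` and `daily_mean_similarity` are hypothetical, the three-dimensional vectors are toy stand-ins for real encoder embeddings, and averaging over every tweet/target-word pair is an assumption, since the abstract does not spell out the exact aggregation.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def daily_mean_similarity(tweets_by_day, target_vectors):
    """Average the cosine similarity over all (tweet, target word) pairs per day.

    Assumption: a plain mean over all pairs; the source may aggregate differently.
    Days are expected to contain at least one tweet embedding.
    """
    scores = {}
    for day, tweet_vectors in tweets_by_day.items():
        sims = [
            cosine_similarity(tweet, target)
            for tweet in tweet_vectors
            for target in target_vectors
        ]
        scores[day] = sum(sims) / len(sims)
    return scores


# Toy embeddings (hypothetical); real ones would come from a sentence encoder.
target_vectors = [[1.0, 0.0, 0.0]]
tweets_by_day = {
    "day-1": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],  # one on-topic, one unrelated
    "day-2": [[2.0, 0.0, 0.0], [3.0, 0.0, 0.0]],  # both on-topic
}
scores = daily_mean_similarity(tweets_by_day, target_vectors)
```

In this toy setup the second day scores higher than the first, which is the kind of day-to-day movement the methodology reads as a trend. No model is trained at any point; only embeddings and a similarity measure are needed.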