Emojis are widely used in social networking to heighten, mitigate, or negate the sentiment of a text. Emoji suggestions already exist in many cross-platform applications, but an emoji is predicted based solely on a few prominent words rather than on an understanding of the subject and substance of the text. In this paper, we show the importance of using Twitter features to help the model understand the sentiment involved and hence predict the most suitable emoji for the text. Hashtags and application sources (e.g., Android) are two features that we found to be important yet underused in emoji prediction and in Twitter sentiment analysis as a whole. To address this shortcoming and to further understand emoji behavioral patterns, we propose a more balanced dataset built by crawling additional Twitter data, with timestamp, hashtags, and application source included as additional attributes of each tweet. Our data analysis and neural network model performance evaluations show that using hashtags and application sources as features allows the model to encode different information and is effective in emoji prediction.