This paper introduces a novel Bayesian method for measuring the degree of association between categorical variables. The method is grounded in the formal definition of variable independence and is implemented using Markov chain Monte Carlo (MCMC) techniques. Unlike existing methods, this approach does not assume prior knowledge of the total number of occurrences for any category, making it particularly well-suited for applications such as sentiment analysis. We applied the method to a dataset of 4,613 tweets written in Portuguese, each annotated for 30 possibly overlapping emotional categories. Through this analysis, we identified pairs of emotions that are associated with each other, as well as pairs that are mutually exclusive. Furthermore, the method identifies hierarchical relations between categories, a feature observed in our data, and was used to cluster emotions into basic-level groups.