Binning is used to categorize data values or to reveal the distribution of data. Existing binning algorithms often rely on the statistical properties of the data. However, semantic considerations also matter when selecting an appropriate binning scheme. Surveys, for instance, gather respondent data for demographic questions such as age, salary, and number of employees, and bucket the responses into defined semantic categories. In this paper, we leverage common semantic categories from survey data and Tableau Public visualizations to identify a set of semantic binning categories. We employ these categories in OSCAR, a method for automatically selecting bins based on the inferred semantic type of a field. We conducted a crowdsourced study with 120 participants to better understand user preferences for bins generated by OSCAR versus the binning provided in Tableau. We find that users prefer maps and histograms that use bins generated by OSCAR over binning schemes based purely on the statistical properties of the data.
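To make the contrast between semantic and purely statistical binning concrete, here is a minimal Python sketch. It is not OSCAR's implementation: the `SEMANTIC_BINS` table, the age edges in it, and the helper names `equal_width_edges`, `select_edges`, and `count_per_bin` are all hypothetical and chosen only for illustration. The sketch assumes the semantic type of the field is already known and falls back to equal-width bins otherwise.

```python
from bisect import bisect_right

# Hypothetical survey-style bin edges keyed by semantic type.
# These values are illustrative assumptions, not the categories from the paper.
SEMANTIC_BINS = {
    "age": [0, 18, 25, 35, 45, 55, 65],
}


def equal_width_edges(values, n_bins):
    """Statistical baseline: left edges of n_bins equal-width bins over the data range."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins)]


def select_edges(semantic_type, values, n_bins=4):
    """Use semantic edges when the type is known, otherwise fall back to equal width."""
    if semantic_type in SEMANTIC_BINS:
        return SEMANTIC_BINS[semantic_type]
    return equal_width_edges(values, n_bins)


def count_per_bin(values, edges):
    """Count values per bin; bin i covers [edges[i], edges[i+1]), the last bin is open-ended."""
    counts = [0] * len(edges)
    for v in values:
        idx = max(bisect_right(edges, v) - 1, 0)  # clamp values below the first edge
        counts[idx] += 1
    return counts


ages = [17, 22, 24, 31, 38, 44, 52, 67, 70]
semantic_counts = count_per_bin(ages, select_edges("age", ages))
statistical_counts = count_per_bin(ages, select_edges(None, ages))
```

With the semantic edges, the bins fall on familiar boundaries such as 18, 25, and 65, while the equal-width fallback produces edges such as 30.25 and 43.5 that depend only on the range of the data.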