Google Trends is a tool that allows researchers to analyze the popularity of Google search queries across time and space. In a single request, users can obtain time series for up to 5 queries on a common scale, normalized to the range from 0 to 100 and rounded to integer precision. Despite the overall value of Google Trends, rounding causes major problems, to the extent that entirely uninformative, all-zero time series may be returned for unpopular queries when requested together with more popular queries. We address this issue by proposing Google Trends Anchor Bank (G-TAB), an efficient solution for the calibration of Google Trends data. Our method expresses the popularity of an arbitrary number of queries on a common scale without being compromised by rounding errors. The method proceeds in two phases. In the offline preprocessing phase, an "anchor bank" is constructed, a set of queries spanning the full spectrum of popularity, all calibrated against a common reference query by carefully chaining together multiple Google Trends requests. In the online deployment phase, any given search query is calibrated by performing an efficient binary search in the anchor bank. Each search step requires one Google Trends request, but few steps suffice, as we demonstrate in an empirical evaluation. We make our code publicly available as an easy-to-use library at https://github.com/epfl-dlab/GoogleTrendsAnchorBank.
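The two phases described above can be illustrated with a small, self-contained sketch. It is not the G-TAB library's code or API. Everything here is an assumption made for illustration: `make_mock_request` is a hypothetical stand-in for a Google Trends request (it scales the most popular query to 100 and rounds to integers, and works on single values rather than time series), the anchor candidates are assumed to be already ordered by popularity, each chaining step uses only two queries, and the `min_value` threshold of 10 is arbitrary.

```python
from typing import Callable, Dict, List, Tuple

Request = Callable[[List[str]], Dict[str, int]]


def make_mock_request(true_popularity: Dict[str, float]) -> Request:
    """Hypothetical stand-in for one Google Trends request (not a real API)."""
    def request(queries: List[str]) -> Dict[str, int]:
        if not 1 <= len(queries) <= 5:
            raise ValueError("a request takes between 1 and 5 queries")
        top = max(true_popularity[q] for q in queries)
        # Common scale: most popular query maps to 100, values rounded to integers.
        return {q: round(100 * true_popularity[q] / top) for q in queries}
    return request


def build_anchor_bank(ordered: List[str], request: Request) -> List[Tuple[str, float]]:
    """Offline phase (simplified): chain pairwise requests between neighbours.

    `ordered` lists anchor candidates from most to least popular (assumed known).
    The first query is the reference and gets calibrated value 1.0.
    """
    bank = [(ordered[0], 1.0)]
    for prev, cur in zip(ordered, ordered[1:]):
        result = request([prev, cur])
        ratio = result[cur] / result[prev]  # prev is more popular, so result[prev] == 100
        bank.append((cur, bank[-1][1] * ratio))
    return bank[::-1]  # ascending by calibrated value


def calibrate(query: str, bank: List[Tuple[str, float]], request: Request,
              min_value: int = 10) -> Tuple[float, int]:
    """Online phase (simplified): binary search for an anchor of similar popularity.

    Returns the calibrated value and the number of requests used.
    """
    lo, hi = 0, len(bank) - 1
    n_requests = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        anchor, anchor_value = bank[mid]
        result = request([query, anchor])
        n_requests += 1
        if result[query] < min_value:     # anchor too popular: query rounded away
            hi = mid - 1
        elif result[anchor] < min_value:  # anchor too unpopular: anchor rounded away
            lo = mid + 1
        else:                             # both values are informative
            return anchor_value * result[query] / result[anchor], n_requests
    raise LookupError("query lies outside the popularity range covered by the bank")


# Made-up popularities spanning four orders of magnitude.
TRUE = {"a1": 10000.0, "a2": 1000.0, "a3": 100.0, "a4": 10.0, "a5": 1.0, "rare": 3.0}
request = make_mock_request(TRUE)

naive = request(["rare", "a1"])  # direct comparison with a very popular query
bank = build_anchor_bank(["a1", "a2", "a3", "a4", "a5"], request)
estimate, n_requests = calibrate("rare", bank, request)
print(naive["rare"], estimate, n_requests)
```

In this toy setting, the direct comparison rounds the unpopular query to zero, while the binary search finds an anchor of comparable popularity and recovers the query's value relative to the reference up to rounding error. The actual method is more careful about how requests are chained and how anchors are selected.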