The Archive Query Log (AQL) is a previously unused, comprehensive query log collected at the Internet Archive over the last 25 years. Its first version includes 356 million queries, 166 million search result pages, and 1.7 billion search results across 550 search providers. Although many query logs have been studied in the literature, the search providers that own them generally do not publish their logs to protect user privacy and vital business data. Of the few query logs publicly available, none combines size, scope, and diversity. The AQL is the first to do so, enabling research on new retrieval models and (diachronic) search engine analyses. Provided in a privacy-preserving manner, it promotes open research as well as more transparency and accountability in the search industry.
Title: The Archive Query Log: Mining Millions of Search Result Pages of 550 Search Engines over 25 Years