The large volumes of structured data now available, from Web tables to open-data portals and enterprise data, open up new opportunities for answering many important scientific, societal, and business questions. However, finding relevant data is difficult. While search engines have addressed this problem for Web documents, supporting the discovery of structured data poses many new challenges. We demonstrate how the Auctus dataset search engine addresses some of these challenges. We describe the system architecture and how users can explore datasets through a rich set of queries. We also present case studies that show how Auctus supports data augmentation, both to improve machine learning models and to enrich analytics.