Auto-Pipeline: Synthesize Data Pipelines By-Target Using Reinforcement Learning and Search

Recent work has made significant progress in helping users automate single data preparation steps, such as string transformations and table-manipulation operators (e.g., Join, GroupBy, Pivot). In this work, we propose to automate multiple such steps end-to-end by synthesizing complex data pipelines that combine string transformations and table-manipulation operators. We propose a novel "by-target" paradigm that lets users easily specify the desired pipeline, a significant departure from the traditional by-example paradigm. Using by-target, users provide input tables (e.g., CSV or JSON files) and point to a "target table" (e.g., an existing database table or BI dashboard) that demonstrates what the output of the desired pipeline should schematically look like. While the problem is seemingly underspecified, our key insight is that implicit table constraints, such as functional dependencies (FDs) and keys, can be exploited to significantly constrain the search space and make the problem tractable. We develop an Auto-Pipeline system that learns to synthesize pipelines using reinforcement learning and search. Experiments on a large number of real pipelines crawled from GitHub suggest that Auto-Pipeline can successfully synthesize 60-70% of these complex pipelines (up to 10 steps) in 10-20 seconds on average.
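To make the role of implicit constraints concrete, the following is a minimal sketch of how key and FD constraints taken from a target table could be used to reject candidate pipeline outputs. It is not the Auto-Pipeline implementation: the function names (`is_key`, `fd_holds`, `satisfies_target_constraints`), the row representation, and the example columns are all assumptions made for illustration.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

# Hypothetical representation: a table is a list of rows, each row a dict
# mapping column name -> value.
Row = Dict[str, Hashable]


def is_key(rows: List[Row], cols: Sequence[str]) -> bool:
    """Return True if the values in `cols` are unique across all rows."""
    seen = set()
    for row in rows:
        value = tuple(row[c] for c in cols)
        if value in seen:
            return False
        seen.add(value)
    return True


def fd_holds(rows: List[Row], lhs: Sequence[str], rhs: str) -> bool:
    """Return True if the functional dependency lhs -> rhs holds in `rows`."""
    mapping: Dict[tuple, Hashable] = {}
    for row in rows:
        left = tuple(row[c] for c in lhs)
        # The first right-hand value seen for each left-hand value is
        # recorded; any later disagreement violates the FD.
        if mapping.setdefault(left, row[rhs]) != row[rhs]:
            return False
    return True


def satisfies_target_constraints(
    rows: List[Row],
    keys: List[Sequence[str]],
    fds: List[Tuple[Sequence[str], str]],
) -> bool:
    """Check a candidate output against keys and FDs assumed from the target."""
    return all(is_key(rows, k) for k in keys) and all(
        fd_holds(rows, lhs, rhs) for lhs, rhs in fds
    )


# Constraints assumed to hold in the (hypothetical) target table.
target_keys = [["order_id"]]
target_fds = [(["customer_id"], "customer_name")]

# Candidate output of a plausible pipeline: one row per order.
good_candidate = [
    {"order_id": 1, "customer_id": "c1", "customer_name": "Ann"},
    {"order_id": 2, "customer_id": "c1", "customer_name": "Ann"},
    {"order_id": 3, "customer_id": "c2", "customer_name": "Bob"},
]

# Candidate output of a wrong join: order_id is duplicated and the same
# customer_id maps to two different names.
bad_candidate = [
    {"order_id": 1, "customer_id": "c1", "customer_name": "Ann"},
    {"order_id": 1, "customer_id": "c1", "customer_name": "Bob"},
]

good_ok = satisfies_target_constraints(good_candidate, target_keys, target_fds)
bad_ok = satisfies_target_constraints(bad_candidate, target_keys, target_fds)
```

In a search over candidate pipelines, a check of this kind could act as a filter: candidates whose outputs violate the constraints observed in the target table are discarded, which shrinks the space that the search has to explore.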

