Input data often appears to conform to a given input format, yet cannot be parsed by a conforming program, for instance, due to human error or data corruption. In such cases, a data engineer is tasked with input repair, i.e., she has to manually repair the corrupt data so that it follows the given format and can be processed by the conforming program. Such manual repair is time-consuming and error-prone. In particular, input repair is challenging without an input specification (e.g., an input grammar) or program analysis. In this work, we show that incorporating lightweight failure feedback (e.g., input incompleteness) into parsers is sufficient to repair any corrupt input data with maximal closeness to the semantics of the input data. We propose an approach (called FSYNTH) that leverages lightweight error feedback and input synthesis to repair invalid inputs. FSYNTH is grammar-agnostic, and it does not require program analysis. Given a conforming program and any invalid input, FSYNTH provides a set of repairs prioritized by the distance of each repair from the original input. We evaluate FSYNTH on 806 (real-world) invalid inputs using four well-known input formats, namely INI, TinyC, SExp, and cJSON. In our evaluation, we found that FSYNTH recovers 91% of valid input data. FSYNTH is also highly effective and efficient in input repair: it repairs 77% of invalid inputs within four minutes. It is up to 35% more effective than DDMax, the previously best-known approach. Overall, our approach addresses several limitations of DDMax, both in what it can repair and in the set of repairs it offers.
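To illustrate the idea of repair driven by parser feedback, the following is a minimal sketch and not the FSYNTH implementation. It assumes a hypothetical toy format (balanced parentheses around `a` atoms) and a hypothetical `check` function that reports whether a prefix is `complete`, `incomplete` (valid so far, more input needed), or `incorrect` (no extension can be valid). The three-valued feedback, the character-level insert and delete operations, and the edit-count distance are all assumptions made for this example.

```python
import heapq


def check(text):
    """Toy parser feedback (hypothetical): balanced parentheses around 'a' atoms."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return "incorrect"  # unmatched ')': no extension can fix this prefix
        elif ch != "a":
            return "incorrect"  # character outside the toy format
    if depth == 0 and text:
        return "complete"
    return "incomplete"  # valid prefix, but more input is needed


def repair(broken, alphabet="()a"):
    """Return (repaired_text, edit_count) with the fewest inserts and deletes."""
    # Each state is (cost, position in the broken input, repaired prefix so far).
    queue = [(0, 0, "")]
    seen = set()
    while queue:
        cost, pos, prefix = heapq.heappop(queue)
        if (pos, prefix) in seen:
            continue
        seen.add((pos, prefix))
        if pos == len(broken) and check(prefix) == "complete":
            return prefix, cost
        candidates = []
        if pos < len(broken):
            candidates.append((cost, pos + 1, prefix + broken[pos]))  # keep character
            candidates.append((cost + 1, pos + 1, prefix))  # delete character
        for ch in alphabet:
            candidates.append((cost + 1, pos, prefix + ch))  # insert a character
        for candidate in candidates:
            # The feedback prunes every prefix the parser already rejects.
            if check(candidate[2]) != "incorrect":
                heapq.heappush(queue, candidate)
    return None


if __name__ == "__main__":
    print(repair("(a#)"))
```

Because candidates are explored in order of edit count, the first complete input found is one of the closest repairs under this toy distance. Only the parser's verdict on each prefix guides the search; no grammar is consulted.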