Sentence compression is the task of generating a shorter yet grammatical version of a given sentence while preserving the essence of the original. This paper proposes a Black-Box Optimizer for Compression (B-BOC): given a black-box compression algorithm, and assuming that not all sentences need to be compressed, B-BOC finds the best candidates for compression in order to maximize both compression rate and quality. Given a required compression ratio, we consider two scenarios: (i) single-sentence compression, and (ii) sentence-sequence compression. In the first scenario, our optimizer is trained to predict how well each sentence can be compressed while meeting the specified ratio requirement. In the second, the desired compression ratio is applied to a sequence of sentences (e.g., a paragraph) as a whole, rather than to each individual sentence. To achieve this, we use B-BOC to assign an optimal compression ratio to each sentence, then cast the allocation as a Knapsack problem, which we solve using bounded dynamic programming. We evaluate B-BOC in both scenarios on three datasets, demonstrating that our optimizer improves both accuracy and ROUGE F1 score compared to direct application of other compression algorithms.
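To make the sequence-level scenario concrete, the following is a minimal sketch of a knapsack-style dynamic program. It is not the paper's implementation: it assumes a multiple-choice formulation in which every sentence comes with a few candidate versions, each described by a token length and a predicted quality score, and exactly one version per sentence must be chosen so that the total length stays within a budget. The function name `select_compressions`, the `(length, score)` option format, and the scores in the usage note are all hypothetical; in practice the scores would come from a trained predictor such as B-BOC.

```python
from typing import List, Tuple

# Hypothetical option format: (length in tokens, predicted quality score).
Option = Tuple[int, float]


def select_compressions(
    options: List[List[Option]], budget: int
) -> Tuple[float, List[int]]:
    """Choose one option per sentence so that the total length is at most
    `budget` and the summed predicted quality is maximal.

    Returns (best total score, index of the chosen option for each sentence).
    Raises ValueError if no combination fits within the budget.
    """
    neg_inf = float("-inf")
    # best[b]: best total score of the sentences processed so far,
    # using at most b tokens. With no sentences processed, the score is 0.
    best = [0.0] * (budget + 1)
    picks: List[List[int]] = []  # picks[i][b]: option chosen for sentence i

    for sentence_options in options:
        new_best = [neg_inf] * (budget + 1)
        pick = [-1] * (budget + 1)
        for b in range(budget + 1):
            for k, (length, score) in enumerate(sentence_options):
                if length <= b and best[b - length] != neg_inf:
                    candidate = best[b - length] + score
                    if candidate > new_best[b]:
                        new_best[b] = candidate
                        pick[b] = k
        best = new_best
        picks.append(pick)

    if best[budget] == neg_inf:
        raise ValueError("no selection of options fits within the budget")

    # Walk backwards through the table to recover the chosen options.
    chosen: List[int] = []
    remaining = budget
    for i in range(len(options) - 1, -1, -1):
        k = picks[i][remaining]
        chosen.append(k)
        remaining -= options[i][k][0]
    chosen.reverse()
    return best[budget], chosen
```

As a usage example, take three sentences of 10, 8, and 12 tokens, each with one compressed alternative: `[[(10, 1.0), (6, 0.8)], [(8, 1.0), (5, 0.5)], [(12, 1.0), (7, 0.9)]]`. With a budget of 24 tokens (a 0.8 ratio over the 30-token paragraph), the sketch leaves the second sentence intact and compresses the other two, because the second sentence's compressed version has the lowest predicted quality. The running time is proportional to the number of sentences times the budget times the number of options per sentence.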