We propose a family of metrics for assessing language generation, derived from population estimation methods that are widely used in ecology. Specifically, we use mark-recapture and maximum-likelihood methods, which have been applied over the past several decades to estimate the size of closed populations in the wild. We propose three novel metrics: ME$_\text{Petersen}$ and ME$_\text{CAPTURE}$, each of which returns a single value, and ME$_\text{Schnabel}$, which returns a pair of values that assess the quality and the diversity of the evaluation set separately. In synthetic experiments, our metrics are sensitive to drops in both quality and diversity. Moreover, they correlate more strongly with human evaluation than existing metrics on several challenging tasks, namely unconditional language generation, machine translation, and text summarization.
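For readers unfamiliar with mark-recapture, the sketch below shows the two classical closed-population estimators that ME$_\text{Petersen}$ and ME$_\text{Schnabel}$ are named after. These are the Lincoln-Petersen estimator $\hat{N} = MC/R$ for two sampling occasions and the Schnabel estimator $\hat{N} = \sum_t C_t M_t / \sum_t R_t$ for several occasions. Here $M$ (or $M_t$) is the number of marked individuals before a sample, $C$ (or $C_t$) is the sample size, and $R$ (or $R_t$) is the number of marked individuals recaptured in that sample. Individuals are represented as hashable identifiers. This is only the textbook ecological form, not the paper's metric. How generated texts are turned into "marks" and "recaptures" is not described here, and the function names are hypothetical.

```python
from typing import Hashable, Sequence, Set


def lincoln_petersen(first: Set[Hashable], second: Set[Hashable]) -> float:
    """Classical two-sample Lincoln-Petersen estimate N = M * C / R.

    Hypothetical helper for illustration; not the paper's ME_Petersen metric.
    """
    m = len(first)            # individuals marked in the first sample
    c = len(second)           # individuals caught in the second sample
    r = len(first & second)   # marked individuals recaptured in the second sample
    if r == 0:
        raise ValueError("no recaptures: the estimate is undefined")
    return m * c / r


def schnabel(samples: Sequence[Set[Hashable]]) -> float:
    """Classical multi-sample Schnabel estimate N = sum(C_t * M_t) / sum(R_t).

    Hypothetical helper for illustration; not the paper's ME_Schnabel metric.
    """
    marked: Set[Hashable] = set()  # everything marked before the current occasion
    numerator = 0
    recaptures = 0
    for sample in samples:
        numerator += len(sample) * len(marked)  # C_t * M_t
        recaptures += len(sample & marked)      # R_t
        marked |= sample                        # mark every newly caught individual
    if recaptures == 0:
        raise ValueError("no recaptures: the estimate is undefined")
    return numerator / recaptures


if __name__ == "__main__":
    print(lincoln_petersen(set(range(1, 11)), set(range(6, 16))))  # 20.0
    print(schnabel([{1, 2, 3, 4}, {3, 4, 5, 6}, {5, 6, 7, 8}]))    # 10.0
```

In the ecological setting, many recaptures relative to the sample sizes imply a small population, and few recaptures imply a large one; both estimators are undefined when nothing is recaptured.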