Cross-lingual text representations have recently gained popularity and serve as the backbone of many tasks, such as unsupervised machine translation and cross-lingual information retrieval. However, evaluating such representations in domains beyond standard benchmarks is difficult because it requires domain-specific parallel data across many language pairs. In this paper, we propose an automatic metric for evaluating the quality of cross-lingual textual representations that uses the images in a paired image-text evaluation dataset as a proxy. Experimentally, Backretrieval correlates highly with ground-truth metrics on annotated datasets, and our analysis shows statistically significant improvements over baselines. Our experiments conclude with a case study on a recipe dataset without parallel cross-lingual data. We illustrate how to judge cross-lingual embedding quality with Backretrieval and validate the outcome with a small human study.
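To make the idea of using images as a retrieval pivot concrete, the sketch below shows one way an image-mediated recall metric could be computed from precomputed embeddings. It is a minimal illustration, not the paper's definition of Backretrieval: the function names, the two-hop retrieval protocol (source text to nearest image, then image to top-k target texts), and the use of cosine similarity are all assumptions made for this example.

```python
import numpy as np


def cosine_sim(a, b):
    """Cosine similarity matrix between two sets of row-vector embeddings."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def image_mediated_recall_at_k(src_text_emb, img_emb, tgt_text_emb, k=1):
    """Hypothetical image-pivot retrieval score (illustrative protocol only).

    For each source-language text i, retrieve its nearest image, then
    retrieve the k nearest target-language texts for that image and
    check whether the aligned target text i is among them.
    Assumes row i of all three matrices refers to the same item.
    """
    # Step 1: source text -> index of its nearest image.
    nearest_img = cosine_sim(src_text_emb, img_emb).argmax(axis=1)

    # Step 2: retrieved image -> top-k target-language texts.
    img_to_tgt = cosine_sim(img_emb, tgt_text_emb)
    hits = 0
    for i, j in enumerate(nearest_img):
        topk = np.argsort(-img_to_tgt[j])[:k]
        hits += int(i in topk)
    return hits / len(src_text_emb)


# Toy usage: random embeddings stand in for real cross-lingual encoders.
rng = np.random.default_rng(0)
n, d = 100, 64
src = rng.normal(size=(n, d))
img = rng.normal(size=(n, d))
tgt = rng.normal(size=(n, d))
print(image_mediated_recall_at_k(src, img, tgt, k=5))
```

Because the score only needs item-aligned image-text pairs in each language, rather than parallel text across languages, a metric of this shape can be computed in domains where cross-lingual parallel data is unavailable.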