One of the latest applications of Artificial Intelligence (AI) is generating images from natural language descriptions. These generators are now becoming available and achieve impressive results that have been used, for example, on the front covers of magazines. Because the input to the generators is natural language text, a question that arises immediately is how these models behave when the input is written in different languages. In this paper, we perform an initial exploration of how the performance of three popular text-to-image generators depends on the input language. The results show a significant performance degradation when using languages other than English, especially for languages that are not widely used. This observation leads us to discuss alternatives for improving text-to-image generators so that performance is consistent across languages. This is fundamental to ensure that this new technology can be used by non-native English speakers and to preserve linguistic diversity.