Text style transfer involves rewriting the content of a source sentence in a target style. Although data are available for a number of style tasks, there has been limited systematic discussion of how text style datasets relate to each other. This understanding, however, is likely to have implications for selecting multiple data sources for model training. While it is prudent to consider inherent stylistic properties when determining these relationships, we must also consider how a style is realized in a particular dataset. In this paper, we conduct several empirical analyses of existing text style datasets. Based on our results, we propose a categorization of stylistic and dataset properties to consider when utilizing or comparing text style datasets.