Performance of Data Augmentation Methods for Brazilian Portuguese Text Classification

Marcellus Amadeus, Paulo Branco

Improving machine learning performance while increasing model generalization is a goal that AI researchers constantly pursue. Data augmentation techniques are often used toward this end, yet most evaluations of these techniques rely on English corpora. In this work, we apply several existing data augmentation methods to text classification problems on Brazilian Portuguese corpora and analyze their performance. Our analysis shows putative improvements from some of these techniques; however, it also suggests the need for further investigation of language bias and of the scarcity of non-English text data.
