The ILSUM shared task focuses on text summarization for two major Indian languages, Hindi and Gujarati, along with English. In this task, we experiment with various pretrained sequence-to-sequence models to identify the best model for each language. We present a detailed overview of the models and our approaches in this paper. We secure the first rank across all three sub-tasks (English, Hindi, and Gujarati). This paper also extensively analyzes the impact of k-fold cross-validation when experimenting with limited data, and we perform various experiments with combinations of the original and a filtered version of the data to determine the efficacy of the pretrained models.
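The k-fold cross-validation protocol referenced above can be illustrated with a minimal sketch; the dataset, fold count, and function name here are hypothetical and stand in for the task's actual article-summary pairs:

```python
# Minimal sketch of k-fold cross-validation over a small dataset,
# showing how limited data is rotated through train/validation splits.
# The items and fold count are illustrative, not the task's real data.
def kfold_splits(items, k):
    """Yield (train, val) lists for each of the k folds."""
    folds = [items[i::k] for i in range(k)]  # round-robin partition
    for i in range(k):
        val = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, val

data = list(range(10))  # placeholder for a small labeled corpus
for train, val in kfold_splits(data, 5):
    # every example appears exactly once across the k validation sets
    assert set(train) | set(val) == set(data)
    assert not set(train) & set(val)
```

Each fold serves once as the validation set while the remaining folds form the training set, so every example contributes to both training and evaluation, which is the appeal of this scheme under limited data.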