Midrash collections are complex rabbinic works that consist of text in multiple languages and that evolved through long processes of unstable oral and written transmission. Determining the origin of a given passage in such a compilation is not always straightforward and is often disputed among scholars, yet it is essential for understanding the passage and its relationship to other texts in the rabbinic corpus. To help solve this problem, we propose a system for classifying rabbinic literature by style, leveraging recently released pretrained Transformer models for Hebrew. Additionally, we demonstrate how our method can be applied to uncover lost material from Midrash Tanhuma.