Heavily pre-trained transformer models such as BERT have recently been shown to be remarkably powerful at language modelling, achieving impressive results on numerous downstream tasks. It has also been shown that they are able to implicitly store factual knowledge in their parameters after pre-training. Understanding what the pre-training procedure of LMs actually learns is a crucial step towards using and improving them for Conversational Recommender Systems (CRS). We first study how much off-the-shelf pre-trained BERT "knows" about recommendation items such as books, movies and music. In order to analyze the knowledge stored in BERT's parameters, we use different probes that require different types of knowledge to solve, namely content-based and collaborative-based. Content-based knowledge requires the model to match the titles of items with their content information, such as textual descriptions and genres. In contrast, collaborative-based knowledge requires the model to match items with similar ones, according to community interactions such as ratings. We resort to BERT's Masked Language Modelling head to probe its knowledge about the genre of items, using cloze-style prompts. In addition, we employ BERT's Next Sentence Prediction head and representation similarity to compare relevant and non-relevant search and recommendation query-document inputs, exploring whether BERT can, without any fine-tuning, rank relevant items first. Finally, we study how BERT performs in a conversational recommendation downstream task. Overall, our analyses and experiments show that: (i) BERT has knowledge stored in its parameters about the content of books, movies and music; (ii) it has more content-based knowledge than collaborative-based knowledge; and (iii) it fails on conversational recommendation when faced with adversarial data.
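To make the two probing setups concrete, below is a minimal sketch using the HuggingFace `transformers` library: a cloze-style genre query answered by the MLM head, and an NSP-head score comparing a relevant against a non-relevant query-item pairing. The prompt template, query, and item names are illustrative assumptions, not the paper's actual probe data.

```python
import torch
from transformers import (
    BertTokenizer,
    BertForMaskedLM,
    BertForNextSentencePrediction,
)

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# --- Content-based probe: cloze-style genre prompt via the MLM head ---
mlm = BertForMaskedLM.from_pretrained("bert-base-uncased")
mlm.eval()

prompt = "The Lord of the Rings is a [MASK] book."  # hypothetical template
inputs = tokenizer(prompt, return_tensors="pt")
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()

with torch.no_grad():
    logits = mlm(**inputs).logits

# Top predictions for the masked genre slot (hoping for e.g. "fantasy").
top_ids = logits[0, mask_pos].topk(5).indices.tolist()
print(tokenizer.convert_ids_to_tokens(top_ids))

# --- Ranking probe: NSP head scores relevant vs. non-relevant pairings ---
nsp = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")
nsp.eval()

query = "Can you recommend me an epic fantasy novel?"
relevant = "The Lord of the Rings"            # assumed relevant item
non_relevant = "A cookbook of pasta recipes"  # assumed non-relevant item

def nsp_score(sent_a: str, sent_b: str) -> float:
    """Probability that sent_b 'follows' sent_a under the NSP head."""
    enc = tokenizer(sent_a, sent_b, return_tensors="pt")
    with torch.no_grad():
        nsp_logits = nsp(**enc).logits
    # Index 0 of the NSP logits corresponds to the "is next" class.
    return torch.softmax(nsp_logits, dim=-1)[0, 0].item()

# Without any fine-tuning, BERT should score the relevant item higher.
print(nsp_score(query, relevant), nsp_score(query, non_relevant))
```

Repeating the NSP comparison over many query-item pairs and checking how often the relevant item wins gives a simple zero-shot ranking measure of the kind these probes rely on.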