In this paper, we study the effects of incorporating timestamps, such as document creation dates, into generation systems. We investigate two types of time-aware prompts: (1) textual prompts that encode document timestamps in natural language sentences; and (2) linear prompts that convert timestamps into continuous vectors. To explore extrapolation to future data points, we further introduce a new data-to-text generation dataset, TempWikiBio, containing more than 4 million chronologically ordered revisions of biographical articles from English Wikipedia, each paired with structured personal profiles. Through data-to-text generation on TempWikiBio, text-to-text generation on the content transfer dataset, and summarization on XSum, we show that linear prompts on the encoder and textual prompts improve generation quality on all datasets. Although linear prompts suffer a smaller performance drop when tested on data drawn from a later time, they focus more on non-temporal information and are less sensitive to the given timestamps, according to human evaluations and sensitivity analyses. Meanwhile, textual prompts establish the association between the given timestamps and the output dates, yielding more factual temporal information in the output.
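As a rough illustration of the two prompt types described above, the following minimal sketch builds a textual prompt by prepending a date sentence to the input, and a linear prompt by projecting scaled date features into prompt vectors. The sentence template, the feature scaling, the random initialization, and the names `textual_prompt`, `timestamp_features`, and `LinearPrompt` are all assumptions made for illustration; they are not taken from the paper's implementation.

```python
import random
from datetime import date

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def textual_prompt(source: str, timestamp: date) -> str:
    """Prepend the timestamp as a natural-language sentence (hypothetical template)."""
    month = MONTHS[timestamp.month - 1]
    return f"Today is {timestamp.day} {month} {timestamp.year}. {source}"


def timestamp_features(timestamp: date) -> list:
    """Scale year, month, and day to small ranges (assumed normalization)."""
    return [(timestamp.year - 2000) / 100.0,
            timestamp.month / 12.0,
            timestamp.day / 31.0]


class LinearPrompt:
    """Map a timestamp to `prompt_length` vectors of size `hidden_size` via one linear layer."""

    def __init__(self, hidden_size: int, prompt_length: int = 1, seed: int = 0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        self.prompt_length = prompt_length
        rows = hidden_size * prompt_length
        # Weights would be learned in a real system; random values stand in here.
        self.weight = [[rng.gauss(0.0, 0.02) for _ in range(3)] for _ in range(rows)]
        self.bias = [0.0] * rows

    def __call__(self, timestamp: date) -> list:
        x = timestamp_features(timestamp)
        flat = [sum(w * xi for w, xi in zip(row, x)) + b
                for row, b in zip(self.weight, self.bias)]
        h = self.hidden_size
        # Split the flat projection into one vector per prompt position.
        return [flat[i * h:(i + 1) * h] for i in range(self.prompt_length)]


if __name__ == "__main__":
    stamp = date(2020, 1, 1)
    print(textual_prompt("Jane Doe is a chemist.", stamp))
    vectors = LinearPrompt(hidden_size=8, prompt_length=2)(stamp)
    print(len(vectors), len(vectors[0]))
```

In a full system, the textual prompt would pass through the tokenizer like any other input, whereas the linear prompt vectors would be placed alongside the encoder's input embeddings and trained with the rest of the model.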