We introduce a family of chronologically consistent, instruction-tuned large language models to eliminate lookahead bias. Each model is trained only on data available before a clearly defined knowledge-cutoff date, ensuring strict temporal separation from any post-cutoff data. The resulting framework offers (i) a simple, conversational chat interface, (ii) fully open, fixed model weights that guarantee replicability, and (iii) a conservative lower bound on forecast accuracy, isolating the share of predictability that survives once training leakage is removed. Together, these features give researchers an easy-to-use generative AI tool, free of lookahead bias, for a wide range of prediction tasks.