This paper investigates how Large Language Models (LLMs) from leading providers (OpenAI, Google, Anthropic, DeepSeek, and xAI) can be applied to quantitative sector-based portfolio construction. We use LLMs to identify investable universes of stocks within S&P 500 sector indices and evaluate how their selections perform when combined with classical portfolio optimization methods. Each model was prompted to select and weight 20 stocks per sector, and the resulting portfolios were compared with their respective sector indices across two distinct out-of-sample periods: a stable market phase (January-March 2025) and a volatile phase (April-June 2025). Our results reveal a strong temporal dependence in LLM portfolio performance. During stable market conditions, LLM-weighted portfolios frequently outperformed sector indices on both cumulative return and risk-adjusted (Sharpe ratio) measures. However, during the volatile period, many LLM portfolios underperformed, suggesting that current models may struggle to adapt to regime shifts or high-volatility environments underrepresented in their training data. Importantly, when LLM-based stock selection is combined with traditional optimization techniques, portfolio outcomes improve in both performance and consistency. This study contributes one of the first multi-model, cross-provider evaluations of generative AI algorithms in investment management. It highlights that while LLMs can effectively complement quantitative finance by enhancing stock selection and interpretability, their reliability remains market-dependent. The findings underscore the potential of hybrid AI-quantitative frameworks, integrating LLM reasoning with established optimization techniques, to produce more robust and adaptive investment strategies.
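The two evaluation measures named in the abstract, cumulative return and Sharpe ratio, can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the tickers `AAA` and `BBB`, the sample returns, the zero risk-free rate, the 252-day annualization factor, and the assumption that weights are held fixed through daily rebalancing are all hypothetical choices made for illustration.

```python
import math
from statistics import mean, stdev


def portfolio_returns(weights, asset_returns):
    """Combine per-asset daily simple returns into daily portfolio returns.

    Assumes fixed weights (daily rebalancing); weights are normalized to sum to 1.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    norm = {ticker: w / total for ticker, w in weights.items()}
    n_days = len(next(iter(asset_returns.values())))
    return [
        sum(norm[t] * asset_returns[t][day] for t in norm)
        for day in range(n_days)
    ]


def cumulative_return(returns):
    """Compound a series of simple returns into one cumulative return."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth - 1.0


def sharpe_ratio(returns, risk_free_daily=0.0, periods_per_year=252):
    """Annualized Sharpe ratio from daily returns (sample standard deviation)."""
    excess = [r - risk_free_daily for r in returns]
    sd = stdev(excess)
    if sd == 0:
        return float("nan")
    return mean(excess) / sd * math.sqrt(periods_per_year)


# Hypothetical two-stock portfolio; a real run would use 20 LLM-selected stocks.
weights = {"AAA": 0.5, "BBB": 0.5}
asset_returns = {
    "AAA": [0.01, -0.02, 0.03],
    "BBB": [0.00, 0.01, -0.01],
}
daily = portfolio_returns(weights, asset_returns)
print("cumulative return:", cumulative_return(daily))
print("annualized Sharpe:", sharpe_ratio(daily))
```

Applying the same two functions to a sector index's daily returns over the same window would give the benchmark figures against which a portfolio is compared.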