In the rapidly evolving field of artificial intelligence (AI), mapping innovation patterns and understanding effective technology transfer from academic research to practical applications are essential for economic growth. This paper introduces DeepInnovationAI, the first comprehensive global dataset designed to bridge the gap between academic papers and industrial patents. However, existing data infrastructures face three major limitations: fragmentation, incomplete coverage, and insufficient evaluative capacity. Here, we present DeepInnovationAI, a comprehensive global dataset documenting AI innovation trajectories. The dataset comprises three structured files: DeepPatentAI.csv: Contains 2,356,204 patent records with 8 field-specific attributes. DeepDiveAI.csv: Encompasses 3,511,929 academic publications with 13 metadata fields. These two datasets employ large language models, multilingual text analysis and dual-layer BERT classifiers to accurately identify AI-related content and utilizing hypergraph analysis methods to create robust innovation metrics. In addition, DeepCosineAI.csv: By applying semantic vector proximity analysis, this file presents approximately one hundred million calculated paper-patent similarity pairs to enhance understanding of how theoretical advancements translate into commercial technologies. This enables researchers, policymakers, and industry leaders to anticipate trends and identify emerging areas for collaboration. With its extensive temporal and geographical scope, DeepInnovationAI supports detailed analysis of technological development patterns and international competition dynamics, providing a robust foundation for modeling AI innovation dynamics and technology transfer processes.
翻译:暂无翻译