In recent years, multimodal AI has gained momentum as researchers integrate different types of data, such as text, images, and speech, into their models to improve performance. This project combines multimodal AI with matrix factorization techniques to learn representations from text and image data simultaneously, drawing on widely used techniques from Natural Language Processing (NLP) and Computer Vision. The learnt representations are evaluated on downstream classification and regression tasks. Because the approach relies on auto-encoders for unsupervised representation learning, it can be extended beyond the scope of this project.