Having a rich multimodal inner language is an important component of human intelligence that enables core cognitive functions such as multimodal prediction, translation, and generation. Building upon the Conscious Turing Machine (CTM), a machine model for consciousness proposed by Blum and Blum (2021), we describe the desiderata of a multimodal language called Brainish, comprising words, images, audio, and sensations combined in representations that the CTM's processors use to communicate with each other. We define the syntax and semantics of Brainish before operationalizing this language through the lens of multimodal artificial intelligence, a vibrant research area studying the computational tools necessary for processing and relating information from heterogeneous signals. Our general framework for learning Brainish involves designing (1) unimodal encoders to segment and represent unimodal data, (2) a coordinated representation space that relates and composes unimodal features to derive holistic meaning across multimodal inputs, and (3) decoders to map multimodal representations into predictions (for fusion) or raw data (for translation or generation). By discussing how Brainish is crucial for the communication and coordination needed to achieve consciousness in the CTM, and by implementing a simple version of Brainish and evaluating its ability to demonstrate intelligence on multimodal prediction and retrieval tasks across several real-world image, text, and audio datasets, we argue that such an inner language will be important for advances in machine models of intelligence and consciousness.
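The three-part framework (unimodal encoders, a coordinated representation space, and decoders) can be illustrated with a toy sketch. The abstract does not specify an architecture, so everything below is an assumption for illustration only: the class name ToyBrainish, the random linear encoders, mean pooling as the fusion step, and cosine similarity for retrieval are all hypothetical choices, and the weights are untrained. A real implementation would learn these components, for example with a contrastive objective that aligns paired inputs across modalities.

```python
import math
import random


def linear(weights, x):
    """Apply a weight matrix (list of rows) to a vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def normalize(v):
    """Scale v to unit length (leave zero vectors unchanged)."""
    norm = math.sqrt(sum(a * a for a in v)) or 1.0
    return [a / norm for a in v]


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


class ToyBrainish:
    """Hypothetical, untrained sketch of the encoder / shared space / decoder pipeline."""

    def __init__(self, input_dims, shared_dim, num_classes, seed=0):
        rng = random.Random(seed)
        # (1) One linear encoder per modality, each projecting into the shared space.
        self.encoders = {m: random_matrix(shared_dim, d, rng) for m, d in input_dims.items()}
        # (3) A fusion decoder mapping a shared-space vector to class scores.
        self.decoder = random_matrix(num_classes, shared_dim, rng)

    def encode(self, modality, x):
        # (2) Unit-normalized embeddings live in the coordinated space,
        # so a dot product between them is a cosine similarity.
        return normalize(linear(self.encoders[modality], x))

    def fuse(self, inputs):
        # Assumed fusion rule: average the embeddings of all modalities present.
        embs = [self.encode(m, x) for m, x in inputs.items()]
        return [sum(vals) / len(embs) for vals in zip(*embs)]

    def predict(self, inputs):
        # Fusion: decode the combined representation into a class index.
        scores = linear(self.decoder, self.fuse(inputs))
        return max(range(len(scores)), key=scores.__getitem__)

    def retrieve(self, query_modality, query, target_modality, candidates):
        # Retrieval: return the index of the candidate closest to the query
        # in the shared space, measured by cosine similarity.
        q = self.encode(query_modality, query)
        sims = [sum(a * b for a, b in zip(q, self.encode(target_modality, c)))
                for c in candidates]
        return max(range(len(sims)), key=sims.__getitem__)


model = ToyBrainish({"image": 4, "text": 3}, shared_dim=5, num_classes=2)
label = model.predict({"image": [0.1, 0.2, 0.3, 0.4], "text": [1.0, 0.0, -1.0]})
best = model.retrieve("text", [1.0, 0.0, 0.0], "image", [[1, 0, 0, 0], [0, 1, 0, 0]])
```

Because every modality is projected into one space, the same embeddings serve both fusion (predict) and cross-modal retrieval (retrieve). Translation or generation would require an additional decoder that maps shared-space vectors back to raw data, which this sketch omits.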