Natural language processing (NLP) is the field that attempts to make human language accessible to computers, and it relies on applying a mathematical model to express the meaning of symbolic language. One such model, DisCoCat, defines how to express both the meaning of individual words and the way those meanings compose. This model can be implemented naturally on quantum computers, giving rise to the field of quantum NLP (QNLP). Recent experimental work used quantum machine learning techniques to map text to a class label using the expectation value of the quantum-encoded sentence. Theoretical work has been done on computing the similarity of sentences, but it relies on a quantum memory store that has not yet been realized. The main goal of this thesis is to leverage the DisCoCat model to design a quantum-based kernel function that can be used by a support vector machine (SVM) for NLP tasks. Two similarity measures were studied: (i) the transition amplitude approach and (ii) the SWAP test. A simple NLP meaning classification task from previous work was used to train the word embeddings and evaluate the performance of both models. The Python module lambeq and its related software stack were used for implementation. The explicit model from previous work was used to train word embeddings and achieved a testing accuracy of $93.09 \pm 0.01$%. Both SVM variants achieved a higher testing accuracy: $95.72 \pm 0.01$% for approach (i) and $97.14 \pm 0.01$% for approach (ii). The SWAP test was then simulated under a noise model defined by the real quantum device ibmq_guadalupe. In this setting, the explicit model achieved an accuracy of $91.94 \pm 0.01$%, while the SWAP test SVM achieved 96.7% on the testing dataset, suggesting that the kernelized classifiers are resilient to noise. These are encouraging results and motivate further investigation of our proposed kernelized QNLP paradigm.
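To make the two similarity measures concrete, the following is a minimal classical sketch, not the thesis implementation and not lambeq code. It assumes each sentence has already been encoded as a normalized state vector, that the transition amplitude kernel is taken to be the squared overlap $|\langle\psi|\phi\rangle|^2$, and that the SWAP test is ideal and noiseless, in which case the ancilla is measured in state 0 with probability $\frac{1}{2} + \frac{1}{2}|\langle\psi|\phi\rangle|^2$. All function names and the toy states are hypothetical.

```python
import math


def normalize(vec):
    """Rescale a list of amplitudes to unit norm."""
    norm = math.sqrt(sum(abs(a) ** 2 for a in vec))
    return [a / norm for a in vec]


def transition_kernel(psi, phi):
    """Assumed form of approach (i): squared overlap |<psi|phi>|^2."""
    amplitude = sum(a.conjugate() * b for a, b in zip(psi, phi))
    return abs(amplitude) ** 2


def swap_test_p0(psi, phi):
    """Ideal SWAP test: probability of measuring the ancilla in state 0."""
    return 0.5 + 0.5 * transition_kernel(psi, phi)


def swap_test_kernel(psi, phi):
    """Recover the squared overlap from the ancilla statistics."""
    return 2.0 * swap_test_p0(psi, phi) - 1.0


def gram_matrix(states, kernel):
    """Pairwise kernel values, the input a kernelized SVM needs."""
    return [[kernel(a, b) for b in states] for a in states]


# Toy single-qubit "sentence states" (hypothetical, for illustration only).
states = [normalize(v) for v in ([1, 0], [1, 1], [0, 1j])]
K = gram_matrix(states, swap_test_kernel)
```

In the noiseless limit both measures yield the same Gram matrix; on hardware they differ in circuit structure and in how sampling and device noise enter the estimate. A matrix of this shape is what a precomputed-kernel SVM expects, for example scikit-learn's `SVC(kernel="precomputed")`.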