To leverage machine learning in any decision-making process, one must convert the given knowledge (for example, natural language or unstructured text) into representation vectors that machine learning models can understand and process in a compatible data format. A frequently encountered difficulty, however, is that the given knowledge is not rich or reliable enough in the first place. In such cases, one seeks to fuse side information from a separate domain to narrow the gap between good representation learning and the scarce knowledge in the domain of interest. This approach is called Cross-Domain Knowledge Transfer. The problem is crucial to study because scarce knowledge is common in many scenarios, from online healthcare platform analyses to financial market risk quantification, and it stands as an obstacle to benefiting from automated decision making. From the machine learning perspective, the paradigm of semi-supervised learning takes advantage of large amounts of data without ground truth and achieves impressive improvements in learning performance. This dissertation adopts it for cross-domain knowledge transfer. (to be continued)