The demand for large amounts of data is currently a bottleneck for machine learning (ML) applications in this empirically dominated field. We propose a method that combines prior knowledge with data-driven methods to significantly reduce their data dependency. In this study, component-based machine learning (CBML) is examined as a knowledge-encoded data-driven method in the context of energy-efficient building engineering. It encodes an abstraction of building structural knowledge as semantic information in the organization of the model. We design a case experiment to assess the efficacy of knowledge-encoded ML under sparse data input (sampling rates from 1% down to 0.0125%). The results reveal three advantages over pure ML methods: (1) significantly improved robustness to extremely small and inconsistent datasets; (2) efficient utilization of data records collected from different entities; (3) the ability to accept incomplete data while offering high interpretability and reduced training time. These features provide a promising path toward alleviating the deployment bottleneck of data-intensive methods and contribute to efficient use of real-world data. Moreover, this study summarizes four prerequisites that must be met for a target scenario to benefit from combining prior knowledge with the generalization capability of ML.
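To make the idea of encoding building structural knowledge in the model organization concrete, the following is a minimal sketch of a component-based setup: one surrogate model is trained per component type on component-level records (which may be pooled from different buildings or data owners), and a building-level prediction is assembled by composing the component predictions according to the building's structure. All names and the additive aggregation used here (Component, CBMLModel, predict_building) are illustrative assumptions, not the paper's implementation.

```python
from dataclasses import dataclass
import numpy as np
from sklearn.ensemble import RandomForestRegressor

@dataclass
class Component:
    kind: str              # semantic component type, e.g. "wall", "window", "zone"
    features: np.ndarray   # component-level inputs (geometry, material, boundary conditions)

class CBMLModel:
    """One surrogate model per component type; building-level output is
    assembled from component-level predictions following the building's structure."""
    def __init__(self):
        self.models = {}

    def fit(self, records):
        # records: dict mapping component kind -> (X, y), where the samples
        # may come from different buildings or record collections
        for kind, (X, y) in records.items():
            model = RandomForestRegressor(n_estimators=100, random_state=0)
            model.fit(X, y)
            self.models[kind] = model

    def predict_building(self, components):
        # Aggregate per-component predictions (here: a simple sum of
        # component-level demands) into one building-level estimate.
        total = 0.0
        for c in components:
            x = c.features.reshape(1, -1)
            total += float(self.models[c.kind].predict(x)[0])
        return total

# Example usage with synthetic data (illustrative only):
rng = np.random.default_rng(0)
records = {
    "wall":   (rng.normal(size=(50, 4)), rng.normal(size=50)),
    "window": (rng.normal(size=(30, 4)), rng.normal(size=30)),
}
cbml = CBMLModel()
cbml.fit(records)
building = [Component("wall", rng.normal(size=4)),
            Component("window", rng.normal(size=4))]
print(cbml.predict_building(building))
```

Because each component model only needs records of its own type, incomplete building-level datasets can still contribute training samples, which is one reason such a decomposition can tolerate sparse and inconsistent data.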