Assessing and valuing real estate requires large datasets of property information. In practice, however, real estate databases are usually sparse: not every important attribute is available for every property. In this paper, we study the potential of predicting high-level real estate attributes from visual data, specifically from two visual modalities: indoor (interior) and outdoor (facade) photos. We design three models that use different multimodal fusion strategies and evaluate them on three use cases. A particular challenge is handling missing modalities. We present baselines for the different prediction tasks and find that enriching the training data with additional incomplete samples can improve prediction accuracy. Furthermore, fusing information from indoor and outdoor photos yields a performance boost of up to 5% in Macro F1-score.
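To make the two technical ingredients of the abstract concrete, the following is a minimal sketch of one possible way to fuse predictions when a modality is missing, together with a plain computation of the Macro F1-score. The abstract does not specify the three fusion strategies, so the averaging scheme here is an illustrative assumption and not the authors' method; the function names `late_fusion` and `macro_f1` and all example values are hypothetical.

```python
from typing import List, Optional, Sequence


def late_fusion(
    indoor: Optional[Sequence[float]],
    outdoor: Optional[Sequence[float]],
) -> List[float]:
    """Average class probabilities over the modalities that are present.

    Assumption: each modality has its own classifier that outputs one
    probability per class; a missing modality is passed as None.
    """
    available = [p for p in (indoor, outdoor) if p is not None]
    if not available:
        raise ValueError("at least one modality is required")
    num_classes = len(available[0])
    if any(len(p) != num_classes for p in available):
        raise ValueError("all modalities must cover the same classes")
    return [
        sum(p[i] for p in available) / len(available)
        for i in range(num_classes)
    ]


def macro_f1(
    y_true: Sequence[int], y_pred: Sequence[int], num_classes: int
) -> float:
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # A class with no true and no predicted samples contributes 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / num_classes


if __name__ == "__main__":
    # Both modalities present: probabilities are averaged.
    print(late_fusion([0.25, 0.75], [0.75, 0.25]))
    # Outdoor photo missing: the indoor prediction is used as-is.
    print(late_fusion([0.25, 0.75], None))
    print(macro_f1([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```

Because Macro F1 weights every class equally regardless of its frequency, it is a natural metric when attribute classes are imbalanced, which is plausible for sparse real estate databases.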