This paper addresses the problem of estimating the viewpoint of an object in a given image. It presents five key insights that should be taken into consideration when designing a CNN to solve this problem. Based on these insights, the paper proposes a network in which (i) the architecture jointly solves detection, classification, and viewpoint estimation; (ii) new types of data are added and used for training; and (iii) a novel loss function, which takes into account both the geometry of the problem and the new types of data, is proposed. The network improves on the state-of-the-art results for this problem by 9.8%.
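The abstract does not specify the form of the loss, so the following is only an illustrative sketch of what "taking the geometry of the problem into account" can mean for viewpoint estimation, not the paper's actual loss. It assumes the viewpoint angle is discretized into bins on a circle (so the last bin is adjacent to the first) and softens the one-hot target with an exponential decay over circular bin distance. The function names, the number of bins, and the `sigma` parameter are all hypothetical choices made for this example.

```python
import math


def circular_bin_distance(i, j, num_bins):
    """Shortest distance between two bins on a circle of num_bins bins."""
    d = abs(i - j) % num_bins
    return min(d, num_bins - d)


def soft_targets(true_bin, num_bins, sigma=2.0):
    """Target distribution that decays with circular distance from true_bin.

    The exponential decay and sigma are assumptions made for this sketch.
    """
    weights = [
        math.exp(-circular_bin_distance(true_bin, k, num_bins) / sigma)
        for k in range(num_bins)
    ]
    total = sum(weights)
    return [w / total for w in weights]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def geometry_aware_loss(logits, true_bin, sigma=2.0):
    """Cross-entropy between the soft circular targets and the prediction."""
    probs = softmax(logits)
    targets = soft_targets(true_bin, len(logits), sigma)
    return -sum(t * math.log(p) for t, p in zip(targets, probs))


if __name__ == "__main__":
    num_bins = 36  # hypothetical: 10-degree azimuth bins
    near = [0.0] * num_bins
    near[35] = 5.0  # confident prediction one bin away (across the wrap-around)
    far = [0.0] * num_bins
    far[18] = 5.0  # confident prediction on the opposite side of the circle
    print(geometry_aware_loss(near, true_bin=0))
    print(geometry_aware_loss(far, true_bin=0))
```

Under these assumptions, a confident prediction one bin away from the true viewpoint is penalized less than an equally confident prediction on the opposite side of the circle, which a plain one-hot cross-entropy would treat identically.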