In this paper, we propose and showcase, for the first time, monocular multi-view layout estimation for warehouse racks and shelves. Unlike typical layout estimation methods, MVRackLay estimates multi-layered layouts, in which each layer corresponds to the layout of one shelf within a rack. Given a sequence of images of a warehouse scene, a dual-headed Convolutional-LSTM architecture outputs segmented racks together with the front-view and top-view layouts of each shelf within a rack. With minimal effort, this output can be transformed into a 3D rendering of all racks, shelves and objects on the shelves, giving an accurate 3D depiction of the entire warehouse scene in terms of racks, shelves and the number of objects on each shelf. MVRackLay generalizes to a diverse set of warehouse scenes with varying numbers of shelves and of objects on each shelf, including scenes with other racks in the background. Further, MVRackLay outperforms its single-view counterpart, RackLay, in layout accuracy, quantified by the mean IoU and mAP metrics. We also showcase multi-view stitching of the 3D layouts, which yields a representation of the warehouse scene with respect to a global reference frame, akin to a rendering of the scene from a SLAM pipeline. To the best of our knowledge, this is the first work to portray a 3D rendering of a warehouse scene in terms of its semantic components (racks, shelves and objects), all from a single monocular camera.
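To make the mean IoU metric mentioned above concrete, the following is a minimal sketch of how it is commonly computed for a predicted layout against a ground-truth layout. It is not taken from the MVRackLay code: the function names `class_iou` and `mean_iou`, the integer label encoding (0 = background, 1 = shelf, 2 = object) and the tiny example grids are all assumptions made for illustration.

```python
from typing import List, Optional, Sequence

Grid = Sequence[Sequence[int]]  # a layout as rows of integer class labels


def class_iou(pred: Grid, target: Grid, cls: int) -> Optional[float]:
    """IoU of one class between two equally sized label grids.

    Returns None when the class appears in neither grid, so that the
    caller can leave it out of the average.
    """
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            in_pred = p == cls
            in_target = t == cls
            if in_pred and in_target:
                inter += 1
            if in_pred or in_target:
                union += 1
    return inter / union if union else None


def mean_iou(pred: Grid, target: Grid, classes: Sequence[int]) -> float:
    """Average the per-class IoU over the classes present in either grid."""
    scores: List[float] = []
    for cls in classes:
        score = class_iou(pred, target, cls)
        if score is not None:
            scores.append(score)
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Hypothetical 2x4 top-view layout: 0 = background, 1 = shelf, 2 = object.
    target = [[0, 1, 1, 0],
              [0, 2, 2, 0]]
    pred = [[0, 1, 1, 1],
            [0, 2, 0, 0]]
    # Shelf IoU is 2/3, object IoU is 1/2, so the mean is 7/12.
    print(round(mean_iou(pred, target, classes=[1, 2]), 4))  # 0.5833
```

In this sketch the average is taken over the foreground classes only; whether background is included, and whether scores are averaged per layout or per shelf layer, are evaluation choices that the abstract does not specify.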