Given a monocular colour image of a warehouse rack, we aim to predict the bird's-eye view layout of each shelf in the rack, a task we term multi-layer layout prediction. To this end, we present RackLay, a deep neural network for real-time shelf layout estimation from a single image. Unlike previous layout estimation methods, which provide a single layout for the dominant ground plane alone, RackLay estimates both the top-view and front-view layouts of every shelf in a rack populated with objects. RackLay's architecture and its variants are versatile and estimate accurate layouts for diverse scenes characterized by a varying number of visible shelves, a wide range of shelf occupancy, and varied background clutter. Given the extreme paucity of datasets in this space and the difficulty of acquiring real data from warehouses, we additionally release WareSynth, a flexible synthetic dataset generation pipeline that lets users control the generation process and tailor the dataset to the application at hand. Ablations across architectural variants and comparisons with strong prior baselines confirm the efficacy of RackLay as an apt architecture for the novel problem of multi-layered layout estimation. We also show that fusing the top-view and front-view layouts enables 3D reasoning applications such as metric free space estimation for the considered rack.
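To make the last point concrete, the following is a minimal sketch of how a top-view and a front-view occupancy grid for one shelf could be fused into a metric free-space estimate. It is an illustration under stated assumptions, not RackLay's actual procedure: the function name `free_space_volume`, the boolean grid format, the shared width axis, and the uniform `metres_per_cell` resolution are all hypothetical, and each width column is approximated as a box of occupied depth times occupied height.

```python
def free_space_volume(top_view, front_view, metres_per_cell):
    """Estimate the free volume (cubic metres) of one shelf.

    Hypothetical inputs, not taken from the paper:
      top_view   -- rows index depth, columns index width; True = occupied
      front_view -- rows index height, columns index width; True = occupied
      metres_per_cell -- side length of one grid cell in metres
    """
    width = len(top_view[0])
    # Both views must share the same width axis to be fused column by column.
    if any(len(row) != width for row in top_view):
        raise ValueError("top_view rows must all have the same width")
    if any(len(row) != width for row in front_view):
        raise ValueError("front_view must have the same width as top_view")

    depth = len(top_view)
    height = len(front_view)

    occupied_cells = 0
    for x in range(width):
        # Occupied extent along depth and height for this width column.
        occupied_depth = sum(1 for row in top_view if row[x])
        occupied_height = sum(1 for row in front_view if row[x])
        # Assumption: the occupied region in a column is a depth x height box.
        occupied_cells += occupied_depth * occupied_height

    total_cells = depth * height * width
    return (total_cells - occupied_cells) * metres_per_cell ** 3


# One box spanning 2 cells in width, 2 in depth, and 2 in height
# on a shelf of 6 x 4 x 3 cells (width x depth x height).
top = [[False] * 6 for _ in range(4)]
front = [[False] * 6 for _ in range(3)]
for x in (1, 2):
    top[0][x] = top[1][x] = True
    front[1][x] = front[2][x] = True

print(free_space_volume(top, front, 0.5))  # 8.0
```

In the example, the shelf holds 72 cells and the box fills 8, leaving 64 cells of 0.125 cubic metres each. The box approximation over-counts occupancy when objects in the same column are stacked or staggered, so the result is best read as a lower bound on free space.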