Real-time virtual data on objects and human presence across a large area is key to enabling many experiences and applications in various industries. With the rapid development of artificial intelligence, computer vision has expanded the possibilities of tracking and classifying things from video input alone. It also overcomes limitations of the most popular hardware setups traditionally used to detect human pose and position, such as a narrow field of view and limited tracking capacity. The benefits of using computer vision in application development are substantial, as it augments traditional input sources (such as video streams) and can be integrated into many environments and platforms. In the context of new media interactive art, which is based on physical movement and extends over large areas or galleries, this research presents a novel approach and framework for obtaining data and virtual representations of objects and people, such as three-dimensional positions, skeletons/poses and masks, from a single RGB camera. Reviewing the state of the art through recent developments and building on prior research in computer vision, the paper also proposes an original method to obtain three-dimensional position data from monocular images. The model does not rely on complex training of computer vision systems; instead, it combines prior computer vision research and adds the capacity to represent z depth, i.e. to represent a world position in three axes from a 2D input source.
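As an illustration of what recovering z depth from a single 2D image can involve, the sketch below uses a common heuristic. It is not the method proposed in this paper. It assumes a calibrated pinhole camera with known intrinsics (`fx`, `fy`, `cx`, `cy`), an upright person fully visible in a detection bounding box, and a hypothetical average body height of 1.7 m. Depth then follows from similar triangles (Z = fy · H / h_px), and the pixel at the feet is back-projected to a 3D point. The names `PinholeCamera`, `depth_from_height` and `person_position` are hypothetical. The result is expressed in the camera frame; mapping it into a gallery's world frame would additionally require extrinsic calibration, which this sketch does not cover.

```python
from dataclasses import dataclass


@dataclass
class PinholeCamera:
    """Hypothetical intrinsics of a calibrated pinhole camera (pixel units)."""
    fx: float  # focal length along x
    fy: float  # focal length along y
    cx: float  # principal point x
    cy: float  # principal point y


def depth_from_height(cam: PinholeCamera, pixel_height: float,
                      real_height_m: float = 1.7) -> float:
    """Estimate depth Z by similar triangles: h_px = fy * H / Z, so Z = fy * H / h_px.

    Assumption: the person is upright and real_height_m is a guessed average height.
    """
    if pixel_height <= 0:
        raise ValueError("pixel_height must be positive")
    return cam.fy * real_height_m / pixel_height


def back_project(cam: PinholeCamera, u: float, v: float, z: float) -> tuple:
    """Lift pixel (u, v) at depth z to a 3D point in the camera frame."""
    x = (u - cam.cx) * z / cam.fx
    y = (v - cam.cy) * z / cam.fy
    return (x, y, z)


def person_position(cam: PinholeCamera, bbox: tuple,
                    real_height_m: float = 1.7) -> tuple:
    """Return the camera-frame 3D position of the feet for a bbox (u_min, v_min, u_max, v_max)."""
    u_min, v_min, u_max, v_max = bbox
    z = depth_from_height(cam, v_max - v_min, real_height_m)
    u_feet = (u_min + u_max) / 2.0  # horizontal centre of the box
    v_feet = v_max                  # bottom edge of the box, assumed to touch the floor
    return back_project(cam, u_feet, v_feet, z)


if __name__ == "__main__":
    cam = PinholeCamera(fx=1000.0, fy=1000.0, cx=960.0, cy=540.0)
    # A 340-pixel-tall person whose feet are at the principal point.
    print(person_position(cam, (910.0, 200.0, 1010.0, 540.0)))  # approximately (0.0, 0.0, 5.0)
```

The height prior keeps the sketch free of any trained depth model, in the spirit of the abstract's "no complex training" claim. It is, however, sensitive to occlusion, crouching and variation in real body height, so a practical system would likely need additional cues.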