In beef cattle husbandry, computer-vision-based approaches are widely used to monitor the condition of cattle (e.g., their physical, physiological, and health status). Accurate and efficient recognition of cattle actions is a prerequisite for such monitoring. Most existing models are confined to individual behavior: they use video-based methods to extract spatio-temporal features and recognize the actions of each animal separately. However, cattle are social animals, and their interactions often reflect important conditions such as estrus; moreover, video-based methods neglect the real-time capability of the model. Motivated by this, we tackle the challenging task of recognizing interactions between cattle in real time from a single frame. Our pipeline comprises two main modules: a Cattle Localization Network and an Interaction Recognition Network. At each time step, the cattle localization network generates high-quality interaction proposals from every detected animal and feeds them into the interaction recognition network, which has a triple-stream architecture. The triple-stream design lets us fuse different features relevant to interaction recognition: a visual feature that captures the appearance of interaction proposals, a geometric feature that reflects the spatial relationship between cattle, and a semantic feature that encodes our prior knowledge of the relationship between individual actions and interactions. In addition, to address the shortage of labeled data, we pre-train the model with self-supervised learning. Qualitative and quantitative evaluations demonstrate that our framework is an effective method for recognizing cattle interactions in real time.
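To make the notions of an interaction proposal and a geometric feature more concrete, the following is a minimal sketch, not the method described above. It assumes that a proposal is formed for each pair of detected cattle as the union of their bounding boxes, and that the geometric feature is a small vector of normalized center offsets, log size ratios, and IoU. The abstract specifies neither choice; the function names (`union_box`, `geometric_feature`, `interaction_proposals`) and the `(x1, y1, x2, y2)` box format are hypothetical.

```python
import math
from itertools import combinations

# A box is assumed to be (x1, y1, x2, y2) in pixels with x2 > x1 and y2 > y1.


def union_box(a, b):
    """Smallest box enclosing both a and b (assumed proposal region)."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def iou(a, b):
    """Intersection over union of two boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def geometric_feature(a, b):
    """Assumed 5-d descriptor of the spatial relationship of b relative to a."""
    wa, ha = a[2] - a[0], a[3] - a[1]
    wb, hb = b[2] - b[0], b[3] - b[1]
    cxa, cya = (a[0] + a[2]) / 2, (a[1] + a[3]) / 2
    cxb, cyb = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
    return [
        (cxb - cxa) / wa,   # horizontal center offset, normalized by a's width
        (cyb - cya) / ha,   # vertical center offset, normalized by a's height
        math.log(wb / wa),  # log width ratio
        math.log(hb / ha),  # log height ratio
        iou(a, b),          # overlap between the two animals
    ]


def interaction_proposals(boxes):
    """One candidate interaction per unordered pair of detected cattle."""
    return [
        {
            "pair": (i, j),
            "region": union_box(boxes[i], boxes[j]),
            "geometry": geometric_feature(boxes[i], boxes[j]),
        }
        for i, j in combinations(range(len(boxes)), 2)
    ]


if __name__ == "__main__":
    detections = [(0, 0, 10, 10), (5, 0, 15, 10), (100, 100, 110, 110)]
    for proposal in interaction_proposals(detections):
        print(proposal)
```

In a sketch like this, the `region` crop would be the input to a visual stream and the `geometry` vector the input to a geometric stream. Enumerating every pair grows quadratically with herd size, so a real-time system would likely prune distant pairs first; that pruning step is likewise an assumption here.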