We propose an action parsing algorithm that segments a video sequence containing an unknown number of actions into its constituent action segments. We argue that context information, particularly temporal information about the other actions in the sequence, is valuable for action segmentation. The optimal temporal segmentation is found with a dynamic programming search that optimizes the overall classification confidence score. The classification score of each segment is computed from local features extracted from that segment together with context features extracted from the other candidate action segments of the sequence. Experimental results on the Breakfast activity dataset show improved segmentation accuracy compared with existing state-of-the-art parsing techniques.
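The dynamic programming search described above can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the overall score is the sum of per-segment confidences and that "optimizes" means maximizing that sum. It also assumes each candidate segment can be scored independently by a hypothetical `score_fn(start, end)`. The context features in the abstract couple a segment's score to other candidate segments, and this simplified version does not model that coupling. The names `parse_actions`, `toy_score`, `FRAME_SCORES`, and the per-segment penalty of 0.5 are all illustrative assumptions.

```python
from typing import Callable, List, Optional, Tuple

Segment = Tuple[int, int, str]  # (start frame, end frame exclusive, label)


def parse_actions(
    n_frames: int,
    score_fn: Callable[[int, int], Tuple[str, float]],
    max_len: int,
) -> Tuple[float, List[Segment]]:
    """Find the segmentation of frames [0, n_frames) with the highest
    summed segment confidence. score_fn is a hypothetical scorer that
    returns (best label, confidence) for the candidate segment [start, end)."""
    neg_inf = float("-inf")
    best = [neg_inf] * (n_frames + 1)  # best[t]: top score for frames [0, t)
    best[0] = 0.0
    back: List[Optional[Tuple[int, str]]] = [None] * (n_frames + 1)

    for end in range(1, n_frames + 1):
        for start in range(max(0, end - max_len), end):
            if best[start] == neg_inf:
                continue
            label, conf = score_fn(start, end)
            total = best[start] + conf
            if total > best[end]:
                best[end] = total
                back[end] = (start, label)

    # Walk the back-pointers from the last frame to recover the segments.
    segments: List[Segment] = []
    end = n_frames
    while end > 0:
        start, label = back[end]
        segments.append((start, end, label))
        end = start
    segments.reverse()
    return best[n_frames], segments


# Toy per-frame class scores (made-up numbers, not Breakfast data).
FRAME_SCORES = [
    {"pour": 1.0, "stir": 0.0},
    {"pour": 1.0, "stir": 0.0},
    {"pour": 1.0, "stir": 0.0},
    {"pour": 0.0, "stir": 1.0},
    {"pour": 0.0, "stir": 1.0},
]


def toy_score(start: int, end: int) -> Tuple[str, float]:
    """Illustrative local scorer: sum frame scores per label, pick the best
    label, and subtract an assumed penalty of 0.5 per segment so that the
    search does not trivially prefer many short segments."""
    totals = {}
    for frame in FRAME_SCORES[start:end]:
        for label, s in frame.items():
            totals[label] = totals.get(label, 0.0) + s
    label = max(totals, key=totals.get)
    return label, totals[label] - 0.5


total, segments = parse_actions(len(FRAME_SCORES), toy_score, max_len=5)
print(total, segments)
```

On this toy input the search recovers two segments, frames 0 to 2 labeled `pour` and frames 3 to 4 labeled `stir`, without being told the number of actions in advance. Replacing `toy_score` with a scorer that also looks at neighboring candidate segments would be closer to the context-aware scoring the abstract describes, but it would require extending the search state beyond this sketch.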