Equipping robots with the ability to infer human intent is a vital precondition for effective collaboration. Most computational approaches towards this objective derive a probability distribution of "intent" conditioned on the robot's perceived state. However, these approaches typically assume task-specific labels of human intent are known a priori. To overcome this constraint, we propose the Disentangled Sequence Clustering Variational Autoencoder (DiSCVAE), a clustering framework capable of learning such a distribution of intent in an unsupervised manner. The proposed framework leverages recent advances in unsupervised learning to disentangle latent representations of sequence data, separating time-varying local features from time-invariant global attributes. As a novel extension, the DiSCVAE also infers a discrete variable to form a latent mixture model and thus enable clustering over these global sequence concepts, e.g. high-level intentions. We evaluate the DiSCVAE on a real-world human-robot interaction dataset collected using a robotic wheelchair. Our findings reveal that the inferred discrete variable coincides with human intent, holding promise for collaborative settings, such as shared control.
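To make the latent structure described above concrete, the following is a minimal sketch of ancestral sampling from a toy model with a discrete cluster variable, a time-invariant global latent, and time-varying local latents. It is not the authors' implementation: the function name `sample_sequence`, all dimensions, the uniform cluster prior, the AR(1) local dynamics, and the linear decoder are illustrative assumptions chosen only to show how the three kinds of variables interact.

```python
import numpy as np


def sample_sequence(rng, n_clusters=3, global_dim=2, local_dim=2, obs_dim=4, T=5):
    """Ancestral sampling from a toy latent mixture sequence model.

    All sizes and parameters are hypothetical; a trained model would learn them.
    """
    # Hypothetical fixed parameters standing in for learned ones.
    cluster_means = rng.normal(size=(n_clusters, global_dim)) * 3.0
    W_global = rng.normal(size=(global_dim, obs_dim))
    W_local = rng.normal(size=(local_dim, obs_dim))

    # Discrete variable: which mixture component (e.g. high-level intent).
    y = rng.integers(n_clusters)  # assumed uniform prior over clusters

    # Global latent: drawn once per sequence from the chosen component.
    z_global = cluster_means[y] + rng.normal(size=global_dim)

    # Local latents: one per time step, here a simple AR(1) process (assumption).
    z_local = np.zeros((T, local_dim))
    prev = np.zeros(local_dim)
    for t in range(T):
        prev = 0.9 * prev + 0.1 * rng.normal(size=local_dim)
        z_local[t] = prev

    # Assumed linear decoder: each observation mixes its local latent with the
    # shared global latent (broadcast across all T time steps).
    x = z_local @ W_local + z_global @ W_global
    return y, z_global, z_local, x


rng = np.random.default_rng(0)
y, z_global, z_local, x = sample_sequence(rng)
print("cluster:", y, "| observation shape:", x.shape)
```

In this sketch, clustering corresponds to inferring `y` from an observed sequence `x`, while `z_local` absorbs the step-by-step variation that should not influence the cluster assignment.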