Real-time crowd-powered systems, such as Chorus/Evorus, VizWiz, and Apparition, have shown how incorporating humans into automated systems can compensate where automatic solutions fall short. However, one unspoken bottleneck in applying such architectures to more scenarios is the added latency of including humans in the loop. For applications with hard constraints on turnaround time, the longer latency and large speed variation of human-operated components appear to be deal breakers. This paper explicates and quantifies these limitations by using a human-powered, text-based backend to hold conversations with users through a voice-only smart speaker. Smart speakers must respond to users' requests within seconds, so the workers behind the scenes have only a few seconds to compose answers. We measured the end-to-end system latency and the conversation quality with eight pairs of participants, showing both the challenges and the strengths of such systems.