Obtaining a formal autism diagnosis is a slow and inefficient process. Families often wait years before their child receives a diagnosis, and some never receive one because of this delay. One approach to this problem is to use digital technologies to detect behaviors related to autism, which in aggregate may enable remote and automated diagnostics. One of the strongest indicators of autism is stimming, a set of repetitive, self-stimulatory behaviors such as hand flapping, head banging, and spinning. Detecting hand flapping with computer vision is especially difficult because public training data in this space is scarce and the available videos contain excessive shakiness and motion. Our work demonstrates a novel method that overcomes these issues: we use hand landmarks detected over time as a feature representation, which is then fed into a Long Short-Term Memory (LSTM) model. We achieve a validation accuracy and F1 score of about 72% when classifying whether videos from the Self-Stimulatory Behaviour Dataset (SSBD) contain hand flapping. Our best model also predicts accurately on external videos that we recorded of ourselves, outside the dataset it was trained on. This model uses fewer than 26,000 parameters, which shows promise for fast deployment in ubiquitous and wearable digital settings for remote autism diagnosis.
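To give a sense of how a landmark-plus-LSTM classifier can stay under the stated 26,000-parameter budget, the following is a rough parameter count for a hypothetical configuration. The abstract does not specify the architecture, so every number here is an assumption chosen for illustration: 21 landmarks per hand with 3 coordinates each (a common hand-landmark layout), a single LSTM layer with 32 hidden units, and one sigmoid output unit. It is a back-of-the-envelope sketch, not the authors' model.

```python
def lstm_param_count(input_dim: int, hidden_dim: int) -> int:
    """Trainable parameters in one standard LSTM layer.

    Assumes one bias vector per gate; frameworks that keep two bias
    vectors per gate will report a slightly larger number.
    """
    # Four gates (input, forget, cell candidate, output), each with an
    # input weight matrix, a recurrent weight matrix, and a bias vector.
    per_gate = hidden_dim * input_dim + hidden_dim * hidden_dim + hidden_dim
    return 4 * per_gate


def dense_param_count(input_dim: int, output_dim: int) -> int:
    """Trainable parameters in a fully connected layer with bias."""
    return input_dim * output_dim + output_dim


# All values below are illustrative assumptions, not taken from the paper.
NUM_LANDMARKS = 21      # assumed landmarks per hand
COORDS_PER_LANDMARK = 3  # assumed (x, y, z) per landmark
HIDDEN_UNITS = 32        # assumed LSTM width
OUTPUT_UNITS = 1         # hand flapping vs. no hand flapping

feature_dim = NUM_LANDMARKS * COORDS_PER_LANDMARK  # features per video frame

total_params = (
    lstm_param_count(feature_dim, HIDDEN_UNITS)
    + dense_param_count(HIDDEN_UNITS, OUTPUT_UNITS)
)

print(f"features per frame: {feature_dim}")
print(f"total parameters:   {total_params}")
print(f"under 26,000:       {total_params < 26_000}")
```

Under these assumptions the count depends only on the per-frame feature size and the hidden width, not on the number of frames in a clip, which is one reason a recurrent model over landmarks can remain small enough for wearable deployment.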