Third-party software, known as skills, is an essential component of Smart Personal Assistants (SPAs). The number of skills has grown rapidly in a changing environment that lacks a clear business model. Skills can access personal information, which may pose a risk to users. However, little is known about how this ecosystem works, and tools that facilitate its study are scarcer still. In this paper, we present the largest systematic measurement of the Amazon Alexa skill ecosystem to date. We study developers' practices in this ecosystem, including how they collect sensitive information and justify the need for it, by designing a methodology to identify over-privileged skills with broken privacy policies. We collect 199,295 Alexa skills and find that around 43% of the skills (and 50% of the developers) that request these permissions follow bad privacy practices, including (partially) broken data permissions traceability. To perform this kind of analysis at scale, we present SkillVet, which leverages machine learning and natural language processing techniques and generates high-accuracy prediction sets. We report a number of concerning practices, including how developers can bypass Alexa's permission system through account linking and conversational skills, and offer recommendations on how to improve transparency, privacy and security. As a result of our responsible disclosure, 13% of the reported issues no longer pose a threat at submission time.