Understanding and Improving Usability of Data Dashboards for Simplified Privacy Control of Voice Assistant Data (Extended Version)

Today, intelligent voice assistant (VA) software such as Amazon's Alexa, Google's Voice Assistant (GVA), and Apple's Siri has millions of users. These VAs often collect and analyze large amounts of user data to improve their functionality. However, the collected data may contain sensitive information (e.g., personal voice recordings) that users might not feel comfortable sharing with others, which can raise significant privacy concerns. To counter such concerns, service providers like Google present their users with a personal data dashboard (called "My Activity Dashboard") that allows them to manage all data collected by the voice assistant. However, user perceptions and preferences regarding this data (and data dashboards), grounded in real-world GVA data, have remained relatively unexplored in prior research. To that end, in this work we focused on GVA users and investigated their perceptions and preferences regarding their data and the dashboard, grounding them in real GVA-collected user data. Specifically, we conducted an 80-participant survey-based user study to collect both general perceptions of GVA usage and desired privacy preferences for a stratified sample of their GVA data. We show that most participants had only superficial knowledge of the types of data collected by GVA. Worryingly, we found that participants felt uncomfortable sharing a non-trivial 17.7% of GVA-collected data elements with Google. The current My Activity dashboard, although useful, did not help long-time GVA users effectively manage their data privacy. Our real-data-driven study found that showing users even one sensitive data element can significantly improve the usability of data dashboards. Accordingly, we built a classifier that can detect sensitive data for data dashboard recommendations with a 95% F1-score and shows a 76% improvement over baseline models.
