Recent advances in machine learning (ML) have enabled its wide application across domains. One of the most exciting applications is autonomous vehicles (AVs), which has spurred the development of numerous ML algorithms spanning perception, prediction, and planning. However, training AVs usually requires large amounts of data collected from diverse driving environments (e.g., cities) as well as various types of personal information (e.g., working hours and routes). Such large-scale data, treated as the new oil for ML in the data-centric AI era, usually contains a great deal of privacy-sensitive information that is hard to remove or even audit. Although existing privacy protection approaches have achieved certain theoretical and empirical successes, a gap remains when applying them to real-world applications such as autonomous vehicles. For instance, in AV training, privacy-sensitive information can be revealed not only by individually identifiable information, but also by population-level information, such as road construction within a city, and by proprietary-level information, such as the commercial secrets of AVs. Thus, it is critical to revisit the frontier of privacy risks and the corresponding protection approaches in AVs to bridge this gap. Toward this goal, in this work we provide a new taxonomy of privacy risks and protection methods in AVs, categorizing privacy in AVs into three levels: individual, population, and proprietary. We explicitly list recent challenges in protecting each of these levels of privacy, summarize existing solutions to these challenges, discuss lessons and conclusions, and provide potential future directions and opportunities for both researchers and practitioners. We believe this work will help shape privacy research in AVs and guide the design of privacy protection technologies.