Taxi-demand prediction is an important application of machine learning that enables taxi-providing facilities to optimize their operations and city planners to improve transportation infrastructure and services. However, the use of sensitive data in these systems raises concerns about privacy and security. In this paper, we propose the use of federated learning for taxi-demand prediction that allows multiple parties to train a machine learning model on their own data while keeping the data private and secure. This can enable organizations to build models on data they otherwise would not be able to access. Despite its potential benefits, federated learning for taxi-demand prediction poses several technical challenges, such as class imbalance, data scarcity among some parties, and the need to ensure model generalization to accommodate diverse facilities and geographic regions. To effectively address these challenges, we propose a system that utilizes region-independent encoding for geographic lat-long coordinates. By doing so, the proposed model is not limited to a specific region, enabling it to perform optimally in any area. Furthermore, we employ cost-sensitive learning and various regularization techniques to mitigate issues related to data scarcity and overfitting, respectively. Evaluation with real-world data collected from 16 taxi service providers in Japan over a period of six months showed the proposed system predicted demand level accurately within 1\% error compared to a single model trained with integrated data. The system also effectively defended against membership inference attacks on passenger data.
翻译:暂无翻译