Mobile app stores are the key distributors of mobile applications. They regularly apply vetting processes to the deployed apps, yet some of these processes may be inadequate or applied late. The late removal of an application can have unpleasant consequences for developers and users alike. In this work, we therefore propose a data-driven predictive approach that determines whether a given app will be removed or accepted. It also reports the relevance of each feature, which helps stakeholders interpret the predictions. In turn, our approach can support developers in improving their apps and users in downloading those that are less likely to be removed. We focus on the Google app store and compile a new data set of 870,515 applications, 56% of which have actually been removed from the market. Our proposed approach is bootstrap aggregating (bagging) of multiple XGBoost machine learning classifiers. We propose two models: a user-centered model using 47 features, and a developer-centered model using 37 features, namely only those available before deployment. We achieve the following areas under the ROC curve (AUCs) on the test set: user-centered = 0.792, developer-centered = 0.762.
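The abstract gives no implementation details, so the following is only a minimal sketch of the general idea of bootstrap aggregating and AUC evaluation, not the authors' pipeline. To keep it runnable with the Python standard library alone, a one-feature decision stump stands in for each XGBoost classifier, and the data are synthetic. All function names, the three toy features, and the choice of which columns count as "available before deployment" are assumptions made for illustration.

```python
import random

def fit_stump(X, y):
    """Fit a one-feature threshold rule (a stand-in for an XGBoost model)."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for sign in (1, -1):
                err = sum((1 if sign * (row[j] - t) > 0 else 0) != label
                          for row, label in zip(X, y))
                if best is None or err < best[0]:
                    best = (err, j, t, sign)
    _, j, t, sign = best
    return lambda row: 1.0 if sign * (row[j] - t) > 0 else 0.0

def fit_bagging(X, y, n_models=15, seed=0):
    """Train each base model on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models

def predict_score(models, row):
    """Average the base-model votes into a removal score in [0, 1]."""
    return sum(m(row) for m in models) / len(models)

def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def select(X, cols):
    """Keep only the given feature columns."""
    return [[row[c] for c in cols] for row in X]

def make_synthetic(n, rng):
    """Toy data: label 1 means 'removed'. Not the paper's data set."""
    X, y = [], []
    for _ in range(n):
        pre, post, noise = rng.random(), rng.random(), rng.random()
        score = 0.6 * pre + 0.4 * post + rng.gauss(0, 0.1)
        X.append([pre, post, noise])
        y.append(1 if score > 0.5 else 0)
    return X, y

def demo(seed=0):
    rng = random.Random(seed)
    X_train, y_train = make_synthetic(150, rng)
    X_test, y_test = make_synthetic(100, rng)
    # Assumption: column 1 is only observable after deployment.
    feature_sets = {"user-centered": [0, 1, 2], "developer-centered": [0, 2]}
    aucs = {}
    for name, cols in feature_sets.items():
        models = fit_bagging(select(X_train, cols), y_train, seed=seed)
        scores = [predict_score(models, row) for row in select(X_test, cols)]
        aucs[name] = roc_auc(y_test, scores)
    return aucs

if __name__ == "__main__":
    for name, auc in demo().items():
        print(f"{name}: AUC = {auc:.3f}")
```

The two entries in `feature_sets` mirror the distinction drawn in the abstract: the developer-centered model is trained on a subset of the columns used by the user-centered one. The AUC values printed here come from toy data and say nothing about the reported 0.792 and 0.762.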