Humanitarian challenges, including natural disasters, food insecurity, climate change, racial and gender violence, environmental crises, the COVID-19 coronavirus pandemic, human rights violations, and forced displacements, disproportionately impact vulnerable communities worldwide. According to UN OCHA, 235 million people will require humanitarian assistance in 2021 . Despite these growing perils, there remains a notable paucity of data science research to scientifically inform equitable public policy decisions for improving the livelihood of at-risk populations. Scattered data science efforts exist to address these challenges, but they remain isolated from practice and prone to algorithmic harms concerning lack of privacy, fairness, interpretability, accountability, transparency, and ethics. Biases in data-driven methods carry the risk of amplifying inequalities in high-stakes policy decisions that impact the livelihood of millions of people. Consequently, proclaimed benefits of data-driven innovations remain inaccessible to policymakers, practitioners, and marginalized communities at the core of humanitarian actions and global development. To help fill this gap, we propose the Data-driven Humanitarian Mapping Research Program, which focuses on developing novel data science methodologies that harness human-machine intelligence for high-stakes public policy and resilience planning. The proceedings of the 1st Data-driven Humanitarian Mapping workshop at the 26th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, August 24th, 2020.
There is an intimate connection between numerical upscaling of multiscale PDEs and scattered data approximation of heterogeneous functions: the coarse variables selected for deriving an upscaled equation (in the former) correspond to the sampled information used for approximation (in the latter). As such, both problems can be thought of as recovering a target function based on some coarse data that are either artificially chosen by an upscaling algorithm, or determined by some physical measurement process. The purpose of this paper is then to study that, under such a setup and for a specific elliptic problem, how the lengthscale of the coarse data, which we refer to as the subsampled lengthscale, influences the accuracy of recovery, given limited computational budgets. Our analysis and experiments identify that, reducing the subsampling lengthscale may improve the accuracy, implying a guiding criterion for coarse-graining or data acquisition in this computationally constrained scenario, especially leading to direct insights for the implementation of the Gamblets method in the numerical homogenization literature. Moreover, reducing the lengthscale to zero may lead to a blow-up of approximation error if the target function does not have enough regularity, suggesting the need for a stronger prior assumption on the target function to be approximated. We introduce a singular weight function to deal with it, both theoretically and numerically. This work sheds light on the interplay of the lengthscale of coarse data, the computational costs, the regularity of the target function, and the accuracy of approximations and numerical simulations.
Data centers are carbon-intensive enterprises due to their massive energy consumption, and it is estimated that data center industry will account for 8\% of global carbon emissions by 2030. However, both technological and policy instruments for reducing or even neutralizing data center carbon emissions have not been thoroughly investigated. To bridge this gap, this survey paper proposes a roadmap towards carbon-neutral data centers that takes into account both policy instruments and technological methodologies. We begin by presenting the carbon footprint of data centers, as well as some insights into the major sources of carbon emissions. Following that, carbon neutrality plans for major global cloud providers are discussed to summarize current industrial efforts in this direction. In what follows, we introduce the carbon market as a policy instrument to explain how to offset data center carbon emissions in a cost-efficient manner. On the technological front, we propose achieving carbon-neutral data centers by increasing renewable energy penetration, improving energy efficiency, and boosting energy circulation simultaneously. A comprehensive review of existing technologies on these three topics is elaborated subsequently. Based on this, a multi-pronged approach towards carbon neutrality is envisioned and a digital twin-powered industrial artificial intelligence (AI) framework is proposed to make this solution a reality. Furthermore, three key scientific challenges for putting such a framework in place are discussed. Finally, several applications for this framework are presented to demonstrate its enormous potential.
This paper determines whether the two core data protection principles of data minimisation and purpose limitation can be meaningfully implemented in data-driven systems. While contemporary data processing practices appear to stand at odds with these principles, we demonstrate that systems could technically use much less data than they currently do. This observation is a starting point for our detailed techno-legal analysis uncovering obstacles that stand in the way of meaningful implementation and compliance as well as exemplifying unexpected trade-offs which emerge where data protection law is applied in practice. Our analysis seeks to inform debates about the impact of data protection on the development of artificial intelligence in the European Union, offering practical action points for data controllers, regulators, and researchers.
Much recent research has uncovered and discussed serious concerns of bias in facial analysis technologies, finding performance disparities between groups of people based on perceived gender, skin type, lighting condition, etc. These audits are immensely important and successful at measuring algorithmic bias but have two major challenges: the audits (1) use facial recognition datasets which lack quality metadata, like LFW and CelebA, and (2) do not compare their observed algorithmic bias to the biases of their human alternatives. In this paper, we release improvements to the LFW and CelebA datasets which will enable future researchers to obtain measurements of algorithmic bias that are not tainted by major flaws in the dataset (e.g. identical images appearing in both the gallery and test set). We also use these new data to develop a series of challenging facial identification and verification questions that we administered to various algorithms and a large, balanced sample of human reviewers. We find that both computer models and human survey participants perform significantly better at the verification task, generally obtain lower accuracy rates on dark-skinned or female subjects for both tasks, and obtain higher accuracy rates when their demographics match that of the question. Computer models are observed to achieve a higher level of accuracy than the survey participants on both tasks and exhibit bias to similar degrees as the human survey participants.
Data acquisition in animal ecology is rapidly accelerating due to inexpensive and accessible sensors such as smartphones, drones, satellites, audio recorders and bio-logging devices. These new technologies and the data they generate hold great potential for large-scale environmental monitoring and understanding, but are limited by current data processing approaches which are inefficient in how they ingest, digest, and distill data into relevant information. We argue that machine learning, and especially deep learning approaches, can meet this analytic challenge to enhance our understanding, monitoring capacity, and conservation of wildlife species. Incorporating machine learning into ecological workflows could improve inputs for population and behavior models and eventually lead to integrated hybrid modeling tools, with ecological models acting as constraints for machine learning models and the latter providing data-supported insights. In essence, by combining new machine learning approaches with ecological domain knowledge, animal ecologists can capitalize on the abundance of data generated by modern sensor technologies in order to reliably estimate population abundances, study animal behavior and mitigate human/wildlife conflicts. To succeed, this approach will require close collaboration and cross-disciplinary education between the computer science and animal ecology communities in order to ensure the quality of machine learning approaches and train a new generation of data scientists in ecology and conservation.
Currently, the development of computer games has shown a tremendous surge. The ease and speed of internet access today have also influenced the development of computer games, especially computer games that are played online. Internet technology has allowed computer games to be played in multiplayer mode. Interaction between players in a computer game can be built in several ways, one of which is by providing balanced opponents. Opponents can be developed using intelligent agents. On the other hand, research on developing intelligent agents is also growing rapidly. In computer game development, one of the easiest ways to measure the performance of an intelligent agent is to develop a virtual environment that allows the intelligent agent to interact with other players. In this research, we try to develop an intelligent agent and virtual environment for the board game. To be easily accessible, the intelligent agent and virtual environment are then developed into an Application Programming Interface (API) service called Gapoera API. The Gapoera API service that is built is expected to help game developers develop a game without having to think much about the artificial intelligence that will be embedded in the game. This service provides a basic multilevel intelligent agent that can provide users with playing board games commonly played in Indonesia. Although the Gapoera API can be used for various types of games, in this paper, we will focus on the discussion on a popular traditional board game in Indonesia, namely Mancala. The test results conclude that the multilevel agent concept developed has worked as expected. On the other hand, the development of the Gapoera API service has also been successfully accessed on several game platforms.
Although machine learning (ML) is widely used in practice, little is known about practitioners' actual understanding of potential security challenges. In this work, we close this substantial gap in the literature and contribute a qualitative study focusing on developers' mental models of the ML pipeline and potentially vulnerable components. Studying mental models has helped in other security fields to discover root causes or improve risk communication. Our study reveals four characteristic ranges in mental models of industrial practitioners. The first range concerns the intertwined relationship of adversarial machine learning (AML) and classical security. The second range describes structural and functional components. The third range expresses individual variations of mental models, which are neither explained by the application nor by the educational background of the corresponding subjects. The fourth range corresponds to the varying levels of technical depth, which are however not determined by our subjects' level of knowledge. Our characteristic ranges have implications for the integration of AML into corporate workflows, security enhancing tools for practitioners, and creating appropriate regulatory frameworks for AML.
We present a new method for multi-agent planning involving human drivers and autonomous vehicles (AVs) in unsignaled intersections, roundabouts, and during merging. In multi-agent planning, the main challenge is to predict the actions of other agents, especially human drivers, as their intentions are hidden from other agents. Our algorithm uses game theory to develop a new auction, called GamePlan, that directly determines the optimal action for each agent based on their driving style (which is observable via commonly available sensors). GamePlan assigns a higher priority to more aggressive or impatient drivers and a lower priority to more conservative or patient drivers; we theoretically prove that such an approach is game-theoretically optimal prevents collisions and deadlocks. We compare our approach with prior state-of-the-art auction techniques including economic auctions, time-based auctions (first-in first-out), and random bidding and show that each of these methods result in collisions among agents when taking into account driver behavior. We additionally compare with methods based on deep reinforcement learning, deep learning, and game theory and present our benefits over these approaches. Finally, we show that our approach can be implemented in the real-world with human drivers.
Fast developing artificial intelligence (AI) technology has enabled various applied systems deployed in the real world, impacting people's everyday lives. However, many current AI systems were found vulnerable to imperceptible attacks, biased against underrepresented groups, lacking in user privacy protection, etc., which not only degrades user experience but erodes the society's trust in all AI systems. In this review, we strive to provide AI practitioners a comprehensive guide towards building trustworthy AI systems. We first introduce the theoretical framework of important aspects of AI trustworthiness, including robustness, generalization, explainability, transparency, reproducibility, fairness, privacy preservation, alignment with human values, and accountability. We then survey leading approaches in these aspects in the industry. To unify the current fragmented approaches towards trustworthy AI, we propose a systematic approach that considers the entire lifecycle of AI systems, ranging from data acquisition to model development, to development and deployment, finally to continuous monitoring and governance. In this framework, we offer concrete action items to practitioners and societal stakeholders (e.g., researchers and regulators) to improve AI trustworthiness. Finally, we identify key opportunities and challenges in the future development of trustworthy AI systems, where we identify the need for paradigm shift towards comprehensive trustworthy AI systems.
Machine-learning models have demonstrated great success in learning complex patterns that enable them to make predictions about unobserved data. In addition to using models for prediction, the ability to interpret what a model has learned is receiving an increasing amount of attention. However, this increased focus has led to considerable confusion about the notion of interpretability. In particular, it is unclear how the wide array of proposed interpretation methods are related, and what common concepts can be used to evaluate them. We aim to address these concerns by defining interpretability in the context of machine learning and introducing the Predictive, Descriptive, Relevant (PDR) framework for discussing interpretations. The PDR framework provides three overarching desiderata for evaluation: predictive accuracy, descriptive accuracy and relevancy, with relevancy judged relative to a human audience. Moreover, to help manage the deluge of interpretation methods, we introduce a categorization of existing techniques into model-based and post-hoc categories, with sub-groups including sparsity, modularity and simulatability. To demonstrate how practitioners can use the PDR framework to evaluate and understand interpretations, we provide numerous real-world examples. These examples highlight the often under-appreciated role played by human audiences in discussions of interpretability. Finally, based on our framework, we discuss limitations of existing methods and directions for future work. We hope that this work will provide a common vocabulary that will make it easier for both practitioners and researchers to discuss and choose from the full range of interpretation methods.