Our goal is to improve the reliability of Machine Learning (ML) systems deployed in the wild. ML models perform exceedingly well when test examples are similar to training examples, but real-world applications must handle test examples from arbitrary distributions, and current ML systems can fail silently on test examples under distribution shift. To improve the reliability of ML models under covariate or domain shift, we propose algorithms that enable models to: (a) generalize to a larger family of test distributions, (b) evaluate their accuracy under distribution shifts, and (c) adapt to a target distribution. We study the causes of impaired robustness to domain shift and present algorithms for training domain-robust models. A key source of model brittleness is domain overfitting, which our new training algorithms suppress in favor of domain-general hypotheses. While we improve robustness over standard training methods in certain problem settings, the performance of ML systems can still vary drastically under domain shift. It is therefore crucial for developers and stakeholders to understand a model's vulnerabilities and operational input ranges; these could be assessed on the fly during deployment, albeit at great cost. Instead, we advocate proactively estimating accuracy surfaces over any combination of prespecified, interpretable domain shifts for performance forecasting, and we present a label-efficient estimation method to address the combinatorial space of domain shifts. Finally, when a model's performance on a target domain is found to be poor, traditional approaches adapt the model using the target domain's resources; standard adaptation methods assume access to sufficient labeled data, which may be impractical for deployed models. We therefore initiate a study of lightweight adaptation techniques that use only unlabeled data, with a focus on language applications.