In this study, we demonstrate the viability of deploying BERT-style models to AWS Lambda in a production environment. Since the freely available pre-trained models are too large to be deployed in this way, we use knowledge distillation and fine-tune the models on proprietary datasets for two real-world tasks: sentiment analysis and semantic textual similarity. As a result, we obtain models that are tuned for a specific domain and deployable in a serverless environment. The subsequent performance analysis shows that this solution not only achieves latency levels acceptable for production use but is also a cost-effective alternative for small-to-medium size deployments of BERT models, all without any infrastructure overhead.
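The abstract does not state which distillation objective is used, so the following is only a minimal sketch of one common formulation: a temperature-softened KL term between teacher and student outputs, blended with the usual hard-label cross-entropy. The function names and the `temperature` and `alpha` parameters are illustrative assumptions, not taken from the study.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Blend of soft-target KL divergence and hard-label cross-entropy.

    Assumed formulation (not confirmed by the source):
    alpha * T^2 * KL(teacher || student) + (1 - alpha) * CE(student, label)
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL divergence between the softened teacher and student distributions
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    # Standard cross-entropy of the student against the true label (T = 1)
    hard = -math.log(softmax(student_logits)[label])
    return alpha * temperature ** 2 * kl + (1 - alpha) * hard
```

Under this assumed objective, a student whose logits match the teacher's incurs no soft-target penalty, so only the hard-label term remains. In practice the per-example loss would be averaged over a batch and minimized with a deep learning framework rather than computed in plain Python.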