The new NASA Astrophysics Data System (ADS) is designed with a service-oriented architecture (SOA) that consists of multiple customized Apache Solr search engine instances plus a collection of microservices, containerized using Docker and deployed on Amazon Web Services (AWS). For complex systems like the ADS, this loosely coupled architecture can lead to a more scalable, reliable, and resilient system, provided that some fundamental questions are addressed. After experimenting with different AWS environments and deployment methods, we decided in December 2017 to adopt Kubernetes for container orchestration. Defining the best strategy to set up Kubernetes properly has proven challenging: automatically scaling services and load balancing traffic can lead to errors whose origin is difficult to identify; monitoring and logging the activity that a single request generates across multiple layers needs to be carefully addressed; and the best workflow for a Continuous Integration and Delivery (CI/CD) system is not self-evident. We present here how we tackle these challenges and our plans for the future.