Jupyter has become the go-to platform for developing data applications, but data security and privacy concerns, especially around healthcare data, have become paramount for many institutions and applications that handle sensitive information. How, then, can we continue to enjoy the data analysis and machine learning opportunities provided by Jupyter and the Python ecosystem while guaranteeing auditable compliance with security and privacy requirements? We describe the architecture and implementation of a cloud-based platform built on Jupyter that integrates with Amazon Web Services (AWS) and uses containerized services without exposing the platform to the vulnerabilities present in Kubernetes and JupyterHub. This architecture addresses the HIPAA requirements to ensure both the security and the privacy of data. It uses an AWS service to provide JSON Web Tokens (JWT) for authentication as well as network control. Furthermore, it enables secure collaboration and sharing of Jupyter notebooks. Although the platform focuses on Jupyter notebooks and JupyterLab, it also supports RStudio and bespoke applications that share the same authentication mechanisms. Finally, the platform can be extended to cloud services other than AWS.