System call filtering is a widely used security mechanism for protecting a shared OS kernel against untrusted user applications. However, existing system call filtering techniques either are too expensive due to the context switch overhead imposed by userspace agents, or lack sufficient programmability to express advanced policies. Seccomp, Linux's system call filtering module, is widely used by modern container technologies, mobile apps, and system management services. Despite the adoption of the classic BPF language (cBPF), security policies in Seccomp are mostly limited to static allow lists, primarily because cBPF does not support stateful policies. Consequently, many essential security features cannot be expressed precisely and/or require kernel modifications. In this paper, we present a programmable system call filtering mechanism, which enables more advanced security policies to be expressed by leveraging the extended BPF language (eBPF). More specifically, we create a new Seccomp eBPF program type and expose, modify, or create eBPF helper functions to safely manage filter state, access kernel and user state, and utilize synchronization primitives. Importantly, our system integrates with existing kernel privilege and capability mechanisms, enabling unprivileged users to install advanced filters safely. Our evaluation shows that our eBPF-based filtering can enhance existing policies (e.g., reducing the attack surface of the early execution phase by up to 55.4% for temporal specialization), mitigate real-world vulnerabilities, and accelerate filters.