We present DIO, a generic tool for observing inefficient and erroneous I/O interactions between applications and in-kernel storage systems that lead to performance, dependability, and correctness issues. DIO facilitates the analysis and near-real-time visualization of complex I/O patterns in data-intensive applications that generate millions of storage requests. This is achieved by non-intrusively intercepting system calls, enriching the collected data with relevant context, and providing timely analysis and visualization of traced events. We demonstrate its usefulness by analyzing two production-level applications. Results show that DIO can diagnose both resource contention in multi-threaded I/O, which leads to high tail latency, and erroneous file accesses that cause data loss.
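To make the "enriching collected data with relevant context" step concrete, here is a minimal sketch under stated assumptions. It is not DIO's implementation, which this abstract does not describe. It assumes a hypothetical stream of already-captured syscall events. Because `read` and `write` carry only a file descriptor, the sketch replays `openat`/`close` events to resolve each descriptor to a path, then computes a per-file tail latency. The types `SyscallEvent` and `EnrichedEvent` and the functions `enrich`, `percentile`, and `tail_latency_by_path` are illustrative names only.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class SyscallEvent:
    """Hypothetical raw event format; DIO's real event schema is not given here."""
    pid: int
    tid: int
    name: str                   # e.g. "openat", "read", "write", "close"
    fd: int                     # for openat: the descriptor it returned
    start_ns: int
    end_ns: int
    path: Optional[str] = None  # only carried by openat events


@dataclass
class EnrichedEvent:
    event: SyscallEvent
    path: Optional[str]         # path resolved from an earlier openat, if any
    latency_ns: int


def enrich(events: List[SyscallEvent]) -> List[EnrichedEvent]:
    """Attach a file path to each fd-based event by replaying openat/close."""
    fd_table: Dict[Tuple[int, int], str] = {}  # (pid, fd) -> path; fds are per-process
    enriched = []
    for ev in sorted(events, key=lambda e: e.start_ns):
        key = (ev.pid, ev.fd)
        if ev.name == "openat" and ev.path is not None:
            fd_table[key] = ev.path
        path = fd_table.get(key)
        enriched.append(EnrichedEvent(ev, path, ev.end_ns - ev.start_ns))
        if ev.name == "close":
            fd_table.pop(key, None)  # later uses of this fd no longer resolve
    return enriched


def percentile(values: List[int], q: float) -> int:
    """Nearest-rank percentile, for q in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def tail_latency_by_path(enriched: List[EnrichedEvent], q: float = 99) -> Dict[str, int]:
    """q-th percentile latency of read/write calls, grouped by resolved path."""
    by_path: Dict[str, List[int]] = defaultdict(list)
    for e in enriched:
        if e.event.name in ("read", "write") and e.path is not None:
            by_path[e.path].append(e.latency_ns)
    return {p: percentile(lat, q) for p, lat in by_path.items()}


# Usage with made-up events: two threads of one process write to the same file.
events = [
    SyscallEvent(1, 1, "openat", 3, 0, 10, "/data/a.log"),
    SyscallEvent(1, 1, "write", 3, 10, 20),
    SyscallEvent(1, 2, "write", 3, 20, 120),
    SyscallEvent(1, 1, "close", 3, 120, 125),
    SyscallEvent(1, 2, "write", 3, 130, 140),  # fd already closed: path stays None
]
print(tail_latency_by_path(enrich(events)))  # {'/data/a.log': 100}
```

Keying the descriptor table by `(pid, fd)` reflects that descriptors are shared by a process's threads and may be reused after `close`. An access through a descriptor that no longer resolves is left with `path = None`, which an analysis could then flag for inspection.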