The R programming language is widely used in large-scale data analyses. It contains especially rich built-in support for dealing with vectors, arrays, and matrices. These operations feature prominently in the applications that form R's raison d'\^etre, making their behavior worth understanding. Furthermore, ostensibly for programmer convenience, their behavior in R is a notable extension over the corresponding operations in mathematics, thereby offering some challenges for specification and static verification. We report on progress towards statically typing this aspect of the R language. The interesting aspects of typing, in this case, warn programmers about violating bounds, so the types must necessarily be dependent. We explain the ways in which R extends standard mathematical behavior. We then show how R's behavior can be specified in LiquidHaskell, a dependently-typed extension to Haskell. In the general case, actually verifying library and client code is currently beyond LiquidHaskell's reach; therefore, this work provides challenges and opportunities both for typing R and for progress in dependently-typed programming languages.
翻译:R 编程语言被广泛应用于大规模数据分析。它内置了丰富的支持向量、数组和矩阵的处理。这些操作在构成 R 的意义上非常重要,在不了解其行为的情况下很难进行分析。此外,为了方便程序员,它们在 R 中的行为是数学上对应操作的一个扩展,因此提供了一些规范和静态验证的挑战。我们报告了在对 R 语言的这个方面进行静态类型化方面的进展。在这种情况下,类型的感兴趣的方面警告程序员不要违反边界,因此类型必须是依赖的。我们解释了 R 如何扩展标准数学行为。然后我们展示了如何在 LiquidHaskell 中规范 R 的行为,这是一个依赖类型的 Haskell 扩展。在一般情况下,实际验证库和客户端代码目前超出了 LiquidHaskel 的范围。因此,这项工作为 R 的类型化和依赖类型编程语言的进步提供了挑战和机会。