Input constraints are useful for many software development tasks. For example, the input constraints of a function enable the generation of valid inputs, i.e., inputs that satisfy these constraints, to test the function more deeply. API functions of deep learning (DL) libraries have DL-specific input constraints, which are described informally in free-form API documentation. Existing constraint extraction techniques are ineffective for extracting DL-specific input constraints. To fill this gap, we design and implement a new technique, DocTer, that analyzes API documentation to extract DL-specific input constraints for DL API functions. DocTer features a novel algorithm that automatically constructs rules to extract API parameter constraints from syntactic patterns in the form of dependency parse trees of API descriptions. These rules are then applied to a large volume of API documents in popular DL libraries to extract their input parameter constraints. To demonstrate the effectiveness of the extracted constraints, DocTer uses them to automatically generate valid and invalid inputs to test DL API functions. Our evaluation on three popular DL libraries (TensorFlow, PyTorch, and MXNet) shows that the precision of DocTer in extracting input constraints is 85.4%. DocTer detects 94 bugs from 174 API functions, including one previously unknown security vulnerability that is now documented in the CVE database, while a baseline technique without input constraints detects only 59 bugs. Most (63) of the 94 bugs were previously unknown, and 54 of them have been fixed or confirmed by developers after we reported them. In addition, DocTer detects 43 inconsistencies in documents, 39 of which have been fixed or confirmed.
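To make the idea of constraint-guided input generation concrete, the following is a minimal sketch and not DocTer's implementation. It assumes a hypothetical constraint record with three illustrative fields (`dtype`, `ndim`, `range`) and uses nested Python lists as a stand-in for tensors. A valid input satisfies every constraint, while an invalid input deliberately violates exactly one of them.

```python
import random

# Hypothetical constraint record for one API parameter. The field names
# are illustrative assumptions, not DocTer's actual constraint format.
CONSTRAINT = {"dtype": "int", "ndim": 1, "range": (0, 10)}


def build(shape, dtype, low, high, rng):
    """Build a nested list of the given shape; an empty shape yields a scalar."""
    if not shape:
        return rng.randint(low, high) if dtype == "int" else rng.uniform(low, high)
    return [build(shape[1:], dtype, low, high, rng) for _ in range(shape[0])]


def generate(c, rng):
    """Generate an input that follows the constraint record c."""
    shape = [rng.randint(1, 3) for _ in range(c["ndim"])]
    low, high = c["range"]
    return build(shape, c["dtype"], low, high, rng)


def generate_valid(c, rng):
    return generate(c, rng)


def generate_invalid(c, rng):
    """Violate exactly one constraint, chosen at random."""
    broken = dict(c)
    which = rng.choice(["dtype", "ndim", "range"])
    if which == "dtype":
        broken["dtype"] = "float" if c["dtype"] == "int" else "int"
    elif which == "ndim":
        broken["ndim"] = c["ndim"] + 1
    else:
        high = c["range"][1]
        broken["range"] = (high + 1, high + 10)  # entirely above the allowed range
    return generate(broken, rng)


def ndim_of(x):
    """Count nesting depth; every list built above is non-empty."""
    depth = 0
    while isinstance(x, list):
        depth += 1
        x = x[0]
    return depth


def flatten(x):
    if isinstance(x, list):
        for item in x:
            yield from flatten(item)
    else:
        yield x


def satisfies(x, c):
    """Check whether input x meets all three constraints in c."""
    low, high = c["range"]
    py_type = int if c["dtype"] == "int" else float
    return ndim_of(x) == c["ndim"] and all(
        type(v) is py_type and low <= v <= high for v in flatten(x)
    )
```

In a fuzzing loop of this kind, valid inputs would exercise the function's core logic, while invalid inputs would probe its input checking. How the real tool chooses which constraint to violate is not specified in the abstract.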