Mathematical formulae carry complex and essential semantic information in a variety of formats. Accessing this information with different systems requires a standardized, machine-readable format that can encode both presentational and semantic information. Even though MathML is an official W3C recommendation and an ISO standard for representing mathematical expressions, we identified only very few systems that use the full descriptiveness of MathML. MathML's high complexity results in a steep learning curve for novice users. We hypothesize that this complexity is why many community-driven projects refrain from using MathML and instead develop problem-specific data formats. We provide a user-friendly, open-source application programming interface (API) for controlling MathML data. Our API is written in Java and allows users to create, manipulate, and efficiently access commonly needed information in Presentation and Content MathML. The interface also provides tools for calculating differences and similarities between MathML expressions and for determining the distance between expressions using different similarity measures. In addition, we provide adapters for numerous conversion tools and the canonicalization project. Our toolkit facilitates the processing of mathematics for digital libraries without requiring XML expertise.
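To make the idea of comparing MathML expressions concrete, here is a minimal sketch that relies only on the JDK's built-in DOM parser. It does not use the toolkit's actual API. The class and method names (`MathMLSimilaritySketch`, `leafTokens`, `jaccard`) are hypothetical, and the chosen measure (Jaccard similarity over the leaf tokens of Content MathML) is an illustrative assumption, not one of the similarity measures the toolkit necessarily implements.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/**
 * Hypothetical sketch, NOT the toolkit's API: parses MathML with the JDK DOM
 * parser and compares two expressions by the Jaccard similarity of their
 * leaf tokens (operators, identifiers, numbers).
 */
public class MathMLSimilaritySketch {

    // Parse a MathML string into a namespace-aware DOM document.
    static Document parse(String mathml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new ByteArrayInputStream(mathml.getBytes(StandardCharsets.UTF_8)));
    }

    // Collect "localName:text" for every leaf element, e.g. "ci:x", "cn:2", "plus:".
    static Set<String> leafTokens(Document doc) {
        Set<String> tokens = new HashSet<>();
        collect(doc.getDocumentElement(), tokens);
        return tokens;
    }

    // Depth-first traversal; an element without element children is a leaf.
    private static void collect(Element element, Set<String> tokens) {
        NodeList children = element.getChildNodes();
        boolean hasElementChild = false;
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                hasElementChild = true;
                collect((Element) child, tokens);
            }
        }
        if (!hasElementChild) {
            tokens.add(element.getLocalName() + ":" + element.getTextContent().trim());
        }
    }

    // Jaccard similarity |A ∩ B| / |A ∪ B|; defined as 1.0 for two empty sets.
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0;
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) intersection.size() / union.size();
    }

    public static void main(String[] args) throws Exception {
        String ns = "http://www.w3.org/1998/Math/MathML";
        // Content MathML for x + 2 and for x + y.
        String e1 = "<math xmlns='" + ns + "'><apply><plus/><ci>x</ci><cn>2</cn></apply></math>";
        String e2 = "<math xmlns='" + ns + "'><apply><plus/><ci>x</ci><ci>y</ci></apply></math>";
        double sim = jaccard(leafTokens(parse(e1)), leafTokens(parse(e2)));
        System.out.println(sim); // prints 0.5
    }
}
```

Running `main` prints `0.5`: the two expressions share the `plus` operator and the identifier `x` among four distinct leaf tokens. A set-based measure like this ignores tree structure and repeated tokens; a more faithful comparison would likely take the expression tree itself into account.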