We introduce an automated method for structuring textual data into a model-agnostic schema that can be aligned with any database model. The method generates both the schema and a corresponding instance. Textual data is first represented as semantically enriched syntax trees, which are then refined through iterative tree rewriting and grammar extraction, guided by the attribute grammar meta-model \metaG. We demonstrate the applicability of the approach on clinical medical cases as a proof of concept.
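To make the refinement loop concrete, the following is a minimal Python sketch of iterative tree rewriting followed by grammar extraction. It is an illustration under stated assumptions, not the method itself: the `Node` type, the single rewrite rule `flatten_same_label`, the helper names, and the toy clinical labels are all hypothetical, and the guidance by the attribute grammar meta-model \metaG is not modelled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical syntax-tree node: a label plus ordered children."""
    label: str
    children: tuple = ()


def flatten_same_label(node):
    """One illustrative rewrite rule (an assumption, not taken from the source):
    splice a child's children into its parent when both carry the same label."""
    new_children = []
    for child in node.children:
        child = flatten_same_label(child)
        if child.label == node.label and child.children:
            new_children.extend(child.children)
        else:
            new_children.append(child)
    return Node(node.label, tuple(new_children))


def rewrite_to_fixpoint(tree, rules, max_rounds=100):
    """Apply every rule repeatedly until the tree stops changing."""
    for _ in range(max_rounds):
        new_tree = tree
        for rule in rules:
            new_tree = rule(new_tree)
        if new_tree == tree:
            return tree
        tree = new_tree
    return tree


def extract_grammar(tree):
    """Collect one production (parent label, child labels) per internal node."""
    productions = set()
    stack = [tree]
    while stack:
        node = stack.pop()
        if node.children:
            productions.add((node.label, tuple(c.label for c in node.children)))
            stack.extend(node.children)
    return productions


# Toy input: a redundantly nested "Symptoms" node inside a clinical case.
tree = Node("Case", (
    Node("Symptoms", (
        Node("Symptoms", (Node("Symptom"), Node("Symptom"))),
        Node("Symptom"),
    )),
    Node("Diagnosis"),
))

refined = rewrite_to_fixpoint(tree, [flatten_same_label])
grammar = extract_grammar(refined)
```

In this sketch the refined tree plays the role of the instance and the extracted productions play the role of the schema; a real pipeline would need a richer rule set and attribute annotations on the nodes.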