We present POTATO, a task- and language-independent framework for human-in-the-loop (HITL) learning of rule-based text classifiers using graph-based features. POTATO handles any type of directed graph and supports parsing text into Abstract Meaning Representations (AMR), Universal Dependencies (UD), and 4lang semantic graphs. A Streamlit-based user interface lets users build rule systems from graph patterns, provides real-time evaluation against ground truth data, and suggests rules by ranking graph features using interpretable machine learning models. Users can also specify patterns over graphs using regular expressions, and POTATO can recommend refinements of such rules. POTATO has been applied in projects across domains and languages, including classification tasks on German legal text and English social media data. All components of our system are written in Python, can be installed via pip, and are released under an MIT License on GitHub.
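To make the idea of regular-expression rules over graphs concrete, the following is a minimal sketch in plain Python. It does not use POTATO's API: the names `edge_matches`, `rule_matches`, and `evaluate`, the triple-based graph encoding, and the toy data are all assumptions made for illustration. It also simplifies matching to independent edge lookups rather than true subgraph matching.

```python
import re
from typing import List, Tuple

# Hypothetical encoding: a graph is a list of labelled edges
# (head label, edge label, dependent label). A rule uses the same
# shape, but every field is a regular expression.
Edge = Tuple[str, str, str]
Graph = List[Edge]
Rule = List[Edge]


def edge_matches(pattern_edge: Edge, edge: Edge) -> bool:
    """True if each regex in the pattern fully matches the corresponding field."""
    return all(re.fullmatch(p, v) is not None for p, v in zip(pattern_edge, edge))


def rule_matches(rule: Rule, graph: Graph) -> bool:
    """True if every pattern edge is matched by at least one graph edge.

    Simplification: pattern edges are checked independently, so shared
    nodes are not required to be consistent across edges.
    """
    return all(any(edge_matches(pe, e) for e in graph) for pe in rule)


def evaluate(rule: Rule, graphs: List[Graph], labels: List[str], target: str):
    """Precision and recall of a single rule for one target class."""
    predicted = [rule_matches(rule, g) for g in graphs]
    tp = sum(p and l == target for p, l in zip(predicted, labels))
    fp = sum(p and l != target for p, l in zip(predicted, labels))
    fn = sum((not p) and l == target for p, l in zip(predicted, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy, hand-written dependency-like graphs with invented labels.
graphs = [
    [("hate", "nsubj", "I"), ("hate", "obj", "mondays")],
    [("love", "nsubj", "I"), ("love", "obj", "potatoes")],
    [("dislike", "obj", "rain")],
]
labels = ["NEG", "OTHER", "NEG"]

narrow_rule = [("hate", "obj", ".*")]
refined_rule = [("hate|dislike", "obj", ".*")]

narrow_scores = evaluate(narrow_rule, graphs, labels, "NEG")
refined_scores = evaluate(refined_rule, graphs, labels, "NEG")
```

On this toy data the narrow rule misses the third graph, while the refined rule with an alternation in the head label covers both positive examples. This mirrors, in a very reduced form, the loop of writing a rule, checking it against ground truth, and refining it.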