Constrained text generation remains a challenging task, particularly when dealing with hard constraints. Traditional NLP approaches prioritize generating meaningful and coherent output. Also, the current state-of-the-art methods often lack the expressiveness and constraint satisfaction capabilities to handle such tasks effectively. Recently, an approach for generating constrained sentences in CP has been proposed in (Bonlarron et al, 2023). This ad-hoc model to solve the sentences generation problem under MNREAD rules proved neithertheless to be computationaly and structuraly unsuitable to deal with other more constrained problems. In this paper, a novel more generic approach is introduced to tackle many of these previously untractable problems, and illustrated here with the quite untractable sentences generation problem following RADNER rules. More precisely, this paper presents the CPTextGen Framework. This framework considers a constrained text generation problem as a discrete combinatorial optimization problem. It is solved by a constraint programming method that combines linguistic properties (e.g., n-grams or language level) with other more classical constraints (e.g., the number of characters, syllables). Eventually, a curation phase allows for selecting the best-generated sentences according to perplexity using an LLM. The effectiveness of this approach is demonstrated by tackling a new, more tediously constrained text generation problem: the iconic RADNER sentences problem. This problem aims to generate sentences respecting a set of quite strict rules defined by their use in vision and clinical research. Thanks to our CP-based approach, many new strongly constrained sentences have been successfully generated. This highlights our approach's potential to handle unreasonably constrained text generation scenarios.
翻译:暂无翻译