Continuous Diffusion Models Can Obey Formal Syntax
Diffinity is a new method for guiding continuous diffusion language models to satisfy formal syntactic constraints. It uses an analytic score that estimates the probability that a latent state decodes to a string accepted by a given regular expression. The method achieves high constraint-satisfaction rates while maintaining output quality, outperforming traditional autoregressive models.
- Diffusion language models generate text through a non-causal process, which makes them challenging to constrain.
- The guidance method is training-free: it steers models toward formal syntax without auxiliary classifiers.
- Diffinity achieved 68-96% constraint satisfaction on various benchmarks while incurring minimal perplexity costs.
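To make the idea of an analytic validity score concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the paper's implementation: it assumes each output position has its own independent token distribution, and it uses a hand-written, hypothetical DFA for the regular expression (ab)*. The names `TRANSITIONS`, `START`, `ACCEPTING`, and `acceptance_probability` are invented for this example.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical DFA for the regex (ab)*: state 0 is both start and accepting.
# The table maps (state, symbol) -> next state; missing entries mean "reject".
TRANSITIONS: Dict[Tuple[int, str], int] = {(0, "a"): 1, (1, "b"): 0}
START: int = 0
ACCEPTING: Set[int] = {0}


def acceptance_probability(position_probs: List[Dict[str, float]]) -> float:
    """Probability that a string sampled position by position is accepted.

    Assumption (illustration only): the token at position i is drawn
    independently from position_probs[i].
    """
    # mass[q] = probability that the prefix read so far leaves the DFA in state q
    mass: Dict[int, float] = {START: 1.0}
    for probs in position_probs:
        next_mass: Dict[int, float] = {}
        for state, p_state in mass.items():
            for symbol, p_symbol in probs.items():
                target = TRANSITIONS.get((state, symbol))
                if target is None:
                    continue  # dead transition: this probability mass is rejected
                next_mass[target] = next_mass.get(target, 0.0) + p_state * p_symbol
        mass = next_mass
    # Sum the mass that ends in an accepting state.
    return sum(p for state, p in mass.items() if state in ACCEPTING)


# Two positions, each slightly uncertain between "a" and "b".
probs = [{"a": 0.9, "b": 0.1}, {"a": 0.2, "b": 0.8}]
print(acceptance_probability(probs))  # only "ab" is accepted, so this is 0.9 * 0.8
```

Because the result is a sum of products of token probabilities, it is differentiable with respect to those probabilities, which is the kind of property a gradient-based guidance signal would need. How Diffinity actually defines and uses its score is described in the paper itself.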
Opening excerpt (first ~120 words)
arXiv:2602.12468 (cs), Computer Science > Machine Learning
Submitted on 12 Feb 2026 (v1), last revised 27 May 2026 (this version, v2)
Title: Continuous Diffusion Models Can Obey Formal Syntax
Authors: Jinwoo Kim, Taylor Berg-Kirkpatrick, Loris D'Antoni
Abstract: Diffusion language models offer a promising alternative to autoregressive models due to their global, non-causal generation process, but their continuous latent dynamics make discrete constraints -- e.g., the output should be a JSON file that matches a given schema -- difficult to impose.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv.org.