Self-Training Doesn't Flatten Language -- It Restructures It: Surface Markers Amplify While Deep Syntax Dies
A recent arXiv paper challenges the common view that repeatedly training a language model on its own outputs simply flattens its language. It argues that the language is restructured instead: surface markers such as discourse connectives become more frequent, while deeper syntactic structures collapse. The paper formalizes this as the Structural Depth Hypothesis, under which a linguistic feature's decay rate depends primarily on its structural depth.
- Self-training a language model on its own outputs restructures its language rather than flattening it.
- Surface markers such as discourse connectives increase, while deeper syntactic structures collapse.
- The Structural Depth Hypothesis predicts that a linguistic feature's decay rate is determined primarily by its structural depth.
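The contrast between amplifying and decaying features can be made concrete with a rough sketch. It is not the paper's method: the connective list, the function names `connective_rate` and `log_linear_slope`, and the log-linear fit are all assumptions chosen for illustration. The idea is to measure one feature's rate in each self-training generation, then fit a slope to the log of those rates. A positive slope indicates amplification and a negative slope indicates decay.

```python
import math
import re

# Hypothetical, deliberately small connective list (an assumption for
# illustration; this summary does not give the paper's inventory).
CONNECTIVES = {"however", "therefore", "moreover", "furthermore", "thus", "consequently"}


def connective_rate(text: str) -> float:
    """Return discourse connectives per 100 word tokens (0.0 for empty text)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in CONNECTIVES)
    return 100.0 * hits / len(tokens)


def log_linear_slope(rates: list[float]) -> float:
    """Least-squares slope of log(rate) against generation index 0, 1, 2, ...

    A negative slope means the feature decays across generations;
    a positive slope means it amplifies.
    """
    if len(rates) < 2 or any(rate <= 0 for rate in rates):
        raise ValueError("need at least two strictly positive rates")
    n = len(rates)
    xs = list(range(n))
    ys = [math.log(rate) for rate in rates]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance
```

In use, one would call `connective_rate` on a text sample from each generation and pass the resulting list to `log_linear_slope`. Applying the same slope estimate to a deeper syntactic feature would require a parser, which this sketch leaves out.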
Opening excerpt (first ~120 words)
arXiv:2605.20602 (cs), Computer Science > Computation and Language
Submitted on 20 May 2026
Title: Self-Training Doesn't Flatten Language -- It Restructures It: Surface Markers Amplify While Deep Syntax Dies
Authors: Ming Liu
Abstract: Successive self-training on a language model's own outputs is widely characterized as a process of flattening: diversity drops, distributions narrow, and the text becomes "more like itself." We provide evidence that this characterization is incomplete.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.