A Fine-Tuned BERT Classifier for Personal-Letter Titles in Late-Ming and Early-Qing Collected Works
Queenie Luo has developed Lepton (Letter Prediction), a fine-tuned BERT classifier that identifies personal-letter titles in the tables of contents of Classical Chinese wenji (collected works). The model fine-tunes bert-base-chinese on 5,438 hand-labeled titles from thirty-three late-Ming and early-Qing literati. It has been deployed on Hugging Face and has been used to identify approximately 55,000 letters for the Ming Letter Platform.
- Lepton predicts whether a title in a wenji table of contents is a personal letter or a closely confusable preface, particularly a farewell preface.
- The classifier was fine-tuned on 5,438 hand-labeled titles from late-Ming and early-Qing literati.
- It has been used to populate the Ming Letter Platform with around 55,000 identified letters.
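The setup summarized above, a binary title classifier built on bert-base-chinese, can be sketched with the Hugging Face transformers library. This is a minimal illustration under stated assumptions, not the author's code: the label names and their order, the `TitleExample` type, the helper functions, the `max_length` value, and the two example titles are all hypothetical. The published Lepton checkpoint is not referenced here because its model identifier does not appear in the text.

```python
from dataclasses import dataclass
from typing import List

# Assumed label set and order; the paper's actual label encoding is not given here.
ID2LABEL = {0: "preface", 1: "letter"}
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}


@dataclass
class TitleExample:
    """One hand-labeled table-of-contents title (hypothetical container type)."""
    title: str
    label: str  # "letter" or "preface"


def encode_labels(examples: List[TitleExample]) -> List[int]:
    """Map string labels to the integer ids expected by the classification head."""
    return [LABEL2ID[ex.label] for ex in examples]


def load_base_model(model_name: str = "bert-base-chinese"):
    """Load the base checkpoint with a fresh two-class head.

    The head is randomly initialized, so predictions are meaningless
    until the model has been fine-tuned on labeled titles.
    """
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name,
        num_labels=len(ID2LABEL),
        id2label=ID2LABEL,
        label2id=LABEL2ID,
    )
    return tokenizer, model


def predict(titles: List[str], tokenizer, model) -> List[str]:
    """Return the predicted label name for each title."""
    import torch

    # max_length=64 is an assumed cap; titles are short strings.
    batch = tokenizer(
        titles, padding=True, truncation=True, max_length=64, return_tensors="pt"
    )
    model.eval()
    with torch.no_grad():
        logits = model(**batch).logits
    return [model.config.id2label[i] for i in logits.argmax(dim=-1).tolist()]


# Illustrative titles only, not drawn from the 5,438-title dataset.
EXAMPLES = [
    TitleExample("與友人書", "letter"),   # "Letter to a friend"
    TitleExample("送友人序", "preface"),  # "Preface on seeing off a friend"
]
```

Calling `load_base_model()` downloads the checkpoint, after which `predict([ex.title for ex in EXAMPLES], tokenizer, model)` returns one label per title. The two example titles differ only in their opening and closing characters, which hints at why letters and farewell prefaces are easy to confuse from the title alone.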
Opening excerpt (first ~120 words)
arXiv:2605.23103 (Computer Science > Computation and Language), submitted on 21 May 2026
Title: A Fine-Tuned BERT Classifier for Personal-Letter Titles in Late-Ming and Early-Qing Collected Works
Authors: Queenie Luo
Abstract: I present Lepton (Letter Prediction), a fine-tuned BERT classifier that predicts whether a title in a Classical Chinese wenji table of contents is a personal letter or a closely confusable preface (particularly the farewell-preface). Lepton fine-tunes bert-base-chinese on 5438 hand-labeled wenji titles from thirty-three late-Ming and early-Qing literati.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.