Security Document Classification with a Fine-Tuned Local Large Language Model: Benchmark Data and an Open-Source System

May 22, 2026 · 4:00 AM UTC · 2 min read

#security #artificial intelligence #document classification

⚡ TL;DR · AI summary

A new study presents TorchSight, an open-source local system for security document classification. Built around a fine-tuned Qwen 3.5 27B model, it reached 95.0% category-level accuracy while keeping all document processing local. Commercial models scored between 75.4% and 79.9% under the same conditions, which suggests the system could suit organizations that cannot send sensitive documents to external infrastructure.

Key facts

▪ TorchSight is an open-source system designed for security document classification.
▪ The model was trained on 78,358 samples and achieved 95.0% category-level accuracy in evaluations.
▪ It outperformed commercial models, which scored between 75.4% and 79.9% under the same conditions.

Original article

arXiv cs.AI


Opening excerpt (first ~120 words)

Computer Science > Cryptography and Security
arXiv:2605.20368 (cs)
[Submitted on 19 May 2026]

Title: Security Document Classification with a Fine-Tuned Local Large Language Model: Benchmark Data and an Open-Source System

Authors: Ivan Dobrovolskyi

Abstract: Organizations that scan documents for sensitive information face a practical problem. Cloud services require data to be sent to external infrastructure, while rule-based tools often miss threats that depend on context. This study presents TorchSight, an open-source local system for security document classification built around a fine-tuned Qwen 3.5 27B model.

…

Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.
