Help a fellow dev on AI-localization?
We built an AI-based localization pipeline for our software product (HR domain) and would love feedback/suggestions from others working in production MT/localization, so that we can learn and improve.

Current methodology:
- GPT-5-nano forward translation + back-translation
- text-embedding-3-small cosine similarity on source vs. back-translated text
- Threshold: ≥0.92 = auto-approved

On a recent ~970-string Spanish localization run:
- ~75% of strings passed automatically

We then had two human translators review outputs, and both flagged several problematic cases:
- "Add Attachment" → Agregar Adjunto. Better: Adjuntar Archivo
- "Pay Grades" → Grados de Pago. Better: Escalas salariales
- "Sub Unit" → Subunidad. Better: Departamento

All three examples still scored 0.94+ cosine similarity.

Google Translate also…
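For readers who want to reason about the auto-approval step in the methodology above, here is a minimal sketch of the similarity gate. It is not the poster's actual code. The function names are hypothetical, and the embeddings are assumed to arrive as plain lists of floats (in the described pipeline they would come from text-embedding-3-small). Only the 0.92 threshold is taken from the post.

```python
import math
from typing import Sequence

# Threshold quoted in the post: similarity >= 0.92 means auto-approved.
AUTO_APPROVE_THRESHOLD = 0.92


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def auto_approved(
    source_vec: Sequence[float],
    back_translated_vec: Sequence[float],
    threshold: float = AUTO_APPROVE_THRESHOLD,
) -> bool:
    """Hypothetical gate: compare the source embedding with the embedding
    of the back-translated text and approve when they are close enough."""
    return cosine_similarity(source_vec, back_translated_vec) >= threshold
```

Note that the gate only compares two English strings (source and back-translation), so it never looks at the Spanish output directly. That may be one reason the flagged terms still scored 0.94+: a literal rendering such as "Agregar Adjunto" plausibly back-translates to something very close to "Add Attachment".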
Excerpt limited to ~120 words for fair-use compliance. The full article is at Ycombinator.