Surface-Form Neural Sparse Retrieval: Robust Fuzzy Matching for Industrial Music Search
A new paper presents a neural sparse retrieval system for music search that is built to stay robust when user queries differ from the indexed metadata. It aims to raise recall while keeping latency low. The reported evaluations show large gains over traditional methods, particularly on long-tail queries.
- The proposed system reaches 91.4% recall within the top 10 results, compared with 57.7% for traditional trigram matching.
- It uses a domain-specific, fine-grained subword tokenization strategy to make matching robust to surface-form variation.
- It keeps online processing to a minimum, so query encoding adds effectively zero latency overhead.
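For readers unfamiliar with the baseline and the metric in the points above, here is a minimal sketch of character-trigram fuzzy matching and of recall within the top k results. It is not the paper's system or its evaluation code: the function names, the Jaccard scoring, the padding scheme, and the toy catalog are all assumptions chosen for illustration.

```python
def char_trigrams(text: str) -> set[str]:
    """Return the set of character trigrams of a lowercased, padded string."""
    # Padding (an assumption) lets word boundaries contribute trigrams too.
    padded = f"  {text.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def trigram_score(query: str, title: str) -> float:
    """Jaccard overlap between the trigram sets of a query and a title."""
    q, t = char_trigrams(query), char_trigrams(title)
    union = q | t
    return len(q & t) / len(union) if union else 0.0


def search(query: str, catalog: list[str], k: int = 10) -> list[str]:
    """Rank catalog titles by trigram overlap and keep the top k."""
    ranked = sorted(catalog, key=lambda title: trigram_score(query, title), reverse=True)
    return ranked[:k]


def recall_at_k(labeled_queries: list[tuple[str, str]], catalog: list[str], k: int = 10) -> float:
    """Fraction of queries whose intended title appears in the top k results."""
    hits = sum(1 for query, target in labeled_queries if target in search(query, catalog, k))
    return hits / len(labeled_queries)


# Hypothetical toy data: a misspelled query and the title the user meant.
catalog = ["Bohemian Rhapsody", "Stairway to Heaven", "Hotel California"]
labeled_queries = [("bohemain rapsody", "Bohemian Rhapsody")]
print(search("bohemain rapsody", catalog, k=1))
print(recall_at_k(labeled_queries, catalog, k=1))
```

A brute-force scan like this only works for a toy catalog. A production system would score against an inverted index instead, which is where the latency constraint mentioned in the summary comes in.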
Opening excerpt (first ~120 words)
arXiv:2605.17762 (cs), Artificial Intelligence. Submitted on 18 May 2026.
Title: Surface-Form Neural Sparse Retrieval: Robust Fuzzy Matching for Industrial Music Search
Authors: Paul Greyson, Zhichao Geng, Wei Zhang, Yang Yang
Abstract: Music search at the scale of Amazon Music presents a unique challenge: queries frequently deviate from indexed metadata due to misspellings, transpositions, and phonetic variations, yet the retrieval system must operate under strict millisecond-level latency constraints.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.