Performance Analysis of AI Query Approximation Using Lightweight Proxy Models
The paper evaluates an AI query approximation method that uses lightweight proxy models to reduce the cost and latency of executing AI-enhanced database queries. It reports more than 100x improvement in both cost and latency for semantic filtering while maintaining or improving accuracy across benchmark datasets. The approach is implemented in Google BigQuery and AlloyDB, with optimizations for both online and offline processing.
- AI Queries extend SQL with LLM-powered functions for processing structured and unstructured data.
- Lightweight proxy models operating on embedding vectors achieve over 100x cost and latency reduction compared to direct LLM use.
- The method maintains high accuracy and is integrated into OLAP and HTAP database architectures like BigQuery and AlloyDB.
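To make the proxy-model idea in the points above concrete, here is a minimal sketch in Python. It is not the paper's implementation: the summary does not say which proxy model, training procedure, or sampling scheme the authors use. The sketch assumes a logistic-regression proxy trained on a small sample of rows labeled by the expensive model, and every name in it (`expensive_llm_label`, `train_proxy`, `proxy_predict`, `run_demo`) is hypothetical. The "LLM" is a synthetic rule so the example runs offline.

```python
import math
import random


def expensive_llm_label(embedding):
    """Hypothetical stand-in for a costly LLM call that answers a
    semantic filter (True/False) for one row.
    Assumption: a synthetic linear rule, used only so the sketch runs offline."""
    return embedding[0] + 0.5 * embedding[1] > 0.0


def train_proxy(embeddings, labels, epochs=200, lr=0.5):
    """Fit logistic-regression weights with plain batch gradient descent."""
    dim = len(embeddings[0])
    n = len(embeddings)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(embeddings, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))       # predicted probability
            err = p - (1.0 if y else 0.0)        # gradient of log loss w.r.t. z
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def proxy_predict(w, b, x):
    """Cheap filter decision: one dot product per row, no LLM call."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b > 0.0


def run_demo(num_rows=2000, sample_size=200, dim=4, seed=0):
    rng = random.Random(seed)
    # Assumption: random vectors stand in for precomputed row embeddings.
    rows = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_rows)]

    # Label only a small sample with the expensive model.
    sample = rows[:sample_size]
    labels = [expensive_llm_label(x) for x in sample]

    # Train the proxy once, then apply it to every row.
    w, b = train_proxy(sample, labels)
    preds = [proxy_predict(w, b, x) for x in rows]

    # Evaluation only: a real deployment would not label every row.
    truth = [expensive_llm_label(x) for x in rows]
    agreement = sum(p == t for p, t in zip(preds, truth)) / num_rows
    return agreement, sample_size, num_rows


if __name__ == "__main__":
    agreement, llm_calls, total = run_demo()
    print(f"LLM calls: {llm_calls} of {total} rows, agreement: {agreement:.3f}")
```

In this toy setup the expensive model is called for 200 of 2000 rows, and the remaining decisions cost one dot product each. The savings and accuracy reported in the paper depend on its own models and datasets, and this sketch does not reproduce them.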
Opening excerpt (first ~120 words)
arXiv:2603.15970 (cs), Computer Science > Databases
[Submitted on 16 Mar 2026 (v1), last revised 14 Apr 2026 (this version, v6)]
Title: 100x Cost & Latency Reduction: Performance Analysis of AI Query Approximation using Lightweight Proxy Models
Authors: Yeounoh Chung, Rushabh Desai, Jian He, Yu Xiao, Thibaud Hottelier, Yves-Laurent Kom Samo, Pushkar Khadilkar, Xianshun Chen, Sam Idicula, Fatma Özcan, Alon Halevy, Yannis Papakonstantinou
Abstract: Several data warehouse and database providers have recently introduced extensions to SQL called AI Queries, enabling users to specify functions…
Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv.org.