Quantization Undoes Alignment: Bias Emergence in Compressed LLMs Across Models and Precision Levels
A recent study investigates how quantizing large language models (LLMs) affects the emergence of bias. It finds that 3-bit quantization can cause 6-21% of previously unbiased items to develop new biases. Perplexity rises only minimally at the same time, so standard quality metrics are inadequate for detecting fairness-critical degradation in compressed models.
- The study examines three instruction-tuned models at five precision levels on a large bias benchmark.
- Results indicate that 3-bit quantization causes 6-21% of unbiased items to exhibit new stereotypical behaviors.
- Standard quality metrics fail to capture these changes: perplexity increases only minimally while bias emergence is significant.
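The "percentage of unbiased items that exhibit new stereotypical behaviors" can be read as a conditional rate: among items the full-precision model answered without bias, the share that turn biased after quantization. The excerpt does not give the paper's exact definition, so the sketch below is only one plausible reading. The function name `bias_emergence_rate`, the per-item boolean labels, and the toy data are all hypothetical and are not taken from the study.

```python
def bias_emergence_rate(baseline_biased, quantized_biased):
    """Share of baseline-unbiased items that are biased after quantization.

    Both arguments are per-item booleans (True = stereotypical response),
    aligned by benchmark item. This is an assumed definition, not the paper's.
    """
    if len(baseline_biased) != len(quantized_biased):
        raise ValueError("label lists must have the same length")
    # Only items that were unbiased at full precision can develop a *new* bias.
    after = [q for b, q in zip(baseline_biased, quantized_biased) if not b]
    if not after:
        return 0.0
    return sum(after) / len(after)


# Toy, made-up labels for 12 items; not data from the study.
baseline = [False, False, False, False, True, False,
            False, False, False, False, True, False]
quantized = [False, True, False, False, True, False,
             False, False, False, False, False, False]

rate = bias_emergence_rate(baseline, quantized)
print(f"{rate:.1%}")  # 1 of the 10 baseline-unbiased items flips: 10.0%
```

Conditioning on the baseline-unbiased items keeps the rate from being diluted by items that were already biased at full precision. That is also why an aggregate metric such as perplexity can stay nearly flat while this rate moves.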
Opening excerpt (first ~120 words)
Computer Science > Machine Learning
arXiv:2605.15208 (cs), submitted on 2 May 2026
Title: Quantization Undoes Alignment: Bias Emergence in Compressed LLMs Across Models and Precision Levels
Authors: Plawan Kumar Rath, Rahul Maliakkal
Abstract: Large Language Models are routinely compressed via post-training quantization to reduce inference costs and memory footprint for cloud and edge deployment, yet the impact of this compression on model quality remains poorly understood. Existing studies typically compare only two conditions (full-precision vs.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.