#quantization — Tagged Stories

16 stories in the WeSearch catalog are tagged #quantization, listed newest first with view counts. Tag pages update as new stories are ingested.


RELATED TAGS

#ml (7), #ai (7), #vector-database (2), #postgresql (1), #ai-ml (1), #data-compression (1), #bias (1), #model-evaluation (1), #language-models (1), #technology (1), #elasticsearch (1), #data-processing (1)

DEV.TO (TOP)

Why your quantized LLM loses its MTP heads and how to keep them

Quantizing a model with multi-token prediction heads? Here's why standard conversion pipelines drop them silently, and how to preserve and calibrate them.…

12 views · Wed, 27 May 2026 16:08:01 GMT

#machinelearning #llm

ARXIV CS.AI

Tail-Aware HiFloat4: W4A4 Post-Training Quantization for Wan2.2

This report describes Tail-Aware HiFloat4, our submission to the low-bit text-to-video generation quantization challenge. Our method adapts the public ViDiT-Q post-training quantiz…

17 views · Wed, 27 May 2026 04:07:56 GMT

#artificial intelligence #text-to-video

ARXIV CS.AI

InfoQuant: Shaping Activation Distributions for Low-Bit LLM Quantization

Low-bit activation quantization remains a major bottleneck in efficient large language model (LLM) deployment. The difficulty is not only that activations contain outliers, but tha…

19 views · Wed, 27 May 2026 04:07:56 GMT

#machine learning #artificial intelligence

R/LOCALLLAMA

OSCAR RotationZoo - Offline Spectral Covariance-Aware Rotation for 2-bit KV Cache Quantization

19 views · Mon, 25 May 2026 12:07:40 GMT

R/LOCALLLAMA

We added W8A8 activation quantization to MLX — prefill went from 2.84s to 2.52s on M5 Pro

16 views · Mon, 25 May 2026 08:37:40 GMT

ELASTICSEARCH LABS

Preconditioning Vectors: Making Elasticsearch VectorDB BBQ Work for Every Vector

Learn when to use vector preconditioning to improve recall for Better Binary Quantization (BBQ) in Elasticsearch…

15 views · Fri, 22 May 2026 07:02:00 GMT

#elasticsearch #vector database

ARXIV CS.AI

Quant.npu: Enabling Efficient Mobile NPU Inference for on-device LLMs via Fully Static Quantization

Large language models (LLMs) are increasingly deployed on mobile devices, where Neural Processing Units (NPUs) necessitate fully static quantization for optimal inference efficienc…

16 views · Fri, 22 May 2026 04:02:00 GMT

#machine learning #artificial intelligence

ARXIV CS.AI

Decomposing MXFP4 quantization error for LLM reinforcement learning: reducible bias, recoverable deadzone, and an irreducible floor

MXFP4 arithmetic can dramatically accelerate reinforcement learning (RL) post-training of large language models (LLMs), yet the quantization error introduces severe accuracy degrad…

11 views · Fri, 22 May 2026 04:02:00 GMT

#machine learning #artificial intelligence

HACKER NEWS (AI / LLM)

3.125-Bit LLM quantization bypassing tensor cores

By trading heavy FP16 MatMuls for SRAM lookups and 1-bit additions, our custom quantization pipeline squeezes state-of-the-art models down to approx. 3 bits per weight with minimal…

12 views · Thu, 21 May 2026 11:01:10 GMT

#ai #technology

R/STABLEDIFFUSION

ggufy: easy quantization for the GPU poor

13 views · Thu, 21 May 2026 01:35:06 GMT

VENTUREBEAT

Cohere cracks lossless quantization and native citations with first full Apache 2.0 licensed open model Command A+

22 views · Wed, 20 May 2026 21:20:04 GMT

ARXIV CS.AI

Theory-optimal Quantization Based on Flatness

Post-training quantization has emerged as a widely adopted technique for compressing and accelerating the inference of Large Language Models (LLMs). The primary challenges in LLMs …

11 views · Wed, 20 May 2026 04:04:59 GMT

#machine learning #artificial intelligence

GITHUB

Evaluation of Various MLX Quantizations

Utilities to evaluate MLX quantizations. Repository: deepsweet/mlx-eval.

15 views · Mon, 18 May 2026 18:04:56 GMT

#machine learning #language models

ARXIV CS.AI

Quantization Undoes Alignment: Bias Emergence in Compressed LLMs Across Models and Precision Levels

Large Language Models are routinely compressed via post-training quantization to reduce inference costs and memory footprint for cloud and edge deployment, yet the impact of this c…

13 views · Mon, 18 May 2026 04:04:54 GMT

#machine learning #artificial intelligence #bias

ARXIV CS.AI

PrismQuant: Rate-Distortion-Optimal Vector Quantization for Gaussian-Mixture Sources

For a Gaussian source under mean-squared error (MSE), classical transform coding is rate-distortion (RD) optimal: the Karhunen-Loève transform (KLT) diagonalizes the covariance, …

15 views · Mon, 18 May 2026 04:04:54 GMT

#information theory #machine learning #artificial intelligence

JONATHAN KATZ

Scalar and Binary Quantization for Pgvector Vector Search and Storage (2024)

Quantization can reduce vector sizes, but how does it impact query performance and quality?…

15 views · Sun, 17 May 2026 01:10:19 GMT

#postgresql #vector-database
