WeSearch
TAG · #QUANTIZATION

Quantization coverage.

All 16 stories in the WeSearch catalog tagged with #quantization, in publish-time order, with view counts. Tag pages update as new stories are ingested. Subscribe to the per-tag RSS feed to follow this topic in your reader of choice.


RELATED TAGS
#ml (7) · #ai (7) · #vector-database (2) · #postgresql (1) · #ai-ml (1) · #data-compression (1) · #bias (1) · #model-evaluation (1) · #language-models (1) · #technology (1) · #elasticsearch (1) · #data-processing (1)
DEV.TO (TOP)

Why your quantized LLM loses its MTP heads and how to keep them

Quantizing a model with multi-token prediction heads? Here's why standard conversion pipelines drop them silently, and how to preserve and calibrate them.…

12 views · #machinelearning · #llm
ARXIV CS.AI

Tail-Aware HiFloat4: W4A4 Post-Training Quantization for Wan2.2

This report describes Tail-Aware HiFloat4, our submission to the low-bit text-to-video generation quantization challenge. Our method adapts the public ViDiT-Q post-training quantiz…

17 views · #artificial intelligence · #text-to-video
ARXIV CS.AI

InfoQuant: Shaping Activation Distributions for Low-Bit LLM Quantization

Low-bit activation quantization remains a major bottleneck in efficient large language model (LLM) deployment. The difficulty is not only that activations contain outliers, but tha…

19 views · #machine learning · #artificial intelligence
R/LOCALLLAMA

OSCAR RotationZoo - Offline Spectral Covariance-Aware Rotation for 2-bit KV Cache Quantization

19 views
R/LOCALLLAMA

We added W8A8 activation quantization to MLX — prefill went from 2.84s to 2.52s on M5 Pro

16 views
ELASTICSEARCH LABS

Preconditioning Vectors: Making Elasticsearch VectorDB BBQ Work for Every Vector

Learn when to use vector preconditioning to improve recall for Better Binary Quantization (BBQ) in Elasticsearch…

15 views · #elasticsearch · #vector database
ARXIV CS.AI

Quant.npu: Enabling Efficient Mobile NPU Inference for on-device LLMs via Fully Static Quantization

Large language models (LLMs) are increasingly deployed on mobile devices, where Neural Processing Units (NPUs) necessitate fully static quantization for optimal inference efficienc…

16 views · #machine learning · #artificial intelligence
ARXIV CS.AI

Decomposing MXFP4 quantization error for LLM reinforcement learning: reducible bias, recoverable deadzone, and an irreducible floor

MXFP4 arithmetic can dramatically accelerate reinforcement learning (RL) post-training of large language models (LLMs), yet the quantization error introduces severe accuracy degrad…

11 views · #machine learning · #artificial intelligence
HACKER NEWS (AI / LLM)

3.125-Bit LLM quantization bypassing tensor cores

By trading heavy FP16 MatMuls for SRAM lookups and 1-bit additions, our custom quantization pipeline squeezes state-of-the-art models down to approx. 3 bits per weight with minimal…

12 views · #ai · #technology
R/STABLEDIFFUSION

ggufy: easy quantization for the GPU poor

13 views
VENTUREBEAT

Cohere cracks lossless quantization and native citations with first full Apache 2.0 licensed open model Command A+

22 views
ARXIV CS.AI

Theory-optimal Quantization Based on Flatness

Post-training quantization has emerged as a widely adopted technique for compressing and accelerating the inference of Large Language Models (LLMs). The primary challenges in LLMs …

11 views · #machine learning · #artificial intelligence
GITHUB

Evaluation of Various MLX Quantizations

Utilities to evaluate MLX quantizations (GitHub repository deepsweet/mlx-eval).

15 views · #machine learning · #language models
ARXIV CS.AI

Quantization Undoes Alignment: Bias Emergence in Compressed LLMs Across Models and Precision Levels

Large Language Models are routinely compressed via post-training quantization to reduce inference costs and memory footprint for cloud and edge deployment, yet the impact of this c…

13 views · #machine learning · #artificial intelligence · #bias
ARXIV CS.AI

PrismQuant: Rate-Distortion-Optimal Vector Quantization for Gaussian-Mixture Sources

For a Gaussian source under mean-squared error (MSE), classical transform coding is rate-distortion (RD) optimal: the Karhunen–Loève transform (KLT) diagonalizes the covariance, …

15 views · #information theory · #machine learning · #artificial intelligence
JONATHAN KATZ

Scalar and Binary Quantization for Pgvector Vector Search and Storage (2024)

Quantization can reduce vector sizes, but how does it impact query performance and quality?…

15 views · #postgresql · #vector-database