6 stories tagged with #vllm, ordered by publish time across the WeSearch catalog. Tag pages update as new stories are ingested.
Comparison: vLLM 0.6 vs. Text Generation Inference 1.4 for Serving Code LLMs
Serving code LLMs at production scale is 3.2x more expensive than general-purpose LLMs when using…
vLLM-Compile: Bringing Compiler Optimizations to LLM Inference
Luka Govedič, vLLM Committer and Senior Machine Learning Engineer, Red Hat…
Disaggregated Serving for Hybrid SSM Models in vLLM
Hybrid architectures that interleave Mamba-style SSM layers with standard full-attention (FA) layers — such as NVIDIA Nemotron-H — are gaining traction as a way…
Simple-to-use vLLM Docker Container for Qwen3.6-27B with Lorbus AutoRound INT4 quant and MTP speculative decoding - 118 tokens/second on 2x 3090s
Qwen3.6-27B-INT4 clocking 100 tps with 256k context length on 1x RTX 5090 via vllm 0.19
Thanks to the community, Qwen3.6-27B speed keeps getting better. The following improves upon my recipe from yesterday and delivers a whopping 100+ tps (TG). Model: - MTP suppor…
Qwen3.6-27B at ~80 tps with 218k context window on 1x RTX 5090 served by vllm 0.19
Qwen3.6-27B has been out for a few days, and the NVFP4 with MTP dropped on HF earlier: you can follow the same recipe I used for Qwen3.5-27B to achieve ~80 tps on a single RTX 5090 at 218k…