14 results for "qwen3 6 27b"
[7900XT] Qwen3.6 27B for OpenCode
I'm just looking for advice on optimally setting up Qwen3.6 27B for OpenCode. VRAM is a little scarce, but I ended up with this so far: llama-server --model models/Qwen3.6-27B-IQ4_XS.gguf…
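The command above is cut off; as a rough sketch of the kind of low-VRAM llama-server setup being discussed (the context size, offload count, and KV-cache quantisation below are assumptions, not the poster's actual flags, and flag syntax varies across llama.cpp versions):

    llama-server --model models/Qwen3.6-27B-IQ4_XS.gguf \
        --n-gpu-layers 99 --ctx-size 32768 \
        --flash-attn --cache-type-k q8_0 --cache-type-v q8_0

Quantising the KV cache to q8_0 roughly halves its footprint, and --flash-attn is required for the quantised V cache.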
GBNF grammar tweak for faster Qwen3.6 35B-A3B and Qwen3.6 27B
Hi folks, enjoy an optimised Qwen3.6 35B-A3B and Qwen3.6 27B for coding and general-purpose use - they solve puzzles correctly more often too. The initial intent was to optimise the 35B-A3B reason…
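The snippet doesn't include the grammar itself; for readers new to GBNF, here is a toy example of the kind of output constraint it can express (hypothetical, not the post's actual grammar; llama-server loads a grammar file with --grammar-file):

    # think.gbnf - force exactly one think block, then a non-empty answer
    root   ::= "<think>" [^<]* "</think>" "\n" answer
    answer ::= [^<]+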
Luce DFlash: Qwen3.6-27B at up to 2x throughput on a single RTX 3090
Hey fellow Llamas, your time is precious, so I'll keep it short. We built a GGUF port of DFlash speculative decoding. Standalone C++/CUDA stack on top of ggml, runs on a single 24 GB RTX 3090, hosts t…
Brief Ngram-Mod Test Results - R9700/Qwen3.6 27B
Decided to try out the new --spec-type ngram-mod feature in llama.cpp with Qwen3.6 27B during an OpenCode bug-chasing session. TLDR: performance is variable, but so far it seems to provide a nice spe…
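If the flag behaves as the post quotes it, enabling it would look roughly like this (a sketch with an assumed model path; speculative-decoding options have been renamed between llama.cpp releases, so check your build's --help):

    llama-server --model models/Qwen3.6-27B-IQ4_XS.gguf --spec-type ngram-mod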
Switched from Qwen3.6 35B-A3B to Qwen3.6 27B mid-coding and it's noticeably better!
A bit of context: I was coding up a little HTML tower-defense game where you can alter the path by placing additional waypoints. My setup: 32 GB RAM and a 5070 Ti with 16 GB VRAM. Using AesSedai/Qwen3.6-35B-A…
Qwen3.6-27B: Flagship-Level Coding in a 27B Dense Model
Big claims from Qwen about their latest open-weight model: Qwen3.6-27B delivers flagship-level agentic coding performance, surpassing the previo…
Qwen3.6-27B-INT4 clocking 100 tps with 256k context length on 1x RTX 5090 via vllm 0.19
Thanks to the community, Qwen3.6-27B speed keeps getting better. The following improves upon my recipe from yesterday and delivered a whopping 100+ tps (TG). Model: - MTP supported - KLD is decent …
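The recipe itself is truncated; a hedged sketch of the general shape of such a launch (the repo name, context length, and especially the --speculative-config schema are assumptions that differ across vLLM versions):

    vllm serve Qwen/Qwen3.6-27B-INT4 \
        --max-model-len 262144 \
        --gpu-memory-utilization 0.95 \
        --speculative-config '{"method": "mtp", "num_speculative_tokens": 1}'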
Quant Qwen3.6-27B on 16GB VRAM with 100k context length
I've experimented with how to run Qwen3.6-27B on my laptop with an A5000 16GB GPU. I created my own IQ4_XS GGUF "qwen3.6-27b-IQ4_XS-pure.gguf" with the Unsloth imatrix and compared the mean KLD of i…
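The described workflow maps onto llama.cpp's stock tooling; a sketch assuming an F16 source GGUF and a downloaded imatrix file (all file names here are placeholders):

    # quantise with the importance matrix, then compare KL divergence to F16
    llama-quantize --imatrix unsloth-imatrix.dat Qwen3.6-27B-F16.gguf \
        qwen3.6-27b-IQ4_XS-pure.gguf IQ4_XS
    # save base-model logits once, then score the quant against them
    llama-perplexity -m Qwen3.6-27B-F16.gguf -f eval.txt --kl-divergence-base base.kld
    llama-perplexity -m qwen3.6-27b-IQ4_XS-pure.gguf -f eval.txt \
        --kl-divergence --kl-divergence-base base.kld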
Qwen3.6-27B at ~80 tps with 218k context window on 1x RTX 5090 served by vllm 0.19
Qwen3.6-27B has been out for a few days, and the NVFP4 with MTP dropped earlier on HF: you can follow the same recipe I used for Qwen3.5-27B to achieve ~80 tps on a single RTX 5090 at a 218k context window via …
Simple to use vLLM Docker Container for Qwen3.6 27b with Lorbus AutoRound INT4 quant and MTP speculative decoding - 118 tokens/second on 2x 3090s
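For anyone reproducing this without the prebuilt container, the stock vLLM OpenAI image gives roughly the same launch (the quant repo is a placeholder, and the post's actual image and flags may differ):

    docker run --gpus all -p 8000:8000 \
        -v ~/.cache/huggingface:/root/.cache/huggingface \
        vllm/vllm-openai:latest \
        --model <lorbus-autoround-int4-repo> \
        --tensor-parallel-size 2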
Qwen3.6-27B-3bit-mlx · Hugging Face: 3 & 5 mixed quant for RAM-poor Mac users.
Just dropped a 3-bit mixed quant (5-bit for embeds and prediction layers) for Mac users. There was only one 3-bit version of this model (from Unsloth), but it was very heavy and painfully slow. This one…
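For reference, uniform-bit MLX quants come from mlx-lm's converter; a mixed 3/5-bit layout like this one additionally needs a per-layer quantisation predicate, which this sketch omits (repo path assumed):

    # uniform 3-bit convert; mixed 3/5-bit needs a custom quant predicate
    mlx_lm.convert --hf-path Qwen/Qwen3.6-27B -q --q-bits 3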
Qwen 3.6 27B in Claude Code says it will do something then stops and prompts for user reply (not failing a tool call)
I'm running Qwen/Qwen3.6-27B-FP8 via vLLM using this command: vllm serve Qwen/Qwen3.6-27B-FP8 --tensor-parallel-size 4 --gpu-memory-utilization 0.95 --max-num-seqs 8 --enable-auto-tool-choice --tool…
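The command is cut off at the tool-parser flag; a plausible complete invocation, assuming the truncated option is --tool-call-parser and that vLLM's hermes parser is the right one for this model (check the vLLM tool-calling docs):

    vllm serve Qwen/Qwen3.6-27B-FP8 --tensor-parallel-size 4 \
        --gpu-memory-utilization 0.95 --max-num-seqs 8 \
        --enable-auto-tool-choice --tool-call-parser hermes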
Qwen3.6 35B-A3B Particle System
Started testing Qwen3.6 35B-A3B. I let it code a particle system with my Pi Agent. It made just one little ValueError, but I was impressed by how fast it got it right. Which tasks are you giving it, or what…
Qwen3.6 35B-A3B Q4 vs Qwen3.6 27B Q6 on M5 Pro 64GB
Tried testing the two model versions on my M5 Pro 64GB and curated the results with Claude. I'm not an expert, so the settings/config might not be the best. Do share what results or improvements can be a…