25 stories tagged with #qwen3, in publish-time order across the WeSearch catalog. Tag pages update as new stories are ingested.
Quant Qwen3.6-27B on 16GB VRAM with 100k context length
I have experimented with running Qwen3.6-27B on my laptop with an A5000 16GB GPU. I created my own IQ4_XS GGUF, "qwen3.6-27b-IQ4_XS-pure.gguf", with the Unsloth imatrix and compar…
Qwen3.6-35B-A3B KLDs - INTs and NVFPs
KLD for INTs and NVFP4s. AS ALWAYS - use case is important: accuracy versus speed versus native kernels on your GPUs. Things to note again: this is done in vLLM, with REAL logits…
Bonsai: The First Commercially Viable 1-Bit LLM
Today, we are announcing 1-bit Bonsai models that bring advanced intelligence to the devices where people actually live and work…
Got DFlash speculative decoding working on Qwen3.5-35B-A3B with an RTX 2080 SUPER 8GB
I managed to get **DFlash speculative decoding** working in llama.cpp on a pretty VRAM-limi…
Qwen3.6-27B - Closed-loop SVG Images
Yesterday, I saw an impressive presentation of Qwen3.6-27B's SVG capabilities on the sub. To get the most out of the model for SVG generation, I put together a closed…
I replaced ChatGPT and Claude with this powerful local LLM and saved over $20 a month while gaining full control
Qwen3.6 runs on my old GPU and does what ChatGPT does for free…
Follow-up: Qwen3.6-27B on 1× RTX 3090 — pushing to ~218K context + ~50–66 TPS, tool calls now stable (PN12 fix)
Post-trained Qwen3-Coder with a debugger: 70% → 89% solve rate, 59% fewer turns
[7900XT] Qwen3.6 27B for OpenCode
I'm just looking for some advice on optimally setting up Qwen3.6 27B for OpenCode. The VRAM is a little bit scarce, but I ended up with this so far: llama-server --model models/Qwe…
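The teaser cuts the command off, but for anyone starting from scratch, here is a minimal sketch of the kind of llama-server launch being asked about; the model path, quant, context size, and port are placeholders, not the poster's actual flags:

```bash
# Hypothetical starting point for a VRAM-limited card, not the poster's exact command:
# offload all layers that fit, quantize the K cache, and shrink the context until it fits.
llama-server \
  --model models/Qwen3.6-27B-IQ4_XS.gguf \
  --ctx-size 32768 \
  --n-gpu-layers 99 \
  --cache-type-k q8_0 \
  --port 8080
```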
Offline Agentic Coding
Offline Agentic Coding: Ollama and Claude Code…
GBNF grammar tweak for faster Qwen3.6 35B-A3B and Qwen3.6 27B
Hi folks, enjoy an optimised Qwen3.6 35B-A3B and Qwen3.6 27B for coding and general-purpose use - they're able to solve puzzles correctly more often too. The initial intent was to optimis…
Used a Claude Code skill to fine-tune Qwen3-1.7B from 327 noisy traces, matches GLM-5
Had 327 production traces from a restaurant-reservation agent I wanted to retrain. The plan was to fine-tune a smaller self-hostable model so I could ditch the frontier-API bill. T…
Luce DFlash: Qwen3.6-27B at up to 2x throughput on a single RTX 3090
Hey fellow Llamas, your time is precious, so I'll keep it short. We built a GGUF port of DFlash speculative decoding. Standalone C++/CUDA stack on top of ggml, runs on a single 24 …
Simple to use vLLM Docker Container for Qwen3.6 27b with Lorbus AutoRound INT4 quant and MTP speculative decoding - 118 tokens/second on 2x 3090s
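The post's exact launch line isn't in the teaser, but a generic sketch of such a container launch looks like this; the image tag, model repo, and context limit are placeholders, and the AutoRound/MTP-specific flags are omitted since they vary by vLLM version:

```bash
# Hypothetical two-GPU vLLM container launch; ORG/... is a placeholder model repo.
docker run --gpus all --ipc=host -p 8000:8000 \
  vllm/vllm-openai:latest \
  --model ORG/Qwen3.6-27B-AutoRound-INT4 \
  --tensor-parallel-size 2 \
  --max-model-len 131072
```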
Qwen3.6-27B-3bit-mlx · Hugging Face: 3 & 5 mixed quant for RAM-poor Mac users.
Just dropped a 3-bit mixed quant (5-bit for embeds and prediction layers) for Mac users. There was only one 3-bit version of this model (from Unsloth), but it was very heavy and pain…
Brief Ngram-Mod Test Results - R9700/Qwen3.6 27B
Decided to try out the new --spec-type ngram-mod feature in llama.cpp using Qwen3.6 27B during an OpenCode bug chasing session. TLDR: Performance is variable, but so far it seems t…
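For context, the flag as named in the post slots into an ordinary llama-server invocation; a sketch only, with the model path as a placeholder:

```bash
# Sketch: --spec-type ngram-mod is the flag as named in the post;
# the model path is a placeholder, not the poster's file.
llama-server \
  --model models/Qwen3.6-27B-Q4_K_M.gguf \
  --spec-type ngram-mod
```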
Switched from Qwen3.6 35b-a3b to Qwen3.6 27b mid coding and it's noticeably better!
A bit of context: I was coding up a little HTML tower defense game where you can alter the path by placing additional waypoints. My setup: 32GB RAM and a 5070 Ti with 16GB VRAM. Using Ae…
Qwen3.6-27B: Flagship-Level Coding in a 27B Dense Model
Big claims from Qwen about their latest open-weight model: Qwen3.6-27B delivers flagship-level agentic coding performance, s…
Qwen3.6-27B-INT4 clocking 100 tps with 256k context length on 1x RTX 5090 via vllm 0.19
Thanks to the community, Qwen3.6-27B speed keeps getting better. The following improves on my recipe from yesterday and delivered a whopping 100+ tps (TG). Model: - MTP suppor…
Qwen3.6 35B A3B Heretic (KLD 0.0015!) Incredible model. Best 35B I have found!
Been using this for a few days. It is BY FAR the best uncensored model I have found for Qwen 3.6 35B. With IQ4_XS, Q8 KV cache, and 262K context, it fits in 24GB of VRAM and does not fai…
Qwen3.5/3.6 Coder?
With practically all of LocalLlama glazing Qwen 3.5/3.6 for its coding skills, along with the fact that Alibaba themselves are focusing on making Qwen a reliable coding agent, doe…
[Qwen3.6 35b a3b] Used the top config for my setup (8GB VRAM and 32GB RAM) and found that the Q4_K_XL model from Unsloth somehow runs slightly faster and uses fewer output tokens than Q4_K_M, despite higher memory usage
Config: CtxSize 131,072 · GpuLayers 99 · CpuMoeLayers 38 · Threads 16 · BatchSize/UBatchSize 4096/4096 · CacheType K/V q8_0 · Tool Context: file mode (tools.kilocode.official.md) · Metric M…
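Translated into flags, the posted config corresponds roughly to a llama-server launch like the following; a sketch assuming a recent llama.cpp build with `--n-cpu-moe` support, with the model path as a placeholder and the KiloCode tool-context setup not shown:

```bash
# Rough flag-for-flag rendering of the config above; the model path is a placeholder.
# Note: a quantized V cache needs flash attention enabled (default "auto" on recent builds).
llama-server \
  --model models/Qwen3.6-35B-A3B-Q4_K_XL.gguf \
  --ctx-size 131072 \
  --n-gpu-layers 99 \
  --n-cpu-moe 38 \
  --threads 16 \
  --batch-size 4096 \
  --ubatch-size 4096 \
  --cache-type-k q8_0 \
  --cache-type-v q8_0
```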
Qwen3.6-27B at ~80 tps with 218k context window on 1x RTX 5090 served by vllm 0.19
Qwen3.6-27B has been out for a few days, and the NVFP4 with MTP dropped earlier on HF. You can follow the same recipe I used for Qwen3.5-27B to achieve ~80 tps on a single RTX 5090 at 218k…
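The teaser cuts off before the recipe itself, but the skeleton of such a launch is short; a sketch only, where MODEL_ID, the context length, and the memory fraction stand in for the author's actual values:

```bash
# Hypothetical single-GPU vLLM launch; MODEL_ID is a placeholder for the NVFP4 repo,
# and the MTP speculative-decoding flags are omitted (they vary by vLLM version).
vllm serve MODEL_ID \
  --max-model-len 218000 \
  --gpu-memory-utilization 0.95
```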
Qwen3.6 35b a3b Particle System
Started testing Qwen3.6 35b a3b. I let it code a particle system with my Pi Agent. It made just one little ValueError, but I was impressed by how fast it got it right. Which task are y…