25 stories tagged with #qwen3, in publish-time order across the WeSearch catalog. Tag pages update as new stories are ingested.
Quant Qwen3.6-27B on 16GB VRAM with 100k context length
I have experimented with running Qwen3.6-27B on my laptop with an A5000 16GB GPU. I created my own IQ4_XS GGUF, "qwen3.6-27b-IQ4_XS-pure.gguf", with the Unsloth imatrix and compar…
Qwen3.6-35B-A3B KLDs - INTs and NVFPs
KLD for INTs and NVFP4s. AS ALWAYS - use case is important: accuracy versus speed versus native kernels on your GPUs. Things to note again: this is done in vLLM, with REAL logits…
Bonsai: The First Commercially Viable 1-Bit LLM
Today, we are announcing 1-bit Bonsai models that bring advanced intelligence to the devices where people actually live and work…
Got DFlash speculative decoding working on Qwen3.5-35B-A3B with an RTX 2080 SUPER 8GB
I managed to get **DFlash speculative decoding** working in llama.cpp on a pretty VRAM-limi…
Qwen3.6-27B - Closed-loop SVG Images
Yesterday, I saw an impressive presentation of Qwen3.6-27B's SVG capabilities on the sub. To get the most out of the model for SVG generation, I put together a closed…
I replaced ChatGPT and Claude with this powerful local LLM and saved over $20 a month while gaining full control
Qwen3.6 runs on my old GPU and does what ChatGPT does for free…
Follow-up: Qwen3.6-27B on 1× RTX 3090 — pushing to ~218K context + ~50–66 TPS, tool calls now stable (PN12 fix)
Post-trained Qwen3-Coder with a debugger: 70% → 89% solve rate, 59% fewer turns
[7900XT] Qwen3.6 27B for OpenCode
I'm just looking for some advice on optimally setting up Qwen3.6 27B for OpenCode. The VRAM is a little bit scarce, but I ended up with this so far: llama-server --model models/Qwe…
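The teaser cuts the command off, but for anyone starting from scratch, here is a minimal sketch of the kind of llama-server launch being asked about; the model path, quant, context size, and port are placeholders, not the poster's actual flags:

```bash
# Hypothetical starting point for a VRAM-limited card, not the poster's exact command:
# offload all layers that fit, quantize the K cache, and shrink the context until it fits.
llama-server \
  --model models/Qwen3.6-27B-IQ4_XS.gguf \
  --ctx-size 32768 \
  --n-gpu-layers 99 \
  --cache-type-k q8_0 \
  --port 8080
```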
Offline Agentic Coding
Offline Agentic Coding: Ollama and Claude Code…
GBNF grammar tweak for faster Qwen3.6 35B-A3B and Qwen3.6 27B
Hi folks, enjoy an optimised Qwen3.6 35B-A3B and Qwen3.6 27B for coding and general-purpose use - they're able to solve puzzles correctly more often too. The initial intent was to optimis…
Used a Claude Code skill to fine-tune Qwen3-1.7B from 327 noisy traces, matches GLM-5
Had 327 production traces from a restaurant-reservation agent I wanted to retrain. The plan was to fine-tune a smaller self-hostable model so I could ditch the frontier-API bill. T…
Luce DFlash: Qwen3.6-27B at up to 2x throughput on a single RTX 3090
Hey fellow Llamas, your time is precious, so I'll keep it short. We built a GGUF port of DFlash speculative decoding. Standalone C++/CUDA stack on top of ggml, runs on a single 24 …
Simple to use vLLM Docker Container for Qwen3.6 27b with Lorbus AutoRound INT4 quant and MTP speculative decoding - 118 tokens/second on 2x 3090s
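The post's exact launch line isn't in the teaser, but a generic sketch of such a container launch looks like this; the image tag, model repo, and context limit are placeholders, and the AutoRound/MTP-specific flags are omitted since they vary by vLLM version:

```bash
# Hypothetical two-GPU vLLM container launch; ORG/... is a placeholder model repo.
docker run --gpus all --ipc=host -p 8000:8000 \
  vllm/vllm-openai:latest \
  --model ORG/Qwen3.6-27B-AutoRound-INT4 \
  --tensor-parallel-size 2 \
  --max-model-len 131072
```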
Qwen3.6-27B-3bit-mlx · Hugging Face: 3 & 5 mixed quant for RAM-poor Mac users.
Just dropped a 3-bit mixed quant (5-bit for embeds and prediction layers) for Mac users. There was only one 3-bit version of this model (from Unsloth), but it was very heavy and pain…
Brief Ngram-Mod Test Results - R9700/Qwen3.6 27B
Decided to try out the new --spec-type ngram-mod feature in llama.cpp using Qwen3.6 27B during an OpenCode bug chasing session. TLDR: Performance is variable, but so far it seems t…
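For context, the flag as named in the post slots into an ordinary llama-server invocation; a sketch only, with the model path as a placeholder:

```bash
# Sketch: --spec-type ngram-mod is the flag as named in the post;
# the model path is a placeholder, not the poster's file.
llama-server \
  --model models/Qwen3.6-27B-Q4_K_M.gguf \
  --spec-type ngram-mod
```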
Switched from Qwen3.6 35b-a3b to Qwen3.6 27b mid coding and it's noticeably better!
A bit of context: I was coding up a little HTML tower defense game where you can alter the path by placing additional waypoints. My setup: 32GB RAM and a 5070 Ti with 16GB VRAM. Using Ae…
Qwen3.6-27B: Flagship-Level Coding in a 27B Dense Model
Big claims from Qwen about their latest open-weight model: Qwen3.6-27B delivers flagship-level agentic coding performance, s…
Qwen3.6-27B-INT4 clocking 100 tps with 256k context length on 1x RTX 5090 via vllm 0.19
Thanks to the community, Qwen3.6-27B speed keeps getting better. The following improves on my recipe from yesterday and delivered a whopping 100+ tps (TG). Model: - MTP suppor…
Qwen3.6 35B A3B Heretic (KLD 0.0015!) Incredible model. Best 35B I have found!
Been using this for a few days. It is BY FAR the best uncensored model I have found for Qwen 3.6 35B. With IQ4_XS, Q8 KV cache, and 262K context, it fits in 24GB of VRAM and does not fai…
Qwen3.5/3.6 Coder?
With practically all of LocalLlama glazing Qwen 3.5/3.6 for its coding skills, along with the fact that Alibaba themselves are focusing on making Qwen a reliable coding agent, doe…
[Qwen3.6 35b a3b] Used the top config for my setup (8GB VRAM and 32GB RAM) and found that the Q4_K_XL model from Unsloth somehow runs slightly faster and uses fewer output tokens than Q4_K_M, despite higher memory usage
Config: CtxSize 131,072 · GpuLayers 99 · CpuMoeLayers 38 · Threads 16 · BatchSize/UBatchSize 4096/4096 · CacheType K/V q8_0 · Tool Context: file mode (tools.kilocode.official.md) · Metric M…
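Translated into flags, the posted config corresponds roughly to a llama-server launch like the following; a sketch assuming a recent llama.cpp build with `--n-cpu-moe` support, with the model path as a placeholder and the KiloCode tool-context setup not shown:

```bash
# Rough flag-for-flag rendering of the config above; the model path is a placeholder.
# Note: a quantized V cache needs flash attention enabled (default "auto" on recent builds).
llama-server \
  --model models/Qwen3.6-35B-A3B-Q4_K_XL.gguf \
  --ctx-size 131072 \
  --n-gpu-layers 99 \
  --n-cpu-moe 38 \
  --threads 16 \
  --batch-size 4096 \
  --ubatch-size 4096 \
  --cache-type-k q8_0 \
  --cache-type-v q8_0
```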
Qwen3.6-27B at ~80 tps with 218k context window on 1x RTX 5090 served by vllm 0.19
Qwen3.6-27B has been out for a few days, and the NVFP4 with MTP dropped earlier on HF. You can follow the same recipe I used for Qwen3.5-27B to achieve ~80 tps on a single RTX 5090 at 218k…
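The teaser cuts off before the recipe itself, but the skeleton of such a launch is short; a sketch only, where MODEL_ID, the context length, and the memory fraction stand in for the author's actual values:

```bash
# Hypothetical single-GPU vLLM launch; MODEL_ID is a placeholder for the NVFP4 repo,
# and the MTP speculative-decoding flags are omitted (they vary by vLLM version).
vllm serve MODEL_ID \
  --max-model-len 218000 \
  --gpu-memory-utilization 0.95
```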
Qwen3.6 35b a3b Particle System
Started testing Qwen3.6 35b a3b. I let it code a particle system with my Pi Agent. It made just one little ValueError, but I was impressed by how fast it got it right. Which task are y…