What's your tps on 3090 + Qwen 3.6 27B in real tasks?
I struggle to wrap my head around all this. My goal is a local agent that solves low-complexity tasks in the same harness where I would use frontier models. Naturally that means a large context window, because "low complexity" can mean a simple-ish fix in a large codebase rather than just generating something from zero. So initially I went for Tom's turboquant plus fork of llama.cpp (I'm on Windows) with Qwen 3.6 Q4 and IQ4 models and a 200k context window. Well, it worked, it can read the entire
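For comparable numbers, this is roughly how I'm timing generation against the llama.cpp server's OpenAI-compatible endpoint. A minimal sketch, assuming llama-server is already running on localhost:8080; the model name and prompt are placeholders:

```python
# Rough tps measurement against a local llama.cpp server (OpenAI-compatible API).
# Assumes the server is up at localhost:8080; "local" is a placeholder model name.
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")

start = time.perf_counter()
first_token_at = None
n_chunks = 0

stream = client.chat.completions.create(
    model="local",
    messages=[{"role": "user", "content": "Summarize what a B-tree is."}],
    max_tokens=512,
    stream=True,
)
for chunk in stream:
    # Each streamed chunk is roughly one generated token for llama.cpp.
    if chunk.choices and chunk.choices[0].delta.content:
        if first_token_at is None:
            first_token_at = time.perf_counter()
        n_chunks += 1

end = time.perf_counter()
if first_token_at is not None:
    print(f"time to first token: {first_token_at - start:.2f}s")
    print(f"generation: ~{n_chunks / (end - first_token_at):.1f} tokens/s")
```

Time to first token matters as much as raw generation speed here, since long-context prompt processing dominates on a big codebase.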