What's your tps on 3090 + Qwen 3.6 27B in real tasks?
I struggle to wrap my head around all this. My goal is a local agent that solves low-complexity tasks in the same harness where I would use frontier models. Naturally that means a large context window, because "low complexity" can mean a simple-ish fix in a large codebase rather than just generating something from zero. So initially I went for Tom's turboquant plus fork of llama.cpp (I'm on Windows) with Qwen 3.6 Q4 and IQ4 models and a 200k context window. Well, it worked, it can read the entire
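For comparable numbers, this is roughly how I'm timing generation against the llama.cpp server's OpenAI-compatible endpoint. A minimal sketch, assuming llama-server is already running on localhost:8080; the model name and prompt are placeholders:

```python
# Rough tps measurement against a local llama.cpp server (OpenAI-compatible API).
# Assumes the server is up at localhost:8080; "local" is a placeholder model name.
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")

start = time.perf_counter()
first_token_at = None
n_chunks = 0

stream = client.chat.completions.create(
    model="local",
    messages=[{"role": "user", "content": "Summarize what a B-tree is."}],
    max_tokens=512,
    stream=True,
)
for chunk in stream:
    # Each streamed chunk is roughly one generated token for llama.cpp.
    if chunk.choices and chunk.choices[0].delta.content:
        if first_token_at is None:
            first_token_at = time.perf_counter()
        n_chunks += 1

end = time.perf_counter()
if first_token_at is not None:
    print(f"time to first token: {first_token_at - start:.2f}s")
    print(f"generation: ~{n_chunks / (end - first_token_at):.1f} tokens/s")
```

Time to first token matters as much as raw generation speed here, since long-context prompt processing dominates on a big codebase.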