I've updated my glorified Llama fork (an LLM inference server) for P40s to utilise MTP + TurboQuant + DFlash
r/LocalLLaMA