WeSearch
SEARCH · CPP

Results for "cpp".

24 stories match your query across our 700+ source catalog. Ranked by relevance and recency.


REDDIT

Intel B70: llama.cpp SYCL vs llama.cpp OpenVINO vs LLM-Scaler

In case anyone is interested, I decided to test out llama.cpp's new OpenVINO backend to see how it compares on Intel GPUs. At first glance, it stomps all over the previous best case, SYCL, but lags be…

· 6 views
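For readers who want to reproduce a backend comparison like this one: llama.cpp selects backends at build time. A minimal sketch follows; the SYCL options are the documented ones, while the OpenVINO flag name is an assumption (the backend is new, so check its docs).

    # SYCL build (documented: Intel oneAPI compilers)
    cmake -B build-sycl -DGGML_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx
    cmake --build build-sycl --config Release

    # OpenVINO build (flag name assumed)
    cmake -B build-ov -DGGML_OPENVINO=ON
    cmake --build build-ov --config Release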
REDDIT

Benchmark: Windows 11 vs Lubuntu 26.04 on Llama.cpp (RTX 5080 + i9-14900KF). I didn't expect the gap to be this big.

UPDATE: Vulkan benches are now included. And yes, I used AI to help me write this post. As a life-long Windows user (don't hate me, I was exposed to it at a young age) I was wondering how much (if an…

· 6 views
REDDIT

llama.cpp DeepSeek v4 Flash experimental inference

Hi, here you can find experimental llama.cpp support for DeepSeek v4, and here is the GGUF you can use to run inference with "just" (lol) 128GB of RAM. The model, even quantized at 2 bit, lo…

· 6 views
REDDIT

Will llama.cpp multislot improve speed?

I've heard mostly bad opinions about multiple slots with llama.cpp (--parallel > 1). I guess that, compared to vLLM, it might be worse at this, but I recently tried vLLM on 4 slots and it indeed improved th…

· 6 views
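For reference, --parallel is a stock llama-server flag, and the total context set with -c is shared across slots, so each of N slots gets roughly 1/N of it. A minimal sketch with a placeholder model path:

    # 4 decoding slots sharing a 32768-token context (~8192 tokens per slot)
    llama-server -m ./model.gguf -c 32768 --parallel 4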
REDDIT

Expert volunteers needed for Vulkan on ik_llama.cpp

ik_llama.cpp is great for both CPU & CUDA. Need legends to make Vulkan better as well. So, after bringing the Vulkan back-end up to speed some time ago, I felt that I simply don't have the bandwidth t…

· 6 views
REDDIT

FP4 inference in llama.cpp (NVFP4) and ik_llama.cpp (MXFP4) landed - Finally

Both llama.cpp and ik_llama.cpp now have FP4 support — but with different flavors worth knowing about. llama.cpp recently merged NVFP4 (Nvidia's block-scaled FP4, `GGML_TYPE_NVFP4 = 40`), with CUDA ke…

· 11 views
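For the curious, quantizing to the new types should go through the usual llama-quantize tool; the CLI type names below are assumptions inferred from the enum names in the post, so verify with llama-quantize --help on a current build.

    # assumed type names; input must be an f16/bf16 GGUF
    llama-quantize model-f16.gguf model-nvfp4.gguf NVFP4    # mainline llama.cpp
    llama-quantize model-f16.gguf model-mxfp4.gguf MXFP4    # ik_llama.cpp build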
THE GLOBE AND MAIL

Highlights from the spring economic update, including CPP contribution cuts and new sports funding

· 0 views
LOCALLLAMA

convert : add support for Nemotron Nano 3 Omni by danbev · Pull Request #22481 · ggml-org/llama.cpp

NVIDIA Nemotron 3 Nano Omni is a multimodal large language model that unifies video, audio, image, and text understanding to support enterprise-grade Q&A, summarization, transcription, and document in…

· 3 views
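The PR extends llama.cpp's standard conversion path, which for a Hugging Face checkpoint generally looks like the sketch below; the paths are placeholders, and the --mmproj pass for the multimodal projector exists in recent convert scripts but is worth verifying for this model.

    python convert_hf_to_gguf.py /path/to/Nemotron-Nano-3-Omni --outfile nemotron.gguf --outtype f16
    # the multimodal projector goes into a separate GGUF
    python convert_hf_to_gguf.py /path/to/Nemotron-Nano-3-Omni --mmproj --outfile mmproj-nemotron.gguf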
THE GLOBE AND MAIL

CPPIB among investors looking to sell down stakes in India’s NSE IPO, sources say

Shareholders including the Canada Pension Plan Investment Board, LIC, SBI, Temasek Holdings and Morgan Stanley will offload a 5% stake, according to the sources…

· 3 views
REDDIT

VRAM.cpp: Running llama-fit-params directly in your browser

Lots of people are always asking on this subreddit if their system can run a certain model. A lot of the "VRAM calculators" that I've found either provide very rough estimates or are severely lim…

· 7 views
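The back-of-envelope math such a calculator refines: weight memory is roughly parameter count × bits per weight / 8, and KV cache is roughly 2 × layers × KV heads × head dim × context length × bytes per element. A worked example for a Llama-3-8B-shaped model (32 layers, 8 KV heads, head dim 128; numbers assumed for illustration) at ~4.8 bits per weight with a 32k f16 KV cache:

    weights:  8e9 × 4.8 / 8 bytes                ≈ 4.8 GB
    KV cache: 2 × 32 × 8 × 128 × 32768 × 2 bytes ≈ 4.3 GB
    total:    ≈ 9.1 GB, plus compute buffers and runtime overhead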
LOCALLLAMA

Mesa PR with 37-130% llama.cpp prompt-processing (pp) perf gain for Vulkan on Linux on Intel Xe2

· 6 views
REDDIT

CUDA: reduce MMQ stream-k overhead by JohannesGaessler · Pull Request #22298 · ggml-org/llama.cpp

CUDA prompt processing speedup on MoE models; check this…

· 6 views
LOCALLLAMA

Qwen 3.6-35B-A3B KV cache bench: f16 vs q8_0 vs turbo3 vs turbo4 from 0 to 1M context on M5 Max

Took TheTom's TurboQuant Metal fork of llama.cpp (github.com/TheTom/llama-cpp-turboquant, the feature/turboquant-kv-cache branch) and ran a depth sweep on Qwen 3.6-35B-A3B Q8. TheTom had already publi…

· 2 views
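For anyone replicating the f16 and q8_0 legs of this bench on stock llama.cpp (the turbo3/turbo4 types exist only in the fork): cache types are set separately for K and V, and a quantized V cache requires flash attention. A minimal sketch with a placeholder model file:

    # f16 KV cache is the default
    llama-server -m qwen3.6-35b-a3b-q8.gguf -c 131072
    # q8_0 KV cache; flash attention is required for quantized V (older builds take bare -fa)
    llama-server -m qwen3.6-35b-a3b-q8.gguf -c 131072 -fa on -ctk q8_0 -ctv q8_0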
REDDIT

AMD Radeon RX 6900 XT - ROCm vs Vulkan - Gemma 4 and Qwen 3.5 speed benchmarks

Did some quick tests after building llama.cpp with ROCm 6.4.2 and latest Vulkan for my 6900 XT.

gemma4 E2B Q4_K:
ubatch | ROCm pp512 | Vulkan pp512 | ROCm tg128 | Vulkan tg128
32 | 1536.60 | 1423.49 | 151.92 | 174.59
6…

· 5 views
LOCALLLAMA

Most efficient way of running Gemma 4 E4B with multimodal capabilities on a laptop?

The gemma 4 E4B and E2B models have built-in multimodal capabilities. However, as far as I am aware, llama.cpp does not have proper support for vision and audio inputs (especially audio) for these mode…

· 5 views
REDDIT

How to run a local coding agent with Gemma 4 and Pi | Patrick Loeber

Tutorial from the Google guy; I use a very similar setup (llama.cpp instead of LM Studio)…

· 11 views
REDDIT

GBNF grammar tweak for faster Qwen3.6 35B-A3B and Qwen3.6 27B

Hi folks, enjoy an optimised Qwen3.6 35B-A3B and Qwen3.6 27B for coding and general-purpose use - they're able to solve puzzles correctly more often too. The initial intent was to optimise the 35B-A3B reason…

· 6 views
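For readers unfamiliar with GBNF: it is llama.cpp's grammar format for constraining sampling, passed via --grammar-file. A deliberately trivial example (not the optimised grammar from the post), with a placeholder model path:

    # answer.gbnf: constrain generation to a yes/no answer
    root ::= "yes" | "no"

    llama-cli -m qwen3.6-27b.gguf --grammar-file answer.gbnf -p "Is 97 prime? Answer yes or no."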
REDDIT

Brief Ngram-Mod Test Results - R9700/Qwen3.6 27B

Decided to try out the new --spec-type ngram-mod feature in llama.cpp using Qwen3.6 27B during an OpenCode bug chasing session. TLDR: Performance is variable, but so far it seems to provide a nice spe…

· 8 views
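Note that --spec-type ngram-mod is quoted from the post and does not exist in every build; the invocation below is a sketch of the described setup with a placeholder model path, so verify the flag against llama-server --help.

    # speculative n-gram decoding as described in the post
    llama-server -m qwen3.6-27b.gguf --spec-type ngram-mod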
LOCALLLAMA

What is the best coding agent (CLI) like Claude Code for Local Development

Hey all: I am trying to set up Claude Code to work with llama.cpp; I am using Qwen3.6-35B-A3B. I usually use Claude Code + a ZLM subscription (I got lucky with $30 yearly) - the setup is very simple …

· 8 views
REDDIT

Using PaddleOCR-VL-1.5 with llama-server for book OCR

I've been running PaddleOCR-VL-1.5 via llama.cpp's server for OCR on book pages. It handles complex layouts, tables, and mixed text/figure pages surprisingly well. Setup:
- Model: PaddleOCR-VL-1.5-GGU…

· 9 views
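The general shape of such a setup: llama-server loads the language model plus a multimodal projector, then serves an OpenAI-compatible endpoint that accepts images. File names below are placeholders:

    llama-server -m paddleocr-vl-1.5.gguf --mmproj mmproj-paddleocr.gguf --port 8080

    curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
      "messages": [{"role": "user", "content": [
        {"type": "image_url", "image_url": {"url": "data:image/png;base64,<PAGE_SCAN>"}},
        {"type": "text", "text": "Transcribe this page."}
      ]}]
    }'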
LOCALLLAMA

Is there a way to mitigate performance as context grows?

In my local LLM setup I get from 30 to 80 t/s generation at the beginning, but it drops quite a lot as context grows. I use llama.cpp/Vulkan with an MI50 and a V100; are there some command-line flags t…

· 6 views
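Common mitigations on stock llama.cpp, offered as general advice rather than a guaranteed fix for this MI50 + V100 setup: enable flash attention, quantize the KV cache, and reuse cached prompt prefixes between requests. A sketch with a placeholder model path:

    # -fa: flash attention (older builds take bare -fa); -ctk/-ctv: quantized KV cache;
    # --cache-reuse: reuse matching prompt prefixes across requests
    llama-server -m model.gguf -c 32768 -fa on -ctk q8_0 -ctv q8_0 --cache-reuse 256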
LOCALLLAMA

Quant Qwen3.6-27B on 16GB VRAM with 100k context length

I have experimented with running Qwen3.6-27B on my laptop with an A5000 16GB GPU. I created my own IQ4_XS GGUF, "qwen3.6-27b-IQ4_XS-pure.gguf", with the Unsloth imatrix and compared the mean KLD of i…

· 5 views
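The workflow described here maps onto three stock llama.cpp tools; the model paths and calibration/test texts below are placeholders:

    # 1. build an importance matrix from calibration text
    llama-imatrix -m qwen3.6-27b-f16.gguf -f calibration.txt -o imatrix.dat
    # 2. quantize using the imatrix
    llama-quantize --imatrix imatrix.dat qwen3.6-27b-f16.gguf qwen3.6-27b-IQ4_XS.gguf IQ4_XS
    # 3. save f16 logits, then measure mean KLD of the quant against them
    llama-perplexity -m qwen3.6-27b-f16.gguf -f test.txt --kl-divergence-base logits.bin
    llama-perplexity -m qwen3.6-27b-IQ4_XS.gguf -f test.txt --kl-divergence-base logits.bin --kl-divergence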
REDDIT

Field report: coding with Qwen 3.6 35B-A3B on an M2 Macbook Pro with 32GB RAM

TL;DR: I finally have this working and doing real work within the tight specs of my 32GB RAM Mac. So for those who would like to fly like Julien Chaumond, here's an updated HOW-TO, an explanation of …

· 6 views
REDDIT

your daily driver stack, what's it look like? and why?

What it says in the title, I'm interested in hearing what you all have landed on as a workable / useful stack for you. Mine looks like this: back-end inference servers - llama.cpp, vLLM | V hermes-age…

· 6 views