WeSearch
TAG · #LLAMA

Llama coverage.

All 29 stories in the WeSearch catalog tagged with #llama, in publish-time order, with view counts. Tag pages update as new stories ingest; subscribe to the per-tag RSS feed to follow this topic in your reader of choice.


RELATED TAGS
#ollama (6) · #ai (3) · #claude-code (2) · #meta (2) · #llama-3 (2) · #android (2) · #kotlin (2) · #qwen3-6-27b (1) · #coding-model (1) · #open-weight (1) · #llama-cpp (1) · #quantization (1)
R/NODE

Show HN-style: Blue Arrow – modular orchestration system with state-driven execution, local LLaMA integration and post-execution verification

1 view ·
DEV.TO (TOP)

Normalized Categories: One Filter for "Polos" Across Every Supplier

If you've ever tried to search "polos under $10 in navy" across more than one supplier, you already…

4 views ·
#api · #python · #ai
THE NEW STACK

Meta abandons open-source Llama for proprietary Muse Spark

Meta has shifted from Llama to its new proprietary AI model Muse Spark, leaving open-source developers searching for alternatives and migration paths.…

7 views ·
#meta · #muse spark
DEV.TO (TOP)

Mastering On-Device GenAI: How to Fine-Tune LLMs for Android Using LoRA and Kotlin 2.x

The dream of a truly personal AI—one that lives entirely on your smartphone, understands your medical…

3 views ·
#android · #kotlin · #ai
LOCALLLAMA

llama.cpp's Preliminary SM120 Native NVFP4 MMQ Is Merged

And somehow we already got some GGUFs for it! (the one below is from the PR author himself)…

7 views ·
GITHUB

Pebble – Menu-bar text polisher running on local Ollama

Menu-bar text-polish tool that rewrites your clipboard with a local Ollama model. One global shortcut, seven presets, no cloud. - gashiartim/pebble…

8 views ·
#pebble · #ollama · #local llm
ARTIFICIAL INTELLIGENCE (AI)

Arc Gate — LLM proxy that hits P=1.00 R=1.00 F1=1.00 on indirect/roleplay prompt injection (beats OpenAI Moderation and LlamaGuard)

Benchmarked on 40 out-of-distribution prompts: indirect requests, roleplay framings, hypothetical scenarios, technical phrasings. The stuff that slips past everything else. Arc Gat…

8 views ·
LOCALLLAMA

convert : add support for Nemotron Nano 3 Omni by danbev · Pull Request #22481 · ggml-org/llama.cpp

NVIDIA Nemotron 3 Nano Omni is a multimodal large language model that unifies video, audio, image, and text understanding to support enterprise-grade Q&A, summarization, transcript…

8 views ·
PROMPTENGINEERING

Arc Gate — LLM proxy that catches 100% of indirect/roleplay prompt injection attacks (beats OpenAI Moderation and LlamaGuard)

Built an LLM proxy that sits in front of any OpenAI-compatible endpoint and blocks prompt injection before it reaches your model. Benchmarked against OpenAI Moderation API and Llam…

10 views ·
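The architecture this post describes, a screening proxy placed in front of an OpenAI-compatible endpoint, can be sketched generically. The following is a minimal illustration only, not Arc Gate's code: the upstream URL and the `looks_like_injection` heuristic are placeholders where a real gate would run its trained classifier.

```python
# Illustrative sketch of a screening proxy in front of an OpenAI-compatible
# endpoint. NOT Arc Gate's implementation; the classifier is a stand-in.
import requests
from flask import Flask, jsonify, request

UPSTREAM = "http://localhost:8000/v1/chat/completions"  # hypothetical upstream

app = Flask(__name__)

def looks_like_injection(messages) -> bool:
    # Placeholder heuristic; a real gate would run a trained classifier here.
    text = " ".join(m.get("content", "") for m in messages
                    if isinstance(m.get("content"), str))
    return any(p in text.lower()
               for p in ("ignore previous instructions", "you are now"))

@app.post("/v1/chat/completions")
def gate():
    body = request.get_json(force=True)
    if looks_like_injection(body.get("messages", [])):
        # Block before the request ever reaches the model.
        return jsonify({"error": "blocked: possible prompt injection"}), 403
    # Forward the untouched request to the real model endpoint.
    upstream = requests.post(UPSTREAM, json=body, timeout=120)
    return upstream.json(), upstream.status_code

if __name__ == "__main__":
    app.run(port=8001)
```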
DEV COMMUNITY

Step-by-Step Guide to Building RAG with LlamaIndex 0.10 and Vector 0.4 for Docs Search

80% of engineering teams building RAG pipelines for internal documentation search waste 3+ weeks…

5 views ·
DEADNET

Show HN: DeadNet – Watch AI agents debate, play games, and write stories live

DeadNet is a live arena where AI agents debate, play games, and write stories while humans watch and vote. Watch matches or build your own agent.…

16 views ·
#ai agents · #live platform · #debate
PYTORCH

A Primer on LLM Post-Training

8 views ·
#post-training · #large language models · #alignment
LOCALLLAMA

Duality of r/LocalLLaMA

8 views ·
DEV COMMUNITY

Step-by-Step Guide to Setting Up Local AI Code Review with Continue.dev 0.9, Ollama 0.5, and ESLint 9

82% of engineering teams report that cloud-based AI code review tools leak sensitive IP, cost 4x more…

7 views ·
WILLIAMANGEL

Offline Agentic Coding

Offline Agentic Coding: Ollama and Claude Code…

5 views ·
#ai · #llms · #agents
REDDIT

VRAM.cpp: Running llama-fit-params directly in your browser

Lots of people are always asking on this subreddit if their system can run a certain model. A lot of the "VRAM calculators" that I've found only provide either very rough estimates…

10 views ·
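For a rough sense of what such a fit calculator computes: weight memory is approximately parameter count times bytes per weight, plus a KV cache that grows linearly with context length. A back-of-envelope sketch using the standard approximations (not VRAM.cpp's actual method; all numbers in the example are illustrative):

```python
# Back-of-envelope VRAM estimate for a dense transformer; standard
# approximations, not VRAM.cpp's actual method.

def vram_estimate_gb(params_b: float, bits_per_weight: float,
                     n_layers: int, n_kv_heads: int, head_dim: int,
                     ctx_len: int, kv_bits: int = 16,
                     overhead_gb: float = 1.0) -> float:
    weights = params_b * 1e9 * bits_per_weight / 8        # quantized weights
    # KV cache: 2 tensors (K and V) per layer, per token, per KV head.
    kv = 2 * n_layers * n_kv_heads * head_dim * ctx_len * kv_bits / 8
    return (weights + kv) / 1e9 + overhead_gb             # plus runtime overhead

# Example: an 8B model at ~4.5 bits/weight (Q4_K_M-ish), 32 layers,
# 8 KV heads, head_dim 128, 8k context -> roughly 6-7 GB.
print(f"{vram_estimate_gb(8, 4.5, 32, 8, 128, 8192):.1f} GB")
```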
REDDIT

Intel B70: llama.cpp SYCL vs llama.cpp OpenVINO vs LLM-Scaler

In case anyone is interested, I decided to test out llama.cpp's new OpenVINO backend to see how it compares on Intel GPUs. At first glance, it stomps all over the previous best-cas…

8 views ·
GITHUB

The cost math behind routing Claude Code through Ollama (~90% cut)

Pair Claude Desktop on Anthropic with Claude Code routed through Ollama. Visual walkthrough + copy-paste prompt that cuts your Claude Code bill ~90%. - Coherence-Daddy/use-ollama-t…

9 views ·
#claude-code · #ollama · #cost-optimization
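The headline arithmetic is simple: if a fraction f of Claude Code traffic is served by a local Ollama model at near-zero marginal cost, the API bill scales by roughly (1 - f). A sketch with purely illustrative numbers, not figures from the linked repo:

```python
# Purely illustrative arithmetic, not figures from the linked repo.
monthly_api_bill = 200.0   # hypothetical current Claude Code spend, USD
local_fraction = 0.90      # share of calls routed to a local Ollama model

remaining = monthly_api_bill * (1 - local_fraction)
print(f"bill: ${monthly_api_bill:.0f} -> ${remaining:.0f} "
      f"({local_fraction:.0%} cut, ignoring local electricity/hardware)")
```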
LOCALLLAMA

Mesa PR with 37-130% llama.cpp prompt processing (pp) perf gain for Vulkan on Linux on Intel Xe2

11 views ·
SIMON WILLISON'S WEBLOG

Qwen3.6-27B: Flagship-Level Coding in a 27B Dense Model

Big claims from Qwen about their latest open weight model: Qwen3.6-27B delivers flagship-level agentic coding performance, s…

13 views ·
#qwen3.6-27b · #coding model · #open-weight
REDDIT

r/LocalLLaMA Rule Updates

As the sub has grown to over 1M weekly visitors (and as AI-based tools have gotten better), we've seen a marked increase in slop, spam, etc. This has been on the mod team's mind …

15 views ·
REDDIT

Using PaddleOCR-VL-1.5 with llama-server for book OCR

I've been running PaddleOCR-VL-1.5 via llama.cpp's server for OCR on book pages. It handles complex layouts, tables, and mixed text/figure pages surprisingly well. Setup: - Model: …

14 views ·
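llama.cpp's server exposes an OpenAI-compatible chat endpoint that accepts base64-encoded images for vision models, which is the likely shape of the setup truncated above. A sketch of one OCR call; the file name, port, prompt, and sampling settings are assumptions, not the post's actual configuration:

```python
# Sketch of an OCR request against a llama.cpp server running a vision
# model. The endpoint shape is llama-server's OpenAI-compatible API, but
# the port, prompt, and model setup here are assumptions.
import base64
import requests

with open("page_001.png", "rb") as f:  # hypothetical scanned book page
    img_b64 = base64.b64encode(f.read()).decode()

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",  # llama-server default port
    json={
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "Transcribe all text on this page."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{img_b64}"}},
            ],
        }],
        "temperature": 0.0,  # keep transcription output deterministic-ish
    },
    timeout=300,
)
print(resp.json()["choices"][0]["message"]["content"])
```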
REDDIT

Benchmark: Windows 11 vs Lubuntu 26.04 on Llama.cpp (RTX 5080 + i9-14900KF). I didn't expect the gap to be this big.

UPDATE: Vulkan benches are now included. And yes, I used AI to help me write this post. As a life-long Windows user (don't hate me, I was exposed to it at a young age) I was wonde…

12 views ·
REDDIT

llama.cpp DeepSeek v4 Flash experimental inference

Hi, here you can find experimental llama.cpp support for DeepSeek v4, and here is the GGUF you can use to run the inference with "just" (lol) 128GB of RAM. The model, even qu…

11 views ·
REDDIT

Will llama.cpp multislot improve speed?

I've heard mostly bad opinions about multiple slots with llama.cpp (--parallel > 1). I guess that compared to vLLM it might be worse at this, but I recently tried vLLM on 4 slots and i…

10 views ·
REDDIT

Experts/volunteers needed for Vulkan on ik_llama.cpp

ik_llama.cpp is great for both CPU & CUDA. Need legends to make Vulkan better as well. So, after bringing the Vulkan back-end up to speed some time ago, I felt that I simply don't …

9 views ·
REDDIT

This is where we are right now, LocalLLaMA

the future is now…

11 views ·
REDDIT

CUDA: reduce MMQ stream-k overhead by JohannesGaessler · Pull Request #22298 · ggml-org/llama.cpp

CUDA prompt processing speedup on MoE. Check this…

10 views ·
REDDIT

FP4 inference in llama.cpp (NVFP4) and ik_llama.cpp (MXFP4) landed - Finally

Both llama.cpp and ik_llama.cpp now have FP4 support — but with different flavors worth knowing about. llama.cpp recently merged NVFP4 (Nvidia's block-scaled FP4, `GGML_TYPE_NVFP4 …

15 views ·
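For readers new to block-scaled FP4: each small block of weights shares one scale factor, and each weight is stored as a 4-bit E2M1 value whose representable magnitudes are 0, 0.5, 1, 1.5, 2, 3, 4, and 6. A toy sketch of the general idea follows; block size and scale encoding are where NVFP4 and MXFP4 differ, and this is not llama.cpp's kernel code:

```python
# Toy illustration of block-scaled FP4 quantization on the E2M1 grid.
# Generic sketch only; not llama.cpp's or ik_llama.cpp's implementation.

E2M1 = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]  # representable FP4 magnitudes

def quantize_block(block):
    """Quantize one block of floats to (shared scale, 4-bit codes)."""
    amax = max(abs(x) for x in block)
    scale = amax / 6.0 if amax else 1.0          # map the largest value to 6.0
    codes = []
    for x in block:
        mag = abs(x) / scale
        q = min(E2M1, key=lambda v: abs(v - mag))  # nearest grid point
        codes.append((x < 0, q))                   # (sign bit, magnitude)
    return scale, codes

def dequantize_block(scale, codes):
    return [-q * scale if neg else q * scale for neg, q in codes]

block = [0.12, -0.03, 0.4, -0.27]
scale, codes = quantize_block(block)
print(dequantize_block(scale, codes))  # coarse approximation of the block
```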