r/LocalLLaMA on WeSearch
Recent social headlines from r/LocalLLaMA.
R/LOCALLLAMA
Introducing Gemma 4 12B: a unified, encoder-free multimodal model
R/LOCALLLAMA
Best way to index full Italian Wikipedia for 100% offline RAG in LM Studio?
R/LOCALLLAMA
This day in LLM history….105 years ago today, Qwen 3.6 27b was released open source. /s
R/LOCALLLAMA
Gemma 4 Unified is coming
R/LOCALLLAMA
Take Three: What’s the rub on memory sessions?
R/LOCALLLAMA
ui: Mermaid Diagrams in chat + interactive preview by allozaur · Pull Request #24032 · ggml-org/llama.cpp
R/LOCALLLAMA
Gemma 4 is coming - No Vision Tower - No Audio Tower
R/LOCALLLAMA
I developed a hard LLM Challenge
R/LOCALLLAMA
lipsync possible on mac?
R/LOCALLLAMA
Qwen 3.7 Plus just briefly appeared and then disappeared on OpenRouter.
R/LOCALLLAMA
Half the top 10 trending GitHub repos right now are "skills" projects, not models
R/LOCALLLAMA
Tensor split mode: CUDA error on latest llama.cpp with Qwen-3.6-27b
R/LOCALLLAMA
Calling it now Microsoft is buying Unsloth.
R/LOCALLLAMA
Helvete-nano
R/LOCALLLAMA
Holo3.1 35B/9B/4B/0.8B (Qwen 3.5 finetunes)
R/LOCALLLAMA
Mellum & Granite Embedding models are ready on llama.cpp
R/LOCALLLAMA
Another shout out to llama.cpp build b9455 2x3090
R/LOCALLLAMA
Microsoft Aion 1.0 Instruct and Aion 1.0 Plan models!
R/LOCALLLAMA
Nous Research — Hermes Desktop
R/LOCALLLAMA
Why do we benchmark quants on perplexity and prose but never on tool call validity?
R/LOCALLLAMA
Someone out there likely needs this
R/LOCALLLAMA
Everyone here self-hosts inference. Almost nobody self-hosts the tooling around it. That feels backwards to me.
R/LOCALLLAMA
Cost Analysis of my $6.4k Local LLM Server
R/LOCALLLAMA
Running Qwen 3.6 35b MoE With Zoo Code On M1 Max is Amazing! Fully local, battery-powered coding powerhouse!
R/LOCALLLAMA
Would a MacBook M5 16/24/32GB be an upgrade, complement, or waste next to my RTX 4060 laptop?
R/LOCALLLAMA
What features dramatically improved your custom memory system?
R/LOCALLLAMA
For those creating personal assistants locally - how has short/long term memory impacted your experience?
R/LOCALLLAMA
Parallax: Parameterized Local Linear Attention for Language Modeling
R/LOCALLLAMA
nvidia/Qwen3.6-35B-A3B-NVFP4 · Hugging Face
R/LOCALLLAMA
SupraLabs 50M Parameter Model Just Hit the Trending Page on Hugging Face 🤯
R/LOCALLLAMA
Why does Thinking Output More Tokens Than a Response?
R/LOCALLLAMA
[LLM analysis challenge] OPERATION: REVERSE ROBOTOMY. We need an LLM Neurosurgeon to extract a password from a fractured artificial mind.
R/LOCALLLAMA
Can't get over 250TPS on RTX5090 with Qwen3.5-4B
R/LOCALLLAMA
LFM2.5-8B-A1B release
R/LOCALLLAMA
anybody got llama-swap working answering concurrent requests for a single model?
R/LOCALLLAMA
STT -> LLM -> TTS pipeline
R/LOCALLLAMA
Qwen 3.6 coding choice–27B vs 35B quants
R/LOCALLLAMA
"What are you good at?"
R/LOCALLLAMA
Fulloch V2: 100% Local Voice Assistant for Home Assistant & Obsidian (Runs on 16GB VRAM)
R/LOCALLLAMA
MINISFORUM UM790 Pro
R/LOCALLLAMA
Gryphe/Pantheon-Reasoning-27B · Hugging Face
R/LOCALLLAMA
Open source : Turning vocal imitations into sound effects. (New UX for sound generation)
R/LOCALLLAMA
Vidai Community is now available: one Rust binary for cost attribution, guardrails and multi-provider routing on every LLM call
R/LOCALLLAMA
The best AI Model for Arabic dialects 🇪🇬🦅🧡
R/LOCALLLAMA
made a local voice AI for windows you can talk to in any language. open source, bring your own key
R/LOCALLLAMA
I have 2x PC's. One with a 5090 and one with a 4080. Is there an easy way to use both together networked?
R/LOCALLLAMA
Keeping multi-GPU rigs cool?
R/LOCALLLAMA
Breaking the music supply constraint
R/LOCALLLAMA
Uploaded my Qwen3.6 27B based fine tune, after two years of experience fine tuning models
R/LOCALLLAMA