Langfuse v4 + Ollama: Tracing Local LLMs Without Mocks or Monkey-Patches
Langfuse v4 introduces a seamless way to trace local LLM interactions with Ollama by leveraging its compatibility with OpenAI's API format. By using a drop-in replacement for the OpenAI client, developers can automatically capture tracing data such as session IDs, user IDs, token counts, and full response streams without manual instrumentation. The integration simplifies observability for local LLMs while maintaining compatibility with OpenTelemetry standards.
- ▪Langfuse v4 uses a subclass of the OpenAI client to automatically trace Ollama chat completions at http://localhost:11434/v1.
- ▪Contextual metadata like session_id, user_id, and tags must be set using propagate_attributes() in Langfuse v4 due to its OpenTelemetry-based architecture.
- ▪The integration reconstructs streaming responses into a single trace, provided the entire stream iterator is consumed within the propagate_attributes context.
- ▪No custom OTLP exporters or monkey-patching are required, making the solution low-friction for local LLM observability.
- ▪Token usage is accurately captured from the response payload rather than estimated through middleware.
Opening excerpt (first ~120 words) tap to expand
try { if(localStorage) { let currentUser = localStorage.getItem('current_user'); if (currentUser) { currentUser = JSON.parse(currentUser); if (currentUser.id === 3812537) { document.getElementById('article-show-container').classList.add('current-user-is-article-author'); } } } } catch (e) { console.error(e); } Julio Molina Soler Posted on May 16 Langfuse v4 + Ollama: Tracing Local LLMs Without Mocks or Monkey-Patches #llm #ai #python #observability Disclosure: I learn topics like this through LLM dialogue. The prompts are mine, the depth comes from the model, the verification comes back to me, and I publish the result so others don't have to start from zero.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at DEV.to (Top).