Running Local LLMs Offline on a Ten-Hour Flight
Apr 22, 2026 · 4 min read · Tags: AI, Local-LLMs, MacBook, Travel

I flew from London to Google Cloud Next 2026 in Las Vegas. Ten hours with no in-flight wifi. I used the time to test how far a modern MacBook can carry engineering work on local LLMs alone.

Setup

A week-old MacBook Pro M5 Max, 128GB unified memory, 40-core GPU. Gemma 4 31B and Qwen 4.6 36B via LM Studio. The 100 most common Docker images and the top programming languages, along with enough dependencies to build functional sites with rich visualisations. Countless CLIs, with opencode, rtk, instantgrep and duckdb the most used.

What I built

A billing analytics tool covering two years of loveholidays cloud spend. DuckDB underneath, with a custom UI for slicing the data along dimensions the standard dashboards don’t expose. It surfaced patterns and cross-service correlations that had been hard to uncover. I had been interested in exploring this area for a while, but could never prioritise it against the whirlwind of my other responsibilities. With 10 hours to spare, top-of-the-range hardware and OSS models, I decided to give it a go.

Alongside that, I processed roughly 4M tokens on smaller tasks: refactors, CLI scaffolding, documentation. For tight-scope work, Gemma and Qwen produced output comparable to the frontier models I normally use.

What broke

Four limits showed up.

Power. Roughly 1% of battery per minute under sustained load. The battery kept draining even when plugged in, with 60W of power supplied.

Heat. At 70–80W sustained, the chassis runs hot enough to be uncomfortable. The in-flight blanket and pillow saved my knees, but made the overheating problem even worse.

Context. Throughput and latency degrade noticeably past 100k tokens.

Loops. A handful of prompts sent the model into an infinite loop that needed manual intervention to break. It is unclear whether the fault sat in the opencode orchestration layer or in the model itself.
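The power figures above can be turned into a rough budget. What follows is a minimal sketch, not a measurement: it uses the 70–80W load and the 60W supply quoted above, while the 100 Wh battery capacity is an assumption (the article does not state it) and the function names are hypothetical. It also ignores conversion losses and any charge-rate limits.

```python
def net_battery_draw_w(load_w: float, supply_w: float) -> float:
    """Power the battery must contribute (positive) or absorb (negative), in watts."""
    return load_w - supply_w


def minutes_per_percent(draw_w: float, capacity_wh: float) -> float:
    """Minutes it takes to lose 1% of charge at a constant battery draw."""
    if draw_w <= 0:
        raise ValueError("battery is not discharging")
    one_percent_wh = capacity_wh / 100.0
    return one_percent_wh / draw_w * 60.0


ASSUMED_CAPACITY_WH = 100.0  # assumption: capacity is not stated in the article

for load_w in (70.0, 80.0):
    # Unplugged: the battery supplies the whole load.
    unplugged = minutes_per_percent(load_w, ASSUMED_CAPACITY_WH)
    # Plugged in: the battery only covers what the 60 W supply cannot.
    deficit = net_battery_draw_w(load_w, supply_w=60.0)
    plugged = minutes_per_percent(deficit, ASSUMED_CAPACITY_WH)
    print(f"{load_w:.0f} W load: 1% every {unplugged:.2f} min on battery, "
          f"every {plugged:.1f} min on a 60 W supply")
```

Under the assumed capacity, a 70–80W load costs 1% in a little under a minute when unplugged, which is in line with the "roughly 1% per minute" observation. A 60W supply slows the drain to 1% every 3 to 6 minutes but cannot stop it.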
What helped: one problem per session, long plans written to markdown for re-ingestion, and minimising tool-call overhead with rtk. I avoided compaction because it is very slow to run.

Instrumentation

I built two tools over the flight.

powermonitor: a CLI that reads Mac power telemetry (CPU, GPU, ANE, adapter, battery) live. I’ve since pushed a fix for faster detection of adapter power-source changes.

    ⚡ Total: 81.6 W  Charging (split updates when battery % changes)
    CPU: 4.5 W  GPU: 77.2 W  ANE: 0.0 W  Adapter: 60 W  Battery: 14%
    Source: AC Power (W) ▁▂▃▃▄▃▄▄▅▄▅▅▅▄▅▄▅▅▅▄▅▆▅▄▅▅▆▅▄▄▅▄▅▅
    Min: 47.5  Avg: 71.5  Max: 87.3 W  14144 samples

lmstats: reads LM Studio telemetry and reports token throughput, latency distributions, and context-window behaviour across a session.

Both follow the same pattern we apply at loveholidays at a larger scale: instrument the system before acting on it.

Community responses

The LinkedIn post attracted several threads worth engaging with.

Steve Turner noted that running local, where cost is physically visible, made him more critical of what he asks of cloud models. This is the mechanical sympathy principle applied to AI: direct exposure to heat, power, and context effects builds intuition about where inference is cheap and where it’s expensive. That intuition transfers back to cloud usage.

Jackson Oaks made the case for Apple Silicon perf-per-watt over NVIDIA for battery-constrained workloads.

The cable

British Airways advertises 70W per seat. Powermonitor showed 60W delivered on the outbound flight. I decided…
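The powermonitor readout shown under Instrumentation pairs a sparkline with min, average and max figures. As an illustration of how such a summary could be produced from a list of wattage samples, here is a minimal sketch. It is not powermonitor's actual code: the function names, the fixed 0–100W scale and the sample values are all assumptions made for the example.

```python
from statistics import fmean

BLOCKS = "▁▂▃▄▅▆▇█"  # eight block heights, lowest to highest


def sparkline(samples: list[float], lo: float, hi: float) -> str:
    """Map each sample onto one of eight block characters, scaled to [lo, hi]."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    chars = []
    for value in samples:
        clamped = min(max(value, lo), hi)  # keep out-of-range readings on the scale
        index = int((clamped - lo) / (hi - lo) * (len(BLOCKS) - 1))
        chars.append(BLOCKS[index])
    return "".join(chars)


def summarise(samples: list[float]) -> str:
    """Format min, mean, max and sample count on one line."""
    return (f"Min: {min(samples):.1f} Avg: {fmean(samples):.1f} "
            f"Max: {max(samples):.1f} W {len(samples)} samples")


# Hypothetical readings in watts, not data from the flight.
readings = [47.5, 60.0, 71.5, 80.0, 87.3]
print(sparkline(readings, lo=0.0, hi=100.0))
print(summarise(readings))
```

A fixed scale keeps successive sparklines comparable across a session; scaling to each window's own min and max would instead exaggerate small fluctuations.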
This excerpt is published under fair use for community discussion. Read the full article at Dmitri Lerko.