A disappearing Service Processor (2025)
The article discusses challenges faced in debugging the Service Processor (SP) of the Oxide rack. Issues arose when the SP dropped off the network, complicating the debugging process. The team explored various theories and made adjustments to improve system monitoring and debugging capabilities.
- The Oxide rack is designed for exclusive network access, minimizing the need for physical visits by engineers.
- Debugging the Service Processor became challenging when it dropped off the network, limiting insight into its state.
- The custom operating system, Hubris, is prone to task starvation and stack overflows, complicating the debugging process.
Opening excerpt (first ~120 words)
A disappearing Service Processor, by Laura Abbott (Engineer), 11 Dec 2025

One of the considerations in designing our Oxide rack is asking which parts we expect to be accessible and by what means. The Oxide rack is designed to live in a data center with exclusive access via the network. The only reason an engineer should ever need to physically visit a rack is to replace a failing part, such as a disk. Our Service Processor (SP) is accessible via the management network.

During some of our first attempts at putting our next generation Cosmo sled into an Oxide rack, we would see the Service Processor drop off the network. This is a tricky situation to debug, as without network access we have limited insight into the state of the SP itself.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at Oxide Computer Company.