Reliability

GOMAXPROCS, CPU Limits, and the Kubernetes Trap That Silently Kills Go Throughput

If your Go pod has less than one full core, the runtime treats it as single-threaded for parallelism. For years, though, the Go runtime ignored Kubernetes CPU limits and sized GOMAXPROCS from the node's core count, running more threads in parallel than the pod's CPU quota allowed. Here's what changed in Go 1.25, why giving a pod 'extra' CPU sometimes dramatically improves throughput, and how to get it right in 2026.
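To make the mismatch concrete, here is a minimal sketch of how a container CPU limit translates into a core count. It assumes cgroup v2 with the limit exposed at `/sys/fs/cgroup/cpu.max` (format `<quota> <period>` or `max <period>`). The helpers `cpuLimitFromCPUMax` and `suggestedProcs` are hypothetical, and the round-up-and-clamp policy is an illustrative assumption, not the Go runtime's own rule.

```go
package main

import (
	"fmt"
	"math"
	"os"
	"runtime"
	"strconv"
	"strings"
)

// cpuLimitFromCPUMax (hypothetical helper) parses a cgroup v2 cpu.max line.
// It returns the limit in cores, and ok=false when no limit is set ("max").
func cpuLimitFromCPUMax(contents string) (cores float64, ok bool, err error) {
	fields := strings.Fields(contents)
	if len(fields) != 2 {
		return 0, false, fmt.Errorf("unexpected cpu.max format: %q", contents)
	}
	if fields[0] == "max" {
		return 0, false, nil // no quota: the container may use every CPU
	}
	quota, err := strconv.ParseFloat(fields[0], 64)
	if err != nil {
		return 0, false, err
	}
	period, err := strconv.ParseFloat(fields[1], 64)
	if err != nil {
		return 0, false, err
	}
	if period <= 0 {
		return 0, false, fmt.Errorf("non-positive period in cpu.max: %q", contents)
	}
	return quota / period, true, nil
}

// suggestedProcs (hypothetical policy, an assumption for illustration):
// round the quota up to whole cores, then clamp to [1, numCPU].
func suggestedProcs(limit float64, numCPU int) int {
	n := int(math.Ceil(limit))
	if n < 1 {
		n = 1
	}
	if n > numCPU {
		n = numCPU
	}
	return n
}

func main() {
	// What the runtime actually chose, versus the machine's logical CPUs.
	fmt.Println("NumCPU:", runtime.NumCPU(), "GOMAXPROCS:", runtime.GOMAXPROCS(0))

	data, err := os.ReadFile("/sys/fs/cgroup/cpu.max") // assumed cgroup v2 path
	if err != nil {
		fmt.Println("no cgroup v2 cpu.max found:", err)
		return
	}
	limit, ok, err := cpuLimitFromCPUMax(string(data))
	if err != nil {
		fmt.Println("could not parse cpu.max:", err)
		return
	}
	if !ok {
		fmt.Println("no CPU limit set")
		return
	}
	fmt.Printf("CPU limit: %.2f cores, suggested GOMAXPROCS: %d\n",
		limit, suggestedProcs(limit, runtime.NumCPU()))
}
```

Before Go 1.25 the default GOMAXPROCS was the number of logical CPUs on the node, so on a large node the first line of output can show a value far above the pod's quota. Comparing it with the computed limit is a quick way to see whether a given binary and Go version picked up the container limit.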

Reliability Engineering for Generative AI Platforms

Fifteen years of distributed systems, real-time pipelines, and incident command applied to LLM platforms. How to build agentic systems that degrade gracefully, contain failures, and remain auditable when everything is on fire at 2 a.m.