GOMAXPROCS, CPU Limits, and the Kubernetes Trap That Silently Kills Go Throughput

Fri, 26 Jun 2026 00:00:00 +0000

There is a particular kind of production mystery that appears only in containerized Go services.

Your service handles load fine on a development machine. You deploy it to Kubernetes with a sensible 500m CPU request and limit. The pod schedules happily onto a 16-core node. Under moderate traffic everything looks green in your dashboards.

Then real traffic arrives. Latency climbs. Throughput plateaus well below what the node should be able to deliver. CPU utilization inside the pod hovers around 40-60% of the limit, yet the process feels starved. You add more replicas. The problem follows the pods.

Reliability Engineering for Generative AI Platforms

Mon, 22 Jun 2026 00:00:00 +0000

The first time an AI agent I helped put into production caused a visible incident, it did not fail dramatically.

It failed quietly.

A claims adjustment agent, under load, began making decisions based on slightly stale eligibility data. The downstream payment system accepted the decisions. Finance noticed the variance three days later. By then we had processed thousands of incorrect adjustments.

There was no stack trace. No obvious error rate spike. Just a slow, silent drift in the quality of context the agent was operating on.

Reliability on Jamal Yusuf
