
Why the AI playbook breaks when you scale past a few developers

·811 words·4 mins
Tech AI Scaling

I watched a client roll out AI tools to a team of 10 engineers and it went fine. Then they tried doing it across a thousand people and the wheels fell off. The two rollouts are not even the same sport, though I still see executives trying to run the exact same playbook for both.

Over the last year, I have looked at this transition from almost every angle: from my own hacking away with Claude Code and Codex, to 5-person startups trying to ship code faster, to small companies trying to centralize what they knew.

The shift is obvious once you watch it play out: as an organization grows, the bottleneck moves from a personal tooling problem to a systems and governance problem.

The individual developer

At the personal level, AI is just a handy force multiplier. It is great for refactoring some Python code, drafting a brief, or summarizing a long industry report.

I remember when I first set up a local Node.js script to query the OpenAI API. I was so excited about how fast it spit out answers that I did something incredibly stupid: I hardcoded my raw API key directly into index.js to save thirty seconds, forgot about it, and committed it to a public GitHub repository. My heart did a quick flip in my chest when the automated security email arrived. I panicked, had a brief moment of cold sweat, then realized the fix was simple. I revoked the key in the OpenAI dashboard, generated a new one, added .env to my .gitignore, and went back to work.
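
For what it is worth, here is a minimal sketch of the habit that would have saved me that email: read the key from the environment instead of typing it into source. My original script was Node.js; this version is Python purely for illustration. `load_api_key` is a hypothetical helper name, and `OPENAI_API_KEY` is the variable name the OpenAI SDKs conventionally look for.

```python
import os


def load_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment, or fail loudly if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Failing at startup beats discovering a missing key mid-request.
        raise RuntimeError(
            f"{env_var} is not set; export it in your shell or load it "
            "from a .env file that is listed in .gitignore"
        )
    return key
```

The point is not the five lines of code. It is that the secret never appears in a file Git can see.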

At this scale, the blast radius is tiny. If you break something, you have only wasted your own afternoon.

The team of 10

Once you grow to a 10-person team, the challenge becomes collaborative. You have to handle shared API keys, coordinate prompts, and keep everyone from drifting in different directions.

When one small team I advised first tried working with LLMs, they shared a single developer account to save on administrative overhead and avoid setting up Okta SSO. It was a complete pain in the ass. Within a week, they hit rate limits right in the middle of a critical deployment because someone decided to run a massive batch-processing script on the same key. They had no way of tracking who was running what.

They had to set up basic guardrails: shared team workspaces, simple templates for prompts, and a hard rule on what kind of customer data could be fed into public endpoints. The tool still matters most here, but teams have to start thinking about basic coordination.
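
To make "tracking who was running what" concrete, here is a rough sketch of a per-user sliding-window limiter that could sit in front of a shared key. It is not what that team built; `PerUserLimiter` and its parameters are names I made up for illustration, and a real setup would more likely lean on per-user keys or the provider's own usage dashboard.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class PerUserLimiter:
    """Cap each user's requests within a sliding time window."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._calls: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, user: str, now: Optional[float] = None) -> bool:
        """Record the request and return True if the user is under the cap."""
        now = time.monotonic() if now is None else now
        calls = self._calls[user]
        # Drop timestamps that have aged out of the window.
        while calls and now - calls[0] >= self.window_seconds:
            calls.popleft()
        if len(calls) >= self.max_requests:
            return False
        calls.append(now)
        return True

    def usage(self) -> Dict[str, int]:
        """Requests currently recorded per user (as of each user's last call)."""
        return {user: len(calls) for user, calls in self._calls.items()}
```

With something like this in place, one person's batch job runs into its own ceiling instead of eating the whole team's quota during a deployment.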

The hundred-person organization

At a hundred people, the friction shifts entirely to data architecture. This is where I often see teams build their first Retrieval-Augmented Generation (RAG) system, only to hit a massive wall.

One client thought they could just dump their entire Notion workspace and Google Drive folders into Pinecone, hook up an LLM via LangChain, and call it a day. Instead, they spent 3 weeks debugging why the model kept pulling up outdated billing specifications from 2021 instead of their active Q3 migration documents. The model was not broken; their internal documentation was a mess, and the AI was just reflecting that mess back to them with high confidence.

You cannot buy your way out of a bad data structure with a better model. At this scale, companies need standardized data pipelines, clean retrieval strategies, and someone who actually owns the documentation lifecycle. If the internal wiki is garbage, the custom RAG will be garbage too.
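
As one small illustration of a "clean retrieval strategy", here is a sketch that filters documents on metadata before they ever reach the vector index. It assumes every document carries a last-updated date and an archived flag, which is exactly the kind of metadata a documentation owner has to maintain. `Doc`, `select_indexable`, and the one-year cutoff are hypothetical choices of mine, not anything from that client's pipeline.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List


@dataclass
class Doc:
    title: str
    last_updated: date
    archived: bool = False


def select_indexable(docs: List[Doc], today: date, max_age_days: int = 365) -> List[Doc]:
    """Keep only documents that are not archived and were updated recently."""
    cutoff = today - timedelta(days=max_age_days)
    return [d for d in docs if not d.archived and d.last_updated >= cutoff]
```

A date cutoff is a blunt instrument, since some old documents are still correct and some new ones are wrong. It only works at all if someone is keeping those fields honest, which is the real point.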

The thousand-person enterprise

When you hit a thousand people, the problem isn’t the model or even the data pipelines; it is legal, compliance, and human risk.

At this scale, you are no longer trying to make developers 20% faster; you are trying to stop the company from getting sued. If a developer pastes proprietary IP into an external model, or if an LLM hallucinates a security policy that a junior engineer follows, the cost is massive.

This is where the standard developer playbook breaks completely. You cannot just tell people to experiment fast when you have strict SOC 2 compliance and financial audit trails to maintain. You need single sign-on, role-based access control, automated audit logs, and clear legal indemnification from your model providers.
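
To show how role-based access control and audit logging fit together, here is a deliberately tiny sketch. The roles, the actions, and the `authorize` function are all invented for illustration; in a real enterprise the roles would come from your identity provider and the log would go to append-only storage, not a Python list.

```python
import json
from datetime import datetime, timezone
from typing import Dict, List, Set

# Hypothetical role-to-permission mapping, for illustration only.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "engineer": {"chat"},
    "admin": {"chat", "upload_documents"},
}


def authorize(user: str, role: str, action: str, audit_log: List[str]) -> bool:
    """Check the action against the role and log the decision either way."""
    allowed = action in ROLE_PERMISSIONS.get(role, set())
    # Denied attempts are logged too; auditors care about those most.
    audit_log.append(json.dumps({
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "role": role,
        "action": action,
        "allowed": allowed,
    }))
    return allowed
```

The design choice worth noticing is that the permission check and the log entry happen in the same call, so nobody can do one without the other.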

Solving for the actual bottleneck

The specific model under the hood matters far less than your data pipelines, your access controls, and your internal culture. I see companies run fast with early pilot programs only to stall at larger scales because they treat a structural system failure as a simple software purchasing decision.

Before writing a check to an enterprise AI vendor, you have to diagnose your actual scale and solve for the real bottleneck. Otherwise, you are just buying an expensive engine to sit idling in a traffic jam.
