PGH Web

Web Solutions from Pittsburgh

The Difference Between 'It Works' and 'It Scales'

Jeff Straney

I built a feature once that was perfect. It worked. It handled the use case exactly as intended. I shipped it. Three months later, we had ten times as many customers. The feature was now a bottleneck. I had to rebuild it. That time, I knew what was going to break because I had lived through it.

The problem is not that I made a bad choice initially. The problem is that I made a choice designed for the scale we were at, and then the scale changed. The solution that works at 100 users doesn't always work at 1,000 users. The solution that works at 1,000 users doesn't always work at 10,000.

Most of the time, you can't know exactly what scale you will need. But you can know the constraints of your solution and whether you have thought about how they change when the load multiplies.

What Actually Changes

The thing that surprises people is that the code does not necessarily change. The same algorithm works at any scale. But the resources it needs change. Memory changes. Network calls change. Database connections change. Disk I/O changes.

A cache that is perfect at 100 requests per second becomes a liability at 1,000 requests per second if you did not think about cache invalidation. A synchronous call that is fine for a small customer base becomes a bottleneck when a large customer makes the same request concurrently. A denormalized table that speeds up reads becomes a nightmare when you have to keep it in sync at high write volume.

These are not mysteries. They are consequences of specific choices. And they are predictable if you think about them before you make the choice.

Before You Commit to an Approach

Before you commit to an approach, it is worth asking a small number of specific questions. What resources does this use? Name them: CPU, memory, disk I/O, network calls, database connections. Then ask: if load 10x's, which of those resources 10x's with it? Which stays flat? Which grows faster? That tells you what breaks first. And if you can estimate when you hit the first bottleneck, you know whether that matters at your current scale or whether it is a problem for a future version of the team.
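You can make those questions concrete with a back-of-envelope table. This sketch is illustrative only: the resources, capacities, and usage numbers are hypothetical placeholders, and the "linear vs. flat" scaling labels are a simplification (some resources grow faster than linearly).

```python
# Hypothetical capacities and current usage -- placeholders, not measurements.
CAPACITY = {
    "db_connections": 100,    # connection pool size
    "memory_mb": 4096,        # container memory limit
    "emails_per_min": 600,    # provider rate limit
}

# (current usage, how it scales with traffic)
USAGE = {
    "db_connections": (20, "linear"),
    "memory_mb": (1200, "flat"),
    "emails_per_min": (90, "linear"),
}

def first_bottlenecks(multiplier):
    """Return the resources that exceed capacity at `multiplier` x load."""
    broken = []
    for name, (current, scaling) in USAGE.items():
        projected = current * multiplier if scaling == "linear" else current
        if projected > CAPACITY[name]:
            broken.append(name)
    return broken

print(first_bottlenecks(2))   # nothing breaks at 2x
print(first_bottlenecks(10))  # connection pool and email rate limit break first
```

The point is not the code; it is that the exercise takes five minutes and tells you which assumption stops holding first.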

You do not have to solve for infinite scale. You just have to know what your assumptions are and when they stop holding.

Practical Examples

Example: you build a feature that sends an email when something happens. It works. You test it. You ship it. Now you have 100x more customers. The email service you are using has a rate limit. You need to queue emails and batch them. If you had thought about this upfront, you would have built the queue from day one (or deferred the feature until it was necessary).
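The queue-and-batch idea can be sketched in a few lines. This is a minimal in-memory version with a made-up rate limit; a real system would persist the queue and handle retries, but the shape is the same.

```python
from collections import deque

RATE_LIMIT_PER_TICK = 50  # hypothetical provider limit per send interval

outbox = deque()

def enqueue_email(recipient, body):
    """Instead of sending immediately, put the email on the queue."""
    outbox.append((recipient, body))

def drain_one_tick(send):
    """Send at most RATE_LIMIT_PER_TICK emails; leave the rest queued."""
    sent = 0
    while outbox and sent < RATE_LIMIT_PER_TICK:
        recipient, body = outbox.popleft()
        send(recipient, body)
        sent += 1
    return sent
```

The design choice is that the producer never blocks on the provider: bursts pile up in the queue and drain at the rate the provider allows.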

Example: you build a dashboard that queries a table to show live data. It works great. The page loads fast. Now you have 10x the data. The query that took 500ms now takes 5 seconds. You need caching or denormalization or sharding. If you had thought about this upfront, you would have denormalized the data from day one or built caching into the architecture.
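The caching option can be as small as a TTL wrapper around the slow query. A sketch, with `compute` standing in for the real database call and a made-up 30-second TTL:

```python
import time

_cache = {}       # key -> (expires_at, value)
TTL_SECONDS = 30  # hypothetical staleness budget for the dashboard

def cached(key, compute, now=time.monotonic):
    """Return a cached value if it is still fresh; otherwise recompute."""
    entry = _cache.get(key)
    if entry is not None and entry[0] > now():
        return entry[1]
    value = compute()
    _cache[key] = (now() + TTL_SECONDS, value)
    return value
```

At 10x the traffic, the expensive query runs once per TTL window instead of once per page load, so query load stays roughly flat while traffic grows.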

Example: you build a search feature that does a full-text search on a column. It works for 10,000 records. At 1 million records, it is slow. You need a proper search index, or you need to move to a specialized tool. If you had thought about this upfront, you would have chosen a solution that scales with the data size instead of one that works until the data size matters.
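A toy inverted index shows why the indexed approach scales: lookup cost tracks the result size, not the table size. Real systems use the database's full-text index or a dedicated search engine, but the principle is the same.

```python
from collections import defaultdict

index = defaultdict(set)  # token -> set of record ids containing it

def add_record(record_id, text):
    """Index each token at write time, so reads stay cheap."""
    for token in text.lower().split():
        index[token].add(record_id)

def search(token):
    # One dictionary lookup, regardless of how many records exist.
    return index.get(token.lower(), set())
```

The full-text scan pays at read time on every query; the index pays a little at write time and keeps reads fast at 10,000 records or 1 million.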

In all three cases, the initial solution was not bad. It was just not designed for a scale that did not exist yet. The key is knowing when your current solution stops being viable, so that you are not scrambling to fix it when the scale changes suddenly.

Not Premature Optimization

This is not a call to optimize everything upfront. It is a call to think about constraints and scaling before you commit to an approach that will be expensive to change.

The difference: premature optimization is "this could maybe be slow so I will make it weird and complex now." Thinking about scale is "this will definitely get slow when X, so I will either design around that or I will plan to rebuild when X happens."

One makes your code weird and complicated for a problem that might not exist. The other makes you intentional about your choices and lets you adapt when you have more information.

I have shipped solutions at 100 users that would not scale to 10,000. I knew they would not scale. I planned to rebuild them. When the rebuild was necessary, I had the information I needed because I had thought through what would break. It cost time and it was worth it, because the alternative was either building for a scale I did not need or guessing wrong and scrambling when the scale surprised me.

Know what works at the scale you are at. Know what breaks when you 10x. Then decide whether you are building for the current scale or the next one. Either choice is fine. The wrong choice is not thinking about it and being surprised when it breaks.