Performance is irrelevant

Asking questions about performance online almost universally invites scorn and accusation. A large number of programmers apparently feel that the efficiency of code is nowadays insignificant: so long as the functional requirements have been met, the approach is golden. Any attempt to discuss improvements is met with stiff resistance. The most common of the mantras are “premature optimization” and “have you measured it?”.

While such notions are at times not misguided, this general attitude of ignoring performance considerations is quite dangerous. Let’s look at a few reasons why performance is still quite relevant, especially early in the development process.

Performance Issues

Some things don’t parallelize

There are a great number of functional behaviours which simply can’t be done in parallel, or rather, gain nothing when done in parallel. At a high level, most client-server systems tend to have one outermost request handler which works serially to assemble the response for the client. Perhaps many of the sub-requests can be done in parallel, but this serial code still has to execute and can often become the bottleneck in the total response time.

At lower levels there are algorithms which can’t be efficiently handled on multiple cores. Often the overhead of splitting up the work is more than the cost of the algorithm itself. Or sometimes, just like the client requests, the algorithm has a serial nature that just can’t be avoided.

Chips aren’t getting any faster

In recent years the speed of individual cores has not really been increasing. While we are certainly not at the theoretical limits yet, the physical obstacles to increasing clock speed are significant. Essentially we’ve hit a speed limit in the commodity market, and chips simply range from 2 to 4 GHz. Industry has decided that providing more cores is better than providing faster cores.

Combine this with the inability to parallelize certain behaviours and you can see a problem.
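Amdahl’s law makes this combination concrete: the serial fraction of a workload puts a hard ceiling on how much extra cores can help. A small sketch, with purely illustrative numbers:

```python
# Amdahl's law: the serial fraction of a workload caps the achievable speedup,
# no matter how many cores are thrown at it. Numbers below are illustrative.

def max_speedup(serial_fraction: float, cores: int) -> float:
    """Upper bound on speedup when serial_fraction of the work cannot be parallelized."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)

# Even with only 10% serial work, 64 cores deliver well under a 10x speedup,
# and 1024 cores barely approach the 1 / 0.10 = 10x ceiling.
print(max_speedup(0.10, 64))    # ~8.8
print(max_speedup(0.10, 1024))  # ~9.9
```

If cores aren’t getting faster, that serial 10% simply never gets cheaper.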

Can’t fix it later

Designing to be scalable, either to multiple cores or to multiple computers, is something that has to be planned for fairly early in the process. While you don’t need to scale immediately, you at least need to choose an architecture and algorithms that can be easily adjusted later. Most people, including programmers, tend to think in a serial fashion, so most code tends not to lend itself to concurrent processing. If you have failed to consider concurrency and scaling early in the process, you may find that path fairly difficult.

Even small decisions made early on can lead to significant performance loss when repeated systemically throughout the code. Perhaps a bad internal data flow, or poor use of global memory. Once a bad pattern is ingrained at all points in the code it becomes extremely time consuming to change. Naturally programmers just follow the existing code; whatever is there at the start will be magnified throughout.

Solutions are non-trivial

This is a good counter to those who blindly argue you should profile your code to see what is more efficient. If an algorithm could be coded in multiple forms within a few hours, then perhaps simply trying them out is a good idea. Most algorithms, however, are part of a larger system, and trying to segregate and replace that component can often be difficult. Often the item to be improved simply cannot be isolated and is more of a systemic feature of the code. Given the amount of time that will be required to make a change, or to code the first version, it seems entirely reasonable to think about efficiency ahead of time.

This also entirely neglects the fact that doing proper performance measurements is very hard. But that is a topic all on its own.

Broken theory of scaling

A system which can scale in theory is vastly different from one that can actually scale. A common failure is related to networking. A cluster of computers has to be connected by real switches and wires which have a fixed limit on the traffic they can effectively handle. Beyond that you need to start grouping and segregating. Designs often fail to account for this, requiring direct connectivity between all computers and/or forcing all traffic through a single machine.

Inefficiency costs money

Even if your system can scale simply by adding more computers, this isn’t necessarily the best solution. Every additional machine has a real cost associated with it. Beyond the initial purchase and installation cost, there is a continual ongoing cost in maintenance and electricity, as well as a final disposal cost. These expenses are significant, and in a highly competitive service market, reductions of even 5 or 10% can make a huge difference in a company’s ability to attract clients.
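To make the arithmetic concrete, here is a back-of-envelope sketch. The purchase and operating costs are invented for illustration; the point is only how a modest efficiency gain translates into fleet cost:

```python
# Hypothetical fleet-cost arithmetic: a 10% throughput improvement lets the
# same workload run on ~10% fewer machines. All figures are made up.

def fleet_cost(machines: int, purchase: float, yearly_opex: float, years: int) -> float:
    """Total cost of ownership for a fleet over its lifetime."""
    return machines * (purchase + yearly_opex * years)

baseline = fleet_cost(machines=100, purchase=3000, yearly_opex=1500, years=3)
# A 10% efficiency gain: roughly 10 fewer machines for the same load.
optimized = fleet_cost(machines=90, purchase=3000, yearly_opex=1500, years=3)

print(baseline - optimized)  # 75000.0 saved over three years
```

A day or two of optimization work is cheap against savings on that scale.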


While spending too much time fine-tuning the details is often fruitless, backing too far away from performance concerns can be downright dangerous. A lot of major performance issues can be handled up-front with just a tiny bit of planning and forethought. Even if your non-functional requirements are light at the start, don’t be surprised by changes in demand, and in particular peak load problems. Never forget that hardware has physical limitations to scaling, and that every additional machine is an added cost.

This is not attempting to be an argument in favour of excessive optimization. It is more of a counter to the alarming trend I see of completely ignoring performance issues. Don’t ignore that 5% loss of speed in your module; it will compound with the 5% loss in all the other code!

14 replies

  1. One of the main concerns thrown around when deciding to spend the extra 5% of time on non-functional and performance-related fixes is the business value it delivers. Websites and banks try to build faster or scalable systems because speed is important in their business – for many other businesses, the tendency to weigh the benefits and ROI of a quick deployment against the cost of a scalable design always ensures that these things are swept under the carpet.

  2. I don’t actually believe that companies which sweep performance concerns under the carpet have done any serious ROI analysis. While I’m sure many people claim ROI as a reason to avoid performance work, I just don’t believe most of them have actually done any serious research. It’s just not believable that a development firm which well understands the role of performance in ROI could choose to ignore it.

    Additionally, many performance improvements in code are nearly trivial. Quite often a simple team discussion on certain methods is enough to cover several small performance items. Not taking the time to consider these can’t even be related to ROI but really is just a matter of poor coding.

  3. Serious product firms don’t generally ignore these concerns, and applications that do not perform well do not generate ROI. But with enterprise software, especially software built for internal use, in many cases that I have personally seen, performance optimization comes to the table quite late and is often ignored because nobody expects the application to still be used by so many people so many years down the line – I will also admit there are some seriously technical managers and management who understand the importance of making performance decisions early enough and approve of them.

    • Scope is always relevant, of course. Simple apps are always candidates for an entire rewrite. Anything which can be coded in a week or so can likely be migrated and completely recoded at a later time. That would probably qualify as sufficient performance planning for such a project.

      Yet I take exception to dismissing the domain of “enterprise software” as a whole. A poorly performing internal application can seriously demotivate and enrage employees. It can also cause serious failures in company process as people avoid using it. So while the non-functional requirements are likely not as severe for internal apps, the implications of performance are no less significant.

  4. In general I think I agree with you, but your post amounts to “people should give some undefined amount of thought to performance but more than they usually do, at least in some companies/organizations.”

    I recommend the following:
    – before beginning to code, do some investigation and back of envelope calculation of the expected performance requirements.
    – use above to determine which programming environment to use and to decide if there are any aspects of the programming itself which have highly critical performance needs. try to draw a box around such requirements and deal with them separately from the rest of the system.
    – hire people who are competent. this is the most significant and overlooked thing in the programming world. if your people are not competent, they will not do any better with up front performance analysis than they would by doing it iteratively – in fact the result would likely be much worse in the “up front” case.
    – measure. write tests that make sure that defined performance requirements are met on a test system. run these tests on a pre-defined schedule. if any tests fail, analyze what is going on and fix it intelligently before continuing to code new features.
    – what i’ve written applies to any important aspect of a system, not just performance.
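The “measure” step above might be sketched as a scheduled test like this. A minimal illustration, not from the comment itself; the time budget and the workload are hypothetical stand-ins:

```python
# Sketch of a performance-requirement test: fail the build when a defined
# budget is exceeded. The 0.5s budget and the workload are hypothetical.
import time

REQUIREMENT_SECONDS = 0.5  # assumed budget for this operation

def operation_under_test() -> int:
    # Stand-in for real work; replace with the code path being guarded.
    return sum(i * i for i in range(100_000))

def test_meets_performance_requirement() -> None:
    start = time.perf_counter()
    operation_under_test()
    elapsed = time.perf_counter() - start
    assert elapsed < REQUIREMENT_SECONDS, f"too slow: {elapsed:.3f}s"

test_meets_performance_requirement()
print("performance requirement met")
```

Run on a fixed test machine on a schedule, a failure here flags a regression before new features pile on top of it.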

  5. Spot on.

    I do performance QA for network security products and more often than not we are the smallest and least regarded team, always left to the last minute. This mentality has bitten many development teams on the ass.

  6. “In recent years the speed of individual cores has not really been increasing.”

    “chips simply range from 2 to 4 Ghz. Industry has decided providing more cores is better than providing faster cores.”

    No offence, but you’re a fucking idiot if you think this is the case. Try benchmarking a single core of a Core i7 against a single core of a Core 2 Duo. What’s that, they’re not the same speed?

    Now try clocking them the same, and repeat your experiment.

    What a load of bunk.

    • I do performance tests regularly on a few different chips. The performance of a chip, GHz for GHz, has certainly improved, but there is a limit to this improvement. The optimizations employed by the chips to make them faster can only help so much. Some algorithms are bound directly to the GHz and will not gain from many of the improvements.

  7. I used to believe that performance didn’t matter, because you could always go back and optimize/refactor to meet performance needs. My tune has changed in the last couple of years though. There comes a point when you are throwing TOO MUCH hardware at the problem, especially when significant optimizations can be made in an extra day or two of programming.

  8. Thanks for the well written article. The content is interesting but I would like to compliment you on your writing.

  9. You have totally (and I mean TOTALLY) mis-titled your article, possibly for reasons to get more clicks. You’re writing about performance being relevant, not irrelevant. And from that angle, I found nothing new here.

    • The title reflects an attitude I am finding in several online forums. I am attempting to counter that argument. Sorry if some have found the title misleading.

  10. Great post. You put my thoughts on the subject into words in a very clear fashion. I regularly find myself in arguments against the “thinking about all aspects of the code (especially performance and security) before writing it is a waste of time” crowd. Next time, I’ll link them to your blog.

    As for the title, I’ve read a few more of your blog posts, and a lot of the titles appear to have a bit of a “flamebait” nature, saying exactly the opposite of the actual article. Don’t you think it’s time to retire that cheesy style figure?

    • Unfortunately it seems that controversial titles are about the only way to get linked from many social news sites. :(
