The platform-as-a-service model adds new importance to the performance of our software. Despite ever-decreasing hardware costs, metered services make every bit count. My previous article on performance touched on several performance-related issues; this article focuses on how poor performance directly leads to higher costs.
The cost of virtual hosts is still extremely high compared to the equivalent hardware. It is usually worth it once we factor in the cost of an administrator, setup, networking gear, and so forth. But it leaves us with an interesting problem: these hosts are vastly underpowered compared to commodity hardware. A typical development desktop is comparable to the high end of a cloud provider's offering.
One of the significant things I find lacking is memory. If we want to avoid the cost of high RAM hosts we need to seriously consider how to limit our memory usage. This has a significant influence on the technology and algorithms our server can actually use. For example, a Java developer may find it very hard to get their app running well on the low-end provider instances.
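One concrete way memory constraints shape our algorithms is favoring streaming over loading everything at once. As a minimal sketch (the function and file format are hypothetical, purely for illustration), counting matching lines in a large log file can be done with constant memory:

```python
def count_matches(path, needle):
    """Count lines containing needle, streaming the file.

    Memory use stays roughly constant regardless of file size,
    because only one line is held in memory at a time.
    """
    count = 0
    with open(path) as f:
        for line in f:  # iterating the file object reads lazily
            if needle in line:
                count += 1
    return count
```

Reading the whole file with `f.read()` or `f.readlines()` would instead require memory proportional to the file size, which is exactly what a low-RAM instance cannot afford.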
Anybody wanting to do modelling, image processing, or any other CPU-intensive operation on the server is going to find the CPU a huge bottleneck. Providers allocate many hosts per physical CPU, so they can't afford to give any one person a lot of computing time. If we need CPU power we will pay dearly for it. Optimizing our algorithms and limiting the workload become mandatory.
Perhaps one of the biggest potential cost pits is the database. Providers have a range of ways to price these, both limiting how much space we can use and how fast our queries will run.
If we choose a cost-per-disk-transaction model then indexing becomes important. Inefficient queries, or full table scans, start to become very costly. Even though our query returns a small amount of data, it may have needed to load a large amount of data from the disk to find it. If we haven't overloaded the DB we may not even notice any speed issue, but we'll certainly notice the cost at the end of the month.
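We can see the scan-versus-index difference without leaving our desk. The following sketch uses SQLite's `EXPLAIN QUERY PLAN` on a made-up `orders` table (table, column, and index names are all hypothetical); the same principle applies to any hosted SQL database, where the scanned rows are what we're billed for:

```python
import sqlite3

# In-memory database with an illustrative orders table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, total) VALUES (?, ?)",
    [(f"cust{i % 100}", i * 1.5) for i in range(1000)],
)

def plan(sql):
    # The fourth column of EXPLAIN QUERY PLAN output describes the strategy.
    return conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()[0][3]

query = "SELECT total FROM orders WHERE customer = 'cust42'"

before_plan = plan(query)
print(before_plan)  # a full table scan: every row is read

conn.execute("CREATE INDEX idx_customer ON orders (customer)")
after_plan = plan(query)
print(after_plan)  # an index search: only matching rows are touched
```

The exact wording of the plan varies by SQLite version, but the first run reports a scan of the whole table and the second a search using `idx_customer`. Under per-transaction pricing, that difference is the bill.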
If we choose a cost-per-row, or per-MB, model then what we store becomes important. Using the DB as a log backing store no longer seems like a good idea when every row ends up costing us. Large binary blobs may also be an issue, forcing us to use a specialized file store service instead of the DB. And obviously removing any redundancy from the DB tables and columns will yield direct cost savings.
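A quick back-of-the-envelope calculation shows why DB-backed logging gets expensive. The rates and volumes below are entirely hypothetical (check your provider's actual pricing), but the arithmetic is the point:

```python
# Hypothetical pricing and log volume -- substitute your provider's numbers.
PRICE_PER_GB_MONTH = 0.25  # assumed storage price in USD

avg_row_bytes = 512        # estimated average size of one log row
rows_per_day = 2_000_000   # estimated log volume

# Storage added per month, in GB, and what that slice alone costs.
gb_per_month = avg_row_bytes * rows_per_day * 30 / 1024**3
monthly_cost = gb_per_month * PRICE_PER_GB_MONTH
print(f"{gb_per_month:.1f} GB added per month -> ${monthly_cost:.2f}")
```

Note that logs accumulate: each month's rows keep costing us every month afterward, so the bill grows linearly even at a constant log rate. Shipping logs to a cheap file store, or aggressively expiring old rows, changes the picture entirely.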
What becomes obvious is the decision to use cloud hosting should have a significant impact on the way we design our application. We can’t just design locally and then expect to deploy to a hosting provider. Sure, that is possible, but the resulting cost will be exorbitantly high! To keep costs down we really need to design around and target the PaaS provider.
From a business perspective these costs provide a direct way to justify optimizations. We can simply measure the number of rows scanned in the DB, the amount of CPU time taken, or the amount of disk space used by an application or even a single method. From there we can calculate the cost. Now when we present a potential optimization to management we can genuinely show how it will save the company money.
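That justification can be a few lines of arithmetic. The sketch below turns measured per-request resource use into a monthly dollar figure; the rates, the `monthly_cost` helper, and the before/after numbers are all hypothetical stand-ins for your own measurements and your provider's price sheet:

```python
# Hypothetical metered rates -- use your provider's actual pricing.
RATE_PER_MILLION_READS = 0.20  # USD per million row reads
RATE_PER_CPU_HOUR = 0.05       # USD per CPU-hour

def monthly_cost(rows_read_per_request, cpu_seconds_per_request,
                 requests_per_month):
    """Translate measured per-request resource use into a monthly bill."""
    reads = (rows_read_per_request * requests_per_month / 1e6
             * RATE_PER_MILLION_READS)
    cpu = (cpu_seconds_per_request * requests_per_month / 3600
           * RATE_PER_CPU_HOUR)
    return reads + cpu

# Before: a full table scan touching 5,000 rows and 0.4 s of CPU per request.
before = monthly_cost(5_000, 0.4, requests_per_month=3_000_000)
# After: an indexed query touching 200 rows and 0.05 s of CPU per request.
after = monthly_cost(200, 0.05, requests_per_month=3_000_000)

print(f"optimization saves ${before - after:,.2f} per month")
```

That dollar figure, however rough, is something management can weigh directly against the development time the optimization would take.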
This still needs to be weighed against the added development time, but at least we have a basis to make this comparison. If our goal were simply to improve response time, it becomes a lot harder to justify the time. The relationship between response time and revenue is a lot more tenuous than a concrete link between processing and cloud costs.
In a way I find it kind of refreshing that performance returns as a significant consideration. I think having an abundance of cheap, powerful hardware has led to a lot of inefficient technologies. The constraints of cloud computing should have more projects thinking more carefully about their implementation. Well, at least until cloud costs come down enough that we can stop worrying again.