The best practices for high performance websites are well documented here, so I won’t go into web specifically, although these practices are equally applicable as a supplement for websites.
So here goes, 8 tips for making your system high performant and scalable:
Offload the database
What is the most common bottleneck in most applications? The relational database and access to it has been the bottleneck without fail on most projects I have done in the past. The solution to this?
Avoid hitting the database, and avoid opening transactions or connections unless you absolutely need to use them, this means that the popular Open Session in View pattern, while convenient is often a performance hog in applications severely limiting performance and scalability (but there are ways around it without sacrificing any of the convenience).
“What a difference a cache makes”
So how do you offload the database if you need to access a lot of data? For a mostly read-only application this is simple, add lots of caching. This might even work reasonably well for a read-write application if you have clear hooks and ways to expire caches when needed.
When it comes to caching, there is a clear hierarchy of efficiency: in-memory is faster than on-disk, on-disk is considerably faster than a remote network location or a relational database.
Cache as coarse-grained objects as possible
If possible, cache objects/data on the most coarse-grained level possible – even if at the fine-grained level objects are cached, a more coarse-grained approach will save CPU and time required to interogate n number of cache zones rather than a single cache zone, furthermore, retrieving a full object graph saves time assembling the object graph.
Don’t store transient state permanently
There is a tendency in many projects to put absolutely all data in a database. But is it always absolutely necessary? Do you really need to store login session information in a database? Are you storing transient state, or necessary business data?
The “state monster” is a dangerous beast. As a rule of thumb, only store actual, necessary, critical and actionable business data in permanent storage (database, disk) and nothing else.
Location, Location – put things close to where they are supposed to be delivered
A colleague of mine made a good analogy not long ago: if you know you are going to move your heavy cupboard onto a delivery truck tomorrow morning, it is better to put it in the hallway close to the front door, rather than down in the basement at the back of the house.
This is exactly why Content Delivery Networks work for websites, but it is applicable on an application and infrastructure level as well: If you need to hop through a load-balancer, web server, application server and database server, using precious resources in all tiers to retrieve data, rather than just a load-balancer and web server, your scalability and performance will suffer.
Constrain concurrent access to limited resources
Imagine you have a cache-miss despite all your caching, and have to go off and do an expensive, calculation intensive retrieval of data all the way back to a database, further imagine that 30 other clients want the exact same data before the cache has been primed. You will find yourself in a situation where potentially 30 clients go off and retrieve the same data through the same expensive operation at the same time. What a waste!
A simple way to solve this problem is to have only have the first client go off and do the expensive calculations, while having the other clients simply wait for the first clients result (and share that result). In Java, a pattern like this can be easily implemented in about 50 lines of code with the help of a CountDownLatch, ExecutorService and some custom code.
Constraining access to limited resources in this way is not only applicable to read-only access, it is equally usable for transactional write operations, which can be achieved either through having a separate “write-behind” process, or the use of asynchronous messaging such as JMS (effectively variations of the same thing).
99% of the time it is quicker to let a single thread do the work and finish, rather than flooding finite resources with 200 client threads.
Staged, asynchronous processing
This is an extention of the approach described above: separating a process through asynchronicity into discrete, separate steps separated by queues and executed by a limited number of workers/threads in each step will quite often do wonders for both scalability and performance, furthermore it minimizes the risk of a system being overloaded and crashing – an application may slow down it the time it takes to finish a task put on a task queue if the workload is heavy, but it most likely wont crash.
These are the exact reasons why “Staged Event Driven Architectures” (such as Zeus) can create http servers that can handle a concurrency amount limited by OS sockets rather than available threads (imagine 10000 concurrent clients instead of 150-200), and the reason why Erlang and Scala’s “Actor” concurrency model are so popular in certain large scale environments such as banks and telecoms systems.
Minimize network chatter
Going outside of the runtime of your application is slow, communicating with a remote application is slower than communicating with in-memory objects in the same runtime, networks are generally less dependable than RAM and definitely have higher latency. Avoid remote communication if you can, and definitely do what you can to avoid making your application too “chatty”. Of course sometimes there is immense benefit, and yes, even performance and scalability benefits in distributed state achieved by network communication – say if your application needs to constantly interrogate a registry of available services on a network, it makes sense that that registry is distributed on all available nodes.
But in general, the trade-offs of network communication need to be acknowledged and carefully weighed up, so if it is not absolutely necessary, avoid excessive network communication and traffic.
I might be missing one or two other critical performance points, and this is by no means meant to be a definite, exhaustive list, but it is nonetheless a list that contains very common performance and scalability “gotchas”. If you have more, fill in the comments!
May 8, 2009 at 3:04 am
I would be careful with the datavase off loading. I’ve seen people take the other extreme and put everything into the app layer. DB’s have SP’s for a reason. They’re awesome at optimizing the execution plan for the query. I would push architects to analyze what needs to handle where. I wish there was some set standard, but it all boils down to experience.
May 8, 2009 at 5:30 am
Really Nice efforts
It covers many important points.
May 8, 2009 at 5:46 pm
Caching on remote host (via network) might be faster than caching on local disk. Reading sequential data from disk is about three times slower than reading from remote hosts memory, not to mention seek time (I don’t remember the source of those number though).
May 9, 2009 at 1:28 am
[...] Best practices for scalable, high performance systems The best practices for high performance websites are well documented here, so I won’t go into web specifically, [...] [...]
May 9, 2009 at 9:21 pm
I agree with the author, caching wisely in a hierarchy is basically the key to scalability.
As with many things, thou shalt not overuse it of course. But sensible and targeted caching can help a lot.
That said, I also agree with Pawel that often reading from remote memory can be faster then reading from a local disk. There is some tradeoff involved here. Memory is faster then disc, but local is faster then remote. What to choose? Unless your app server is equipped with a monster IO sub-system (say, some 12 fast slc SSDs on 2 Areca 1680 RAID controllers with 4GB buffer cache each), I’d choose the remote memory.
An app server is typically equipped with very cheap disks, not capable of doing any substantial amount of IOPS. As soon as the load increases and the local disk is hit a lot, performance will deteriorate. Never mind that those disks are local, they are simply not meant for using them as a local cache. (in most app servers, disks are to store the application binaries on and maybe some config files, but nothing more).
When you put local memory (especially when in the same process) against remote memory, then it’s always a win for local memory. These days memory is dirt and dirt cheap, so there really is no shame in equipping each node of an app server with at least some 12GB of RAM.
I also would like to comment on the author’s mention of “Staged, asynchronous processing”. For our system we applied exactly that idea to a transaction processing system. The original system would let each thread handle the entire transaction, which means accepting the htpp request, doing some processing and finally persisting it to a remote DB. This went well for some time, but when the load increased only a little the system simply couldn’t cope and complete choked. And with little I’m only talking about some 4 to 12 transactions per second. After we introduced an architecture where the http requests are dropped in a (segmented) queue, and where among others there are only a few back ground threads (a fix amount, typically only 3) that continuously combine transactions into batches and pump these to the DB, we can handle in excess of 6000+ transactions per second with relatively moderate hardware.
I’m not sure why I don’t hear the term “Staged, asynchronous processing”. With a modern Java JDK and some classes from the concurrent package this pattern is quite easy to implement and gives enormous scalability benefits.
May 9, 2009 at 9:34 pm
Arjan (and Pawel):
Yes, if you have dedicated in-memory caching on a separate machine on a fast network in the same network segment, this might be faster than disk – although any heavy computation/logic compared to simple disk retrieval and it is easily lost.
When it comes to SEDA/”Staged asynchronous processing”, I too am surprised at the lack of use in particular in Java. As Arjan points out, with java.util.concurrent, the complexity of it is more or less taken out of the picture for the developer. Doing staged, asynchronous work is very easy.
I think the reason for the lack of use is probably two fold: most organizations go with “what they know”, hence the prevalence of batch jobs compared to messaging in most organizations. And second: a lot of developers (at least poor ones) struggle with anything that isn’t 100% concurrent and single-threaded from the perspective of the developer.
May 12, 2009 at 12:13 pm
[...] Best practices for scalable, high performance systems « Wille Faler’s Buzzword Bingo – So here goes, 8 tips for making your system high performant and scalable: [...]
May 13, 2009 at 1:14 am
[...] awareness of what sort of areas of a type of application tend to be bottlenecks and limitations (hint, hint), so taking them into regard at an early stage shouldn’t really be seen as any sort of [...]
May 15, 2009 at 12:07 am
[...] Applying caching at as coarse grained a level as possible in your application was one of my eight best practices for highly scalable, high performance systems, together with offloading your [...]
May 26, 2009 at 6:17 am
I always found SEDA to be fairly frequently. Its just that the terms have changed. Previously SEDA was called “pipes and filters”, ala UNIX and POSA. It was always known that each filter could be handled via a concurrency strategy and this was done when desirable. If you are looking specifically for SEDA then you won’t find it because that’s a relatively recent buzz-word (which, imho, the PhD thesis should not have been granted as it offers nothing new or insightful). If you look for pipes-and-filter based concurrent architectures, then you’ll see much it more frequently.
May 28, 2009 at 12:16 am
[...] of my eight points for highly scalable systems was to restrict concurrency for computationally heavy tasks – because quite often a system [...]
June 7, 2009 at 2:12 am
[...] Faler propõe 8 boas práticas para melhorar desempenho e escalabilidade como diminuir a carga no banco de dados, usar cache, minimizar tráfego na rede, entre [...]