Monday, June 15, 2009

The Role of Caching in Large Scale Architecture


Before the Internet, many systems were built without caches. The need to scale has since led to the widespread deployment of caching. Most of the open source caching projects grew out of internal efforts to solve performance problems for specific web sites: Ehcache, the most widely deployed Java cache, was originally developed at Wotif.com, and Memcached was developed for LiveJournal.com.

In this article we look at how to go about caching and at the different types of caches. We then look at common caching problems, with a recipe for each, and see how Ehcache and Ehcache Server can be used to solve them.

Why does Caching Work?

Requests for data are not randomly distributed.

If requests for data were entirely random it would be hard to cache a useful subset of them. Caching works in computer systems because of the phenomenon of Locality of Reference. More generally, it works because many natural systems follow a power law probability distribution, often called a Pareto distribution, shown in figure 1. This is sometimes called the long tail, but the flip side of the long tail is the fat head, and the fat head is what we cache.

Figure 1: Pareto Distribution

If you are in doubt, take a look at your own systems and chart how frequently each item of a given type of data is requested.
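To make that concrete, here is a rough sketch in Java (the RequestFrequency class and its method are made up for illustration) that tallies how often each key is requested and reports what share of all requests the most popular 20% of keys account for. Feed it keys parsed from your own access or query logs; the log parsing is left out.

import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: counts request frequencies and reports how much of
// the traffic the "fat head" (the most requested 20% of keys) accounts for.
public class RequestFrequency {

    public static void report(Iterable<String> requestedKeys) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        long total = 0;
        for (String key : requestedKeys) {
            Integer current = counts.get(key);
            counts.put(key, current == null ? 1 : current + 1);
            total++;
        }

        // Sort the per-key counts from most to least requested.
        List<Integer> sorted = new ArrayList<Integer>(counts.values());
        Collections.sort(sorted, Collections.reverseOrder());

        int head = Math.max(1, sorted.size() / 5);   // the top 20% of keys
        long headRequests = 0;
        for (int i = 0; i < head; i++) {
            headRequests += sorted.get(i);
        }

        System.out.printf("%d distinct keys, %d requests%n", sorted.size(), total);
        System.out.printf("The top 20%% of keys account for %.1f%% of requests%n",
                100.0 * headRequests / total);
    }
}

If the output shows a small fraction of keys dominating the requests, that fat head is a good candidate for caching.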

These observations allow us to take a hierarchical approach, matching the frequency of use of data to the access speed and the capacity of each cache.

Computer hardware provides a useful example. Computer memory is structured in a hierarchy consisting of registers, the L1, L2 and L3 caches, and then main memory. With each step down the hierarchy access is many times slower, but capacity is much larger.

Data is often written once and read many times

This is known as the read-write ratio. Almost all systems have data with read-write ratios greater than 1, and many have data with read-write ratios greater than 20. These are good candidates for caching. When a read can be satisfied by the cache it is called a cache hit; the proportion of reads that are hits is the cache hit ratio.

The cache hit ratio is improved by holding more data in the cache, and by holding on to the data for longer periods.

Stale Data is often acceptable

If the data is changed in the system of record, but that change is not reflected in the cache, then the data is stale. Sometimes this can be avoided through a cache invalidation protocol. But in many cases it is perfectly acceptable for data to be stale for a time – the time to live.
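As a rough sketch of a time to live in practice, assuming the Ehcache 1.x/2.x programmatic API (the cache name, key and the 300 second TTL are illustrative), a cache whose entries are allowed to be stale for up to five minutes can be set up like this:

import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;

public class TtlExample {
    public static void main(String[] args) {
        CacheManager manager = CacheManager.create();

        // Constructor arguments: name, maxElementsInMemory, overflowToDisk,
        // eternal, timeToLiveSeconds, timeToIdleSeconds.
        Cache productCache = new Cache("products", 10000, false, false, 300, 0);
        manager.addCache(productCache);

        productCache.put(new Element("sku-42", "Blue Widget"));

        // For the next five minutes this is a cache hit, even if the system
        // of record changes; after that the element expires and the next
        // get() returns null, forcing a reload from the system of record.
        Element hit = productCache.get("sku-42");
        System.out.println(hit != null ? hit.getValue() : "expired or missed");

        manager.shutdown();
    }
}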

Take Google search as an example. Crawlers index periodically, commonly once per day, so the entire cache is always somewhat stale. Yet no one complains, because the staleness does not affect the usage.

Reasons to Cache

The first reason to cache is performance: a fetch served from a cache can be orders of magnitude faster than one served from the system of record.

The second reason to cache is to reduce costs by reducing resource consumption: CPU, database load, external API usage, S3 fees and datastore fees can all be reduced. Lastly, caching helps systems scale.

What and where to Cache? Amdahl's Law

Most people think they know where the bottlenecks are in their systems. Many times they turn out to be wrong. Without realising it, everyone has a natural bias to solve problems using the toolset they already have: if all you have is a hammer then every problem looks like a nail.

I thoroughly recommend taking a formal empirical approach to break out of the bias. Take timing observations across the entire system.

Then use Amdahl's Law to tell you which part of the system to start optimising. The benefit of a performance optimisation has an upper bound: the time taken by the part of the system being optimised. Take an example: a web page takes 21 seconds to render in a browser. The server time is 5 seconds, load across the Internet is 13 seconds and browser rendering is 3 seconds. Applying Amdahl's law, we should fix the Internet load time first, perhaps with an edge cache, then the server side and finally page rendering.
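A small illustrative calculation, not from the original article, makes the ordering obvious: even an infinite speedup of one part leaves the rest of the time untouched, so the best overall speedup from optimising that part is bounded by total / (total - part).

public class AmdahlBound {
    public static void main(String[] args) {
        // Illustrative timings from the example above, in seconds.
        double total = 21.0;
        double[] parts = {5.0, 13.0, 3.0};
        String[] names = {"server", "Internet load", "browser rendering"};

        for (int i = 0; i < parts.length; i++) {
            // Best case: the optimised part drops to zero time, so the
            // overall speedup can be at most total / (total - part).
            double bestSpeedup = total / (total - parts[i]);
            System.out.printf("Optimising %s: overall speedup <= %.2fx%n",
                    names[i], bestSpeedup);
        }
    }
}

Running this gives upper bounds of roughly 2.6x for the Internet load, 1.3x for the server and 1.2x for rendering, which is why the Internet load is the place to start.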

The process is iterative. Eventually you always get to the server side. Repeat the process. What typically comes out as expensive is:

  • creating the page/ajax response
  • creating page fragments
  • external API calls
  • assembling collections of data/processing of data
  • database calls

Apply Amdahl's law again. If you are using an Object-Relational framework, the best candidate to start with is caching database calls.

Once you identify an area where caching may be applied, start with data analysis. Analyse your production data to understand request distributions, data sizes and read-write ratios. Determine the acceptable staleness or identify a cache invalidation protocol. From this analysis alone you can estimate the effectiveness of adding a cache.

Cache Hierarchies

Caches are available with varying latencies and capacities. We need to match our requirements to the cache.

Local Caches

In-process Caches

Here the cache is in the JVM. Access is by reference, so it is really fast, measured in microseconds. That is an awful lot of speed.

O-R frameworks in particular need this speed. Hibernate may make hundreds or thousands of calls to the database for a single page, and it is designed to work with a high performance in-memory cache.

Some examples of local caches are Ehcache core, OSCache and Oracle Coherence Near Cache. For web content, edge caches such as Akamai also qualify. Ehcache is available for Hibernate and OpenJPA. It has a general purpose caching API for in-process use, a Servlet caching filter for pages/ajax responses and page fragments, and a JRuby gem for JRuby in-process caching.
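To make the in-process case concrete, here is a minimal cache-aside sketch against the Ehcache general purpose API (assuming the 1.x/2.x API; the "customers" cache is assumed to be declared in ehcache.xml, and loadCustomerFromDatabase stands in for a real system-of-record call):

import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;

public class CustomerLookup {

    // "customers" is assumed to be declared in ehcache.xml;
    // getCache() returns null if it is not.
    private final Cache cache = CacheManager.getInstance().getCache("customers");

    public String findCustomer(String id) {
        // In-process lookup: a hit returns the cached object by reference,
        // typically in microseconds.
        Element element = cache.get(id);
        if (element != null) {
            return (String) element.getObjectValue();
        }

        // Miss: pay the expensive fetch once, then cache the result.
        String customer = loadCustomerFromDatabase(id);
        cache.put(new Element(id, customer));
        return customer;
    }

    private String loadCustomerFromDatabase(String id) {
        // Placeholder for the real system-of-record call
        // (JDBC, an O-R framework, a web service, and so on).
        return "customer-record-for-" + id;
    }
}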

These types of caches are limited to the maximum memory size of your process. 32 bit architectures were limited to a 4GB address space. Thankfully, with 64 bit architectures, we have moved past that limitation. In Java's case, the heap size is practically limited by garbage collector performance, and as garbage collection has improved so have these cache sizes. The largest Ehcache local caches top out around 20GB. Based on production experience, 6GB heap sizes should work well with the right garbage collector settings.

In-process caches offer a unique benefit over all other types – data is represented exactly as it needs to be for the programming language. There is no unmarshalling required.

Disk Caches

To move beyond the constraints of memory, some local caches offer storage to disk. The disk store is local to a machine but can theoretically be as large as the locally mounted file systems.

Ehcache offers a disk store set up in a natural hierarchy below the memory store. Once the memory store fills up, data overflows to disk. Ehcache also has disk persistence, so that the entire cache can be persisted to disk on demand or at shutdown. It is there when you next start the JVM, saving the cost of repopulating the cache.
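As a rough sketch, again assuming the Ehcache 1.x/2.x programmatic API (the cache name, sizes and intervals are illustrative), a cache that overflows to disk and persists across restarts can be created like this:

import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;

public class DiskBackedCache {
    public static void main(String[] args) {
        CacheManager manager = CacheManager.create();

        // Constructor arguments: name, maxElementsInMemory, overflowToDisk,
        // eternal, timeToLiveSeconds, timeToIdleSeconds, diskPersistent,
        // diskExpiryThreadIntervalSeconds.
        Cache searchCache = new Cache("searchResults", 10000, true, false,
                3600, 0, true, 120);
        manager.addCache(searchCache);

        // Once more than 10,000 elements are held in memory, the least
        // recently used elements overflow to the disk store. Because
        // diskPersistent is true, the disk store survives a JVM restart,
        // so the cache comes back warm.

        manager.shutdown();   // persists the memory store to disk
    }
}

The disk store location comes from the CacheManager configuration; the default configuration uses the system temporary directory.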

Clusters of Local Caches

There is a problem with local caches when you have multiple machines running an application. A request could hit one machine, which then caches the data, but the next request could hit another machine which does not have the data. If nothing is done, the cache hit ratio drops to roughly the local cache hit ratio divided by the number of machines in the cluster.

Cache replication solves this problem: cache operations are replicated to the other nodes in the cluster. Ehcache comes with peer-to-peer replication using either RMI or JGroups, and it also provides centralised replication via JMS. Ehcache's RMI-based replication architecture is illustrated in figure 2.



Figure 2: Ehcache RMI based replication

In Ehcache, turning on replication is easy. For example, to add RMI replication with default settings to a cache declaration, add the following:

<cache name="sampleDistributedCache"
    ...
    <cacheEventListenerFactory
        class="net.sf.ehcache.distribution.RMICacheReplicatorFactory"/>
</cache>

Client-Server Caches

Here the cache is somewhere over the network running on a cache server. Some examples are Ehcache Server, memcached, Apache, Squid and in-memory databases such as TimesTen and MySQL (using in-memory tables).

Access times are measured in milliseconds, compared with microseconds for local caches.

As an example, here are the times reported by cloudstatus.com for Google AppEngine's memcached.


Cache system: Google AppEngine memcached. Response time: 6.5 ms. Get time (1 MB): 62 ms.

These speeds are not suited to uses requiring very fast lookups, such as O/R frameworks. In the computer hardware analogy: why would we need L1-L3 caches if main memory were fast enough?

But client-server caches have a key benefit: effectively infinite capacity. While the resources of a single server are limited, caches can be partitioned or sharded across any number of cache servers. Partition addressing typically works by consistent hashing, where the same key always resolves to the same partition or shard.
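To illustrate the idea, here is a minimal consistent hashing ring. This is a generic sketch, not code from Ehcache or memcached, and the server addresses are made up. Each server is placed on the ring at many points (virtual nodes), a key is stored on the first server found clockwise from the key's hash, and adding or removing a server only remaps the keys adjacent to it.

import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.SortedMap;
import java.util.TreeMap;

public class ConsistentHashRing {

    private static final int VIRTUAL_NODES = 100;
    private final TreeMap<Long, String> ring = new TreeMap<Long, String>();

    public void addServer(String server) {
        // Place each server on the ring at many points so keys spread evenly.
        for (int i = 0; i < VIRTUAL_NODES; i++) {
            ring.put(hash(server + "#" + i), server);
        }
    }

    public String serverFor(String key) {
        // Walk clockwise from the key's position; wrap around if needed.
        SortedMap<Long, String> tail = ring.tailMap(hash(key));
        return tail.isEmpty() ? ring.get(ring.firstKey()) : tail.get(tail.firstKey());
    }

    private static long hash(String value) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(value.getBytes());
            long h = 0;
            for (int i = 0; i < 8; i++) {        // first 8 bytes of the digest
                h = (h << 8) | (digest[i] & 0xFF);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ConsistentHashRing ring = new ConsistentHashRing();
        ring.addServer("cache1:8080");
        ring.addServer("cache2:8080");
        ring.addServer("cache3:8080");
        // The same key always resolves to the same server.
        System.out.println(ring.serverFor("sampleCache2/2"));
    }
}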

As a side note, there are large scale systems out there that apply partitioning at a higher level. In these approaches a whole system is broken into workgroups. Each workgroup has an instance of the system with its users' data, and a fronting load balancer directs user requests to their workgroup.

Ehcache Server (http://ehcache.sourceforge.net/documentation/cache_server.html) is a REST-based client-server cache. It has a simple API, so simple that you do not need a special client library; interacting with it takes just a few lines of code.

For example here is a Ruby client which reads data out of the cache:

require 'rubygems'
require 'open-uri'
require 'rexml/document'

response = open('http://localhost:8080/ehcache/rest/sampleCache2/2')
xml = response.read
puts xml

Ehcache Server is typically deployed behind a load balancer for redundancy. Because the keys are encoded in the path of the URL, consistent hashing can be applied either in the client or on the load balancer via URI routing. Redundancy within each partition is provided by master-master replication using one of Ehcache's replication methods.

Scaling is simple: if load increases, add more servers to each partition; if data increases, create more partitions. Ehcache Server can scale to architectures of any size. An example topology with load-balancer-based consistent hashing and partition redundancy is illustrated in figure 3.



Figure 3: Ehcache Topology

Memcached uses its own custom protocol rather than HTTP and relies on client-side consistent hashing. Redundancy is not provided; this is typically worked around by creating two sets of memcached servers and having the client write twice, as sketched below.
The Memcached topology is illustrated in figure 4.



Figure 4: Memcached Topology
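As a rough illustration of the write-twice workaround, here is a sketch that assumes the spymemcached client (the host names are made up). The client keeps two independent pools, every write goes to both, and a read falls back to the second pool if the first misses.

import java.net.InetSocketAddress;
import net.spy.memcached.MemcachedClient;

public class RedundantMemcached {

    private final MemcachedClient primary;
    private final MemcachedClient secondary;

    public RedundantMemcached() throws Exception {
        // Two independent memcached pools; in practice each would be a set
        // of servers addressed by client-side consistent hashing.
        primary = new MemcachedClient(new InetSocketAddress("memcached-a1", 11211));
        secondary = new MemcachedClient(new InetSocketAddress("memcached-b1", 11211));
    }

    public void set(String key, int ttlSeconds, Object value) {
        primary.set(key, ttlSeconds, value);
        secondary.set(key, ttlSeconds, value);   // the redundant second write
    }

    public Object get(String key) {
        Object value = primary.get(key);
        return value != null ? value : secondary.get(key);
    }
}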

Recipes for Caching - Bringing it all together

The following recipes match data characteristics to a caching model.

  • Small data size, inexpensive fetching: use a local cache with no replication.
  • Small data size, expensive fetching: use a local cache with replication.
  • Large data size, inexpensive fetching: use a client-server cache with no redundancy. If you have a cache failure you regenerate the cache.
  • Large data size, expensive fetching: use a client-server cache with redundancy.

Finally, if you have a large data size and need fast cache performance, use a hybrid approach with a local cache backed by a client-server cache. The client-server cache becomes a backing cache. This is very easily done in Ehcache with CacheEventListeners to put changes into the backing cache and CacheLoaders to automatically load from the backing cache.
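The following is a hand-rolled sketch of that idea only: it uses a plain map and raw HTTP against the Ehcache Server REST API (the same sample URL as the Ruby example above) rather than Ehcache's CacheEventListener and CacheLoader plumbing, which would do this wiring for you.

import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class TwoLevelCache {

    // Level 1: fast in-process store (microseconds).
    private final Map<String, byte[]> localCache = new ConcurrentHashMap<String, byte[]>();
    // Level 2: backing client-server cache,
    // e.g. "http://localhost:8080/ehcache/rest/sampleCache2".
    private final String backingBaseUrl;

    public TwoLevelCache(String backingBaseUrl) {
        this.backingBaseUrl = backingBaseUrl;
    }

    public byte[] get(String key) throws Exception {
        byte[] value = localCache.get(key);
        if (value != null) {
            return value;                       // level 1 hit
        }
        HttpURLConnection conn =
                (HttpURLConnection) new URL(backingBaseUrl + "/" + key).openConnection();
        if (conn.getResponseCode() == 200) {
            value = readAll(conn.getInputStream());
            localCache.put(key, value);         // promote into the local cache
            return value;                       // level 2 hit
        }
        return null;                            // miss: caller loads from the system of record
    }

    public void put(String key, byte[] value, String mimeType) throws Exception {
        localCache.put(key, value);
        // Push the write to the backing cache so other nodes can read it.
        HttpURLConnection conn =
                (HttpURLConnection) new URL(backingBaseUrl + "/" + key).openConnection();
        conn.setRequestMethod("PUT");
        conn.setRequestProperty("Content-Type", mimeType);
        conn.setDoOutput(true);
        OutputStream out = conn.getOutputStream();
        out.write(value);
        out.close();
        conn.getResponseCode();                 // force the request to complete
    }

    private static byte[] readAll(InputStream in) throws Exception {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int read;
        while ((read = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, read);
        }
        in.close();
        return buffer.toByteArray();
    }
}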

The different types of caching, and how they relate, are well illustrated by the Ehcache architecture diagram in figure 5.

