It's a well-known fact that reading from RAM is much faster than reading from disk. Reading from other sources such as a database, the file system, or the network is likewise much slower than reading from RAM.
In a distributed computing environment, the central point of contact for all the servers is usually a database.
To handle the load on a website, more servers are added so that the system can serve more requests per minute.
It is easy to keep adding servers as the traffic on a site increases, but the growing number of servers puts
a heavy load on the database. This load can make the site slow, especially if static resources like images and videos are
read from disk each time a request comes in. To remedy this, a server can maintain a hash-map-like structure in RAM:
when an object is first requested, the request is stored as a key and the requested object as the value.
All subsequent requests for the same object can then be served from this hash map in RAM instead of making a
call to the file system or the database, which greatly reduces the server's response time.
Maintaining such a hash map in RAM is called caching.
Designing a cache
Caching objects helps speed up the servers but the following factors need to be kept in mind when designing a cache:
Freshness of cache: A cached object represents some underlying resource which may be
changed by other clients.
For example, a user's image on a social networking site may be cached by the servers so
that visitors of that profile need not fetch the image from the file system again and again. However, if the user changes
his profile picture, the cached entry becomes stale and needs to be refreshed by reloading it from the file system.
While designing a cache, it is thus important to consider whether cached objects can become stale and to provide a way
to refresh such objects when new data arrives.
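One simple way to bound staleness (a sketch of one possible approach, not the only one) is to attach a time-to-live to each entry and reload it from the underlying store once it expires. The `loader` function here is a hypothetical stand-in for the real fetch:

```python
import time

class TTLCache:
    """Entries expire after ttl seconds, forcing a fresh reload."""

    def __init__(self, ttl, loader):
        self.ttl = ttl
        self.loader = loader      # hypothetical function fetching from the real store
        self.entries = {}         # key -> (value, time_stored)

    def get(self, key):
        entry = self.entries.get(key)
        if entry is not None:
            value, stored_at = entry
            if time.time() - stored_at < self.ttl:
                return value      # still fresh: serve from RAM
        value = self.loader(key)  # missing or stale: reload from the store
        self.entries[key] = (value, time.time())
        return value
```

A time-to-live does not catch a change the instant it happens, but it guarantees a stale entry survives no longer than `ttl` seconds.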
Size of cache: The cache is typically much smaller than the underlying storage, so it is important to
discard cached objects which are not expected to be used frequently. The cache size can be kept small by various strategies:
a) Discarding objects which have not been accessed in a long time. This kind of cache is called a Least Recently Used (LRU) cache.
b) Keeping dynamic metrics of an object's access, such as how frequently the object is requested. This is similar to the first option, with the
difference that the LRU approach only considers the time of last access, while the dynamic approach takes into account the usage of an object over time.
So a very frequently accessed object may fall out of an LRU cache if it is not requested for some time, whereas in the dynamic
approach that object remains in the cache until its frequency of access drops below a threshold.
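The LRU strategy in (a) can be sketched with an ordered hash map; Python's `collections.OrderedDict` makes evicting the least recently used entry a one-liner:

```python
from collections import OrderedDict

class LRUCache:
    """Keeps at most `capacity` entries, evicting the least recently used."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()       # least recently used entry first

    def get(self, key):
        if key not in self.entries:
            return None                    # cache miss
        self.entries.move_to_end(key)      # mark as most recently used
        return self.entries[key]

    def put(self, key, value):
        if key in self.entries:
            self.entries.move_to_end(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # evict least recently used
```

With a capacity of two, inserting a third object evicts whichever of the first two was touched least recently.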
Distributed cache: A single computer's cache may prove insufficient for a heavy-traffic website.
In such a case, a distributed cache can be designed which combines the caches of several machines to provide a larger cache.
This is explained in more detail in Chapter 2 - Caching software.
What objects to cache: Since the size of the cache is limited, it makes sense not to keep extremely heavy objects in
the cache. For example, if a 1 GB cache is meant to hold images, it does not make sense to hold a 100 MB video in the same cache.
Similarly, if an object's underlying resource changes very frequently, it should not be cached either.
Latency: Although a cache is designed to reduce latency, it makes sense to grade objects by their normal latency
of access and use that as one of the factors in deciding what objects to cache. Thus, if one object is read
from a local disk while another is read from secondary storage like a DVD drive or fetched over a network, then
the higher-latency objects among these should be given some preference in the cache.
Measuring cache performance
Cache performance can be measured by getting estimates of the following factors:
Hit ratio: The fraction of requests for which the requested resource is found in the cache.
Latency: The response time of the cache, i.e. the time the cache takes to return a requested object.
Refresh time: The time the cache takes to refresh an object whose underlying resource has changed.
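The hit ratio can be measured by counting hits and misses as lookups happen. A minimal sketch that any cache lookup could report into:

```python
class CacheStats:
    """Counts hits and misses to report the cache's hit ratio."""

    def __init__(self):
        self.hits = 0
        self.misses = 0

    def record(self, was_hit):
        # Call with True on a cache hit, False on a miss.
        if was_hit:
            self.hits += 1
        else:
            self.misses += 1

    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

Sampling this ratio over time shows whether the cache's size and eviction strategy actually fit the site's access pattern.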