This article and the attached source code should be used as a reference only. Please test your implementation thoroughly before making changes to a production environment.

Introduction

Caching is a very important topic when it comes to the static assets of a web page. Images and CSS files are commonly cached by the browser to avoid the cost of a network round trip. Additional layers of content caching can be enabled in a number of ways, for example the CDN caching offered by providers such as AWS, Cloudflare and Akamai, which provide highly available, scalable, geo-location-based content caching.

Here, however, we are not going to talk about this kind of distributed caching; instead we will look at caching at a more basic level. We will use an Apache httpd server as a reverse proxy in front of a backend service, enable caching on that httpd server, and see how headers such as Cache-Control, Last-Modified and ETag control the way the cache is loaded and validated. The end goal is to minimize network traffic.

Our demo environment

To simulate a proxy server and a backend server, we create two Apache servers running on different EC2 instances:

Reverse proxy server: 13.127.108.184
Backend service: 13.126.84.157

Throughout the post, we use the following terms:

Shared cache = Cache at the apache server
Local cache = Cache at the browser

Before we add the shared cache, the flow looks like the diagram below; in this case there is only the local cache.

Caching before change.png

If we add a shared cache to the Apache web server that proxies the backend service, the flow becomes:

apache-cache-v2.png

And for resources that are already cached locally:

apache-cache-2-304.png

Configuration at Reverse proxy

We first load a landing page that contains a link to the downstream resource. Remember that following a link to a page is different from loading it directly in the browser (for example with F5).

The former honors the local cache, while the latter sends Cache-Control: max-age=0 to revalidate the cached copy immediately.

We will discuss this behavior further, but for now let's have a look at the HTML:

<!doctype html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>load demo</title>
  <style>
  body {
    font-size: 12px;
    font-family: Arial;
  }
  </style>
  <script src="https://code.jquery.com/jquery-3.5.0.js"></script>
</head>
<body>
 
<b>Projects:</b>
<ol id="new-projects"><li><a href="http://13.127.108.184/pics/files/mona.jpg">clicked</a></li></ol>
 
<script>
</script>
 
</body>
</html>

home-page.PNG

We now need to proxy requests to the backend service.

The proxy configuration is straightforward:

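# Requires mod_rewrite, mod_proxy and mod_proxy_http
RewriteEngine  on
# Forward /pics/* to the backend's /img/* ([P] = proxy the request)
RewriteRule ^/pics/(.*)$  "http://13.126.84.157/img/$1" [P]
# Rewrite Location headers in backend redirects back to the proxy's /pics/ path
ProxyPassReverse "/pics/"  "http://13.126.84.157/img/"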

So when we request an image under http://13.127.108.184/pics/, the proxy rewrites the URL and calls the downstream service:

http://13.127.108.184/pics/files/mona.jpg --> http://13.126.84.157/img/files/mona.jpg
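The rule's mapping can be checked offline with a small Python sketch that applies the same regular expression. The function name `rewrite_to_backend` is hypothetical, and the sketch only models this one rule, not mod_rewrite itself.

```python
import re
from typing import Optional

# Same pattern and target as the RewriteRule above. Python's re syntax
# behaves like mod_rewrite's PCRE for this simple expression.
PICS_RULE = re.compile(r"^/pics/(.*)$")
BACKEND_PREFIX = "http://13.126.84.157/img/"


def rewrite_to_backend(path: str) -> Optional[str]:
    """Return the backend URL the proxy would fetch, or None if the rule does not match."""
    match = PICS_RULE.match(path)
    if match is None:
        return None  # not under /pics/: handled by the proxy itself
    return BACKEND_PREFIX + match.group(1)


print(rewrite_to_backend("/pics/files/mona.jpg"))  # http://13.126.84.157/img/files/mona.jpg
print(rewrite_to_backend("/index.html"))           # None
```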

We also have to configure the cache here. Let's look at the cache configuration:


# Cache module
LoadModule cache_module modules/mod_cache.so

<IfModule mod_cache.c>

  # Store the cache on disk
  LoadModule cache_disk_module modules/mod_cache_disk.so

  <IfModule mod_cache_disk.c>

    # Location to store the cache
    CacheRoot "/tmp"

    # Cache only files under /pics/files/
    CacheEnable disk "/pics/files/"

    # Number of subdirectory levels created from the URL hash
    CacheDirLevels 1

    # Number of characters in each subdirectory name
    CacheDirLength 1

    # Ignore client Cache-Control (no-cache, max-age=0) and serve fresh cached copies anyway
    CacheIgnoreCacheControl On

  </IfModule>

</IfModule>

CacheDirLevels decides how many directory levels are created from the hash string, and CacheDirLength decides how many characters each directory name has.

For example, if a file hashes to “abcdefghijklmnopqrstuv” (mod_cache_disk hashes are 22 characters long), then CacheDirLevels 2 and CacheDirLength 4 would store it in:

[path_of_cache_root]/abcd/efgh/ijklmnopqrstuv
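A minimal Python sketch of how the directory layout follows from the hash, assuming the 22-character hash mod_cache_disk generates. `cache_file_path` is a hypothetical helper; the real cache also appends `.header` and `.data` suffixes to the file names, which this sketch ignores.

```python
import posixpath


def cache_file_path(cache_root: str, url_hash: str, dir_levels: int, dir_length: int) -> str:
    """Split a mod_cache_disk-style URL hash into nested directories.

    The first dir_levels * dir_length characters become directory names of
    dir_length characters each; the remaining characters form the file name.
    """
    if dir_levels * dir_length > 20:
        # The Apache docs state that this product must not be higher than 20.
        raise ValueError("CacheDirLevels * CacheDirLength must not exceed 20")
    parts = [url_hash[i * dir_length:(i + 1) * dir_length] for i in range(dir_levels)]
    parts.append(url_hash[dir_levels * dir_length:])
    return posixpath.join(cache_root, *parts)


# The example from the text: CacheDirLevels 2, CacheDirLength 4
print(cache_file_path("/tmp", "abcdefghijklmnopqrstuv", 2, 4))  # /tmp/abcd/efgh/ijklmnopqrstuv
# The settings used in this demo: CacheDirLevels 1, CacheDirLength 1
print(cache_file_path("/tmp", "abcdefghijklmnopqrstuv", 1, 1))  # /tmp/a/bcdefghijklmnopqrstuv
```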

CacheIgnoreCacheControl is a very important directive to check here. It is off by default, and while it is off, a request from a browser or other user agent carrying Cache-Control: max-age=0 (revalidate) or Cache-Control: no-cache (ignore the cache and reload) makes the proxy skip its shared cache and pass the request on to the downstream service, so the shared cache has no effect for those requests. We certainly don't want that: the server, not the browser, should control how the shared cache is served.

In other words, the client's cache directives govern its own local cache, but they stop at the proxy.
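The decision can be modelled roughly as below. This is a simplified sketch, not mod_cache's actual code: `parse_cache_control` and `client_forces_backend` are hypothetical helpers, and only the two directives discussed here are considered.

```python
def parse_cache_control(header: str) -> dict:
    """Parse a Cache-Control header into {directive: value or None}."""
    directives = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, value = part.partition("=")
        directives[name.strip().lower()] = value.strip().strip('"') or None
    return directives


def client_forces_backend(request_cache_control: str, ignore_cache_control: bool) -> bool:
    """Simplified model: will the client's request go past a fresh shared-cache entry?

    ignore_cache_control mirrors CacheIgnoreCacheControl On/Off.
    """
    if ignore_cache_control:
        return False  # serve the fresh cached copy regardless of the client
    directives = parse_cache_control(request_cache_control)
    return "no-cache" in directives or directives.get("max-age") == "0"


print(client_forces_backend("no-cache", ignore_cache_control=False))  # True
print(client_forces_backend("no-cache", ignore_cache_control=True))   # False
```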

The cache lifetime is set with the Header directive (mod_headers):

Header append Cache-Control max-age=3600

This sets the validity of the shared cache to 3600 seconds (1 hour). The header is also sent back to the client, so the client's local cache uses the same lifetime.

Note that this tells the browser/UA to keep using its local copy for this period without contacting the server at all.
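The freshness rule behind max-age is simple: a stored response stays fresh while its age is below the max-age lifetime. A minimal Python sketch, assuming the age is already known in seconds; `max_age_seconds` and `is_fresh` are hypothetical helpers, and mod_cache can also derive a heuristic lifetime from Last-Modified (CacheLastModifiedFactor), which this sketch leaves out.

```python
import re
from typing import Optional

# Matches a max-age directive at the start of the header or after a comma,
# so "s-maxage" is not mistaken for it.
MAX_AGE = re.compile(r'(?:^|,)\s*max-age\s*=\s*"?(\d+)"?', re.IGNORECASE)


def max_age_seconds(cache_control: str) -> Optional[int]:
    """Return the max-age value from a Cache-Control header, if present."""
    match = MAX_AGE.search(cache_control)
    return int(match.group(1)) if match else None


def is_fresh(cache_control: str, age_seconds: float) -> bool:
    """A stored response is fresh while its age is below its max-age lifetime."""
    lifetime = max_age_seconds(cache_control)
    if lifetime is None:
        return False  # no explicit lifetime: treated as stale in this simplified model
    return age_seconds < lifetime


print(is_fresh("max-age=3600", age_seconds=1800))  # True
print(is_fresh("max-age=3600", age_seconds=3600))  # False
```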

We will not see this behavior unless we follow a link. Pressing Enter, F5 or Ctrl+F5 explicitly revalidates or reloads the resource without honoring the local cache lifetime. That is why our home page has a link, instead of loading the image directly from the browser's address bar.

See this behavior in the images below

When we use a link, that is, we click the link on the home page to load the image instead of loading it from the address bar:

home page -> url

using-cache.PNG

When we press Enter or F5 (the browser sends Cache-Control: max-age=0):

when-enter-f5.PNG

When we press Ctrl+F5 or turn on the browser's disable-cache option (the browser sends Cache-Control: no-cache):

ctrlf5-no-cache.PNG
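To reproduce these three cases outside the browser, the request headers can be set by hand. A minimal Python sketch using the standard library's `urllib.request`; the action names and `build_request` are hypothetical labels for the cases above, and the header values are the ones described in this post (real browsers may add others, such as `Pragma: no-cache`).

```python
import urllib.request

# Request headers for each way of opening the image, as described above.
# Following a link adds no Cache-Control, so the browser may answer from its
# local cache without sending any request at all (not modelled here).
ACTION_HEADERS = {
    "link": {},
    "reload": {"Cache-Control": "max-age=0"},      # Enter / F5
    "hard_reload": {"Cache-Control": "no-cache"},  # Ctrl+F5 / disable cache
}


def build_request(url: str, action: str) -> urllib.request.Request:
    """Build (but do not send) a request mimicking the given browser action."""
    return urllib.request.Request(url, headers=ACTION_HEADERS[action])


req = build_request("http://13.127.108.184/pics/files/mona.jpg", "hard_reload")
# urllib normalises header names with str.capitalize(), hence "Cache-control"
print(req.get_header("Cache-control"))  # no-cache
```

Passing the request to `urllib.request.urlopen(req)` would send it to the proxy, much like `curl -H "Cache-Control: no-cache" http://13.127.108.184/pics/files/mona.jpg`.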

Configuration at Resource

Here we keep two images in two different directories so that we can check whether caching can be narrowed down to a specific directory. In our case only mona.jpg should be cached.

[root@ip-172-31-8-212 html]# tree
.
├── img
│   ├── admin
│   │   └── files
│   │       └── basketball.png
│   └── files
│       └── mona.jpg
└── index.html

We plan to cache /img/files and NOT /img/admin/files.

Be aware of the local cache

The shared cache does not store a 304 response from the downstream service; the cacheable response here is the 200. This means the initial cache load needs a 200 response. Until that happens, even with caching enabled, clients that already hold an up-to-date local copy send conditional requests, receive a 304, and never populate the shared cache.

The cache can be loaded in any of the following ways:

  • Clear the browser cache and request the URL
  • Open the browser with caching disabled (this effectively sends Cache-Control: no-cache with the request)
  • Use a new client
  • Use curl to call the endpoint

Important thoughts

  1. The first client call (any of the four scenarios above) hits the backend server and loads the cache. Any subsequent call is served from the cache.
  2. Once the shared cache is loaded, client requests to revalidate a local copy (304) cause the proxy to revalidate the shared cache with the backend resource at most once an hour (max-age).
  3. Any other revalidation request from any client within that hour is answered with a 304 directly from the shared cache.
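These points can be simulated with a toy model of the proxy. The classes `Backend`, `Entry` and `SharedCache` are hypothetical and deliberately simplified (one resource, ETag-based revalidation, a fixed max-age); they illustrate the flow described above rather than mod_cache's implementation.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Backend:
    """Hypothetical origin server holding one resource with a fixed ETag."""
    etag: str = '"v1"'
    body: bytes = b"mona.jpg bytes"
    hits: int = 0

    def get(self, if_none_match: Optional[str] = None) -> Tuple[int, str, Optional[bytes]]:
        self.hits += 1
        if if_none_match == self.etag:
            return 304, self.etag, None  # unchanged: no body
        return 200, self.etag, self.body


@dataclass
class Entry:
    etag: str
    body: bytes
    stored_at: float


class SharedCache:
    """Simplified model of the proxy cache described above (not mod_cache itself)."""

    def __init__(self, backend: Backend, max_age: float = 3600):
        self.backend = backend
        self.max_age = max_age
        self.entry: Optional[Entry] = None

    def handle(self, now: float, if_none_match: Optional[str] = None) -> int:
        if self.entry is None:
            # Cold cache: forward the request as-is; only a 200 can be stored.
            status, etag, body = self.backend.get(if_none_match)
            if status == 200:
                self.entry = Entry(etag, body, now)
            return status
        if now - self.entry.stored_at >= self.max_age:
            # Stale: the proxy revalidates with a conditional request.
            status, etag, body = self.backend.get(self.entry.etag)
            if status == 200:
                self.entry = Entry(etag, body, now)
            else:
                self.entry.stored_at = now  # 304: the entry is fresh again
        # Served from the shared cache: 304 if the client's copy matches.
        return 304 if if_none_match == self.entry.etag else 200


backend = Backend()
proxy = SharedCache(backend, max_age=3600)
proxy.handle(now=0)                           # cold cache: fetched and stored
proxy.handle(now=60, if_none_match='"v1"')    # fresh: 304 from the shared cache
proxy.handle(now=3700, if_none_match='"v1"')  # stale: proxy revalidates, backend says 304
print(backend.hits)  # 2
```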

Cache logs

It is useful to see what the cache is doing under the hood, so we set the log level to debug in httpd.conf (LogLevel debug).

Let's see it in action.

Open a browser to simulate the calls

Tail the error log on the proxy server (to view the cache logs):

tail -f /var/log/httpd/error_log

Tail the access log on the backend server (to view incoming requests):

tail -f /var/log/httpd/access_log

Let's load the cache using curl:

curl http://13.127.108.184/pics/files/mona.jpg

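On the proxy server, the shared cache shows up under /tmp. The httpd systemd service typically runs with a private /tmp (PrivateTmp), so the cache lands inside a systemd-private directory: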

[root@ip-172-31-13-147 tmp]# ll
drwx------ 3 root root 17 May  9 09:55 systemd-private-17a7e1ac8de74e63b1a3449f23a647ad-httpd.service-Dy0HSb

The cache log confirms it as well:

cache: Caching url http://13.127.108.184:80/pics/files/mona.jpg? for request /pics/files/mona.jpg
AH00770: cache: Removing CACHE_REMOVE_URL filter.
AH00737: commit_entity: Headers and body for URL http://13.127.108.184:80/pics/files/mona.jpg? cached.

Now that the cache is in place, let's make a request from the browser that would normally get a 200 response (using Ctrl+F5) and check whether it is served from the shared cache:

ctrlf5-no-cache.PNG

AH00781: Incoming request is asking for a uncached version of /pics/files/mona.jpg, but we have been configured to ignore it and serve a cached response anyway
AH00763: cache: running CACHE_OUT filter
AH00764: cache: serving /pics/files/mona.jpg

Did you notice that the web server alerted us that the browser is asking for a full reload?

We don't want that, because the cache is still fresh. This is why it matters to stop the browser from forcing the shared cache to revalidate or reload, which is exactly what CacheIgnoreCacheControl On does.

Now we make a request 5 minutes after the cached entry has expired. The proxy (server 1) revalidates the shared cache with the backend (server 2), which replies 304 because nothing has changed:

AH00737: commit_entity: Headers and body for URL http://13.127.108.184:80/pics/files/mona.jpg? cached., referer: http://13.127.108.184/
AH02971: cache: serving /pics/files/mona.jpg (revalidated), referer: http://13.127.108.184/
13.127.108.184 - - [09/May/2020:10:09:48 +0000] "GET /img/files/mona.jpg HTTP/1.1" 304 - "http://13.127.108.184/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36"
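To confirm from the backend side that revalidation produced a 304, the access log line can be parsed. A minimal Python sketch, assuming the backend's access_log uses Apache's standard combined format, which the line above appears to follow; `parse_access_line` is a hypothetical helper. The client address is the proxy's (13.127.108.184), while the Referer and User-Agent are the browser's headers passed through by the proxy.

```python
import re

# Apache "combined" format: host ident user [time] "request" status size "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_access_line(line: str) -> dict:
    """Split one combined-format access log line into named fields."""
    match = COMBINED.match(line)
    if match is None:
        raise ValueError("not a combined-format log line")
    return match.groupdict()


line = ('13.127.108.184 - - [09/May/2020:10:09:48 +0000] "GET /img/files/mona.jpg HTTP/1.1" '
        '304 - "http://13.127.108.184/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
        'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36"')
entry = parse_access_line(line)
print(entry["host"], entry["path"], entry["status"])  # 13.127.108.184 /img/files/mona.jpg 304
```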

If we now request it from the browser, it is again served from the shared cache:

AH00698: cache: Key for entity /pics/files/mona.jpg?(null) is http://13.127.108.184:80/pics/files/mona.jpg?
AH00709: Recalled cached URL info header http://13.127.108.184:80/pics/files/mona.jpg?
AH00720: Recalled headers for URL http://13.127.108.184:80/pics/files/mona.jpg?
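When tailing the proxy's error log during a test run, it can help to tally these messages automatically. A minimal Python sketch that matches substrings of the debug messages shown in this post; `summarize` and the category labels are hypothetical, and any other mod_cache message is simply skipped.

```python
from collections import Counter
from typing import Iterable

# (label, substring) pairs taken from the debug messages shown above.
# Checked in order, so "(revalidated)" wins over the generic "cache: serving".
PATTERNS = [
    ("revalidated", "(revalidated)"),
    ("stored", "cache: Caching url"),
    ("client_reload_ignored", "AH00781"),
    ("served_from_cache", "cache: serving"),
]


def summarize(lines: Iterable[str]) -> Counter:
    """Count mod_cache debug messages by category; unmatched lines are skipped."""
    counts = Counter()
    for line in lines:
        for label, needle in PATTERNS:
            if needle in line:
                counts[label] += 1
                break  # one category per line
    return counts


sample = [
    "cache: Caching url http://13.127.108.184:80/pics/files/mona.jpg? for request /pics/files/mona.jpg",
    "AH00781: Incoming request is asking for a uncached version of /pics/files/mona.jpg, "
    "but we have been configured to ignore it and serve a cached response anyway",
    "AH00764: cache: serving /pics/files/mona.jpg",
    "AH02971: cache: serving /pics/files/mona.jpg (revalidated), referer: http://13.127.108.184/",
    "AH00763: cache: running CACHE_OUT filter",
]
print(summarize(sample))
```

On the proxy, the same function could be pointed at the real log with `summarize(open("/var/log/httpd/error_log"))`.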

Note that so far we have been requesting the image URL directly:

http://13.127.108.184/pics/files/mona.jpg

Had we used the link instead (within the 5-minute expiry window), the image would still have been loaded from disk (the local cache):

using-cache.PNG
