Varnish Part 2

Varnish 3.0 Setup in HA for Drupal 7 with Redhat (Part 2)


So now that you have read the "Varnish and how it works" post on my blog, we can begin with how I went about setting up my Varnish. The diagram above is basically the same setup we had.

Since we were using Redhat and this was going into production eventually, I decided it was best to stick to repos. Keep in mind you don't have to do this; you can go ahead and compile your own version if you wish. For the purposes of this tutorial, we're going to use a third-party repo called EPEL.

1. Installing Varnish 3.0 on Redhat
• This configuration is based on Lullabot's setup, with some tweaks and things I found they forgot to mention, which I spent hours learning.

Varnish is distributed in the EPEL (Extra Packages for Enterprise Linux) package repositories. However, while EPEL allows new versions to be distributed, it does not allow backwards-incompatible changes, so new major versions will not hit EPEL and it is not necessarily up to date. If you require a newer major version than what is available in EPEL, you should use the repository provided by varnish-cache.org.

To use the varnish-cache.org repository, run

rpm --nosignature -i http://repo.varnish-cache.org/redhat/varnish-3.0/el5/noarch/varnish-release-3.0-1.noarch.rpm

and then run

yum install varnish

The --nosignature flag is only needed on the initial installation, since the Varnish GPG key is not yet in the yum keyring.

Note: After you install it, you will notice that the daemon will not start. Total piss off, right? This is because you need to configure a few things based on the resources you have available. This is all explained in my "Varnish and how it works" post, which you have already read 😛

2.    So, we need to get Varnish running before we can play with it; this is done in /etc/sysconfig/varnish. These are the settings I used for my configuration. My VMs had 2 CPUs and 4 GB of RAM each.

If you want to know what these options do, go read my previous post. It would take too long to explain each flag in this post, and it would get boring, hence why I wrote it in two parts. Save the file and then start Varnish with /etc/init.d/varnish start. If it doesn't start, you have a mistake somewhere in here.

DAEMON_OPTS="-a *:80,*:443 \
-T 127.0.0.1:6082 \
-f /etc/varnish/default.vcl \
-u varnish -g varnish \
-S /etc/varnish/secret \
-p listen_depth=2048 \
# -p lru_interval=1500 \
-h classic,169313 \
-p obj_workspace=4096 \
-p connect_timeout=600 \
-p sess_workspace=50000 \
-p max_restarts=6 \
-s malloc,2G"
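If you are wondering where the -s malloc,2G number came from: my rule of thumb (an assumption of mine, not an official formula) is to give the cache roughly half of total RAM and leave the rest for thread workspaces and the OS. A quick sketch:

```shell
# Rule-of-thumb sizing for "-s malloc": give the cache ~50% of total RAM
# (my own assumption), leaving the rest for thread workspaces and the OS.
cache_size_mb() {
  total_mb=$1
  echo $(( total_mb / 2 ))
}

# A 4 GB VM, as in the config above:
cache_size_mb 4096    # -> 2048, i.e. "-s malloc,2G"
```

Adjust the ratio for your own workload; a box running Apache alongside Varnish needs a smaller share.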

3.  Now that Varnish is started, you need to set up the VCL which it will read. The better you understand how your application works, the better you will be able to fine-tune the way the cache works. There is no one way to do this; this is simply how I went about it.

VCL Configuration

The VCL file is the main location for configuring Varnish, and it's where we'll be doing the majority of our changes. It's important to note that Varnish includes a large set of defaults that are always automatically appended to the rules you have specified. Unless you force a particular command like "pipe", "pass", or "lookup", the defaults will be run. Varnish ships with an entirely commented-out default.vcl file for reference.

So this configuration will be connecting to two webserver backends. Each webserver has a health probe which the VCL checks; if the probe fails, the webserver is removed from the caching round robin. The cache also updates every 30 seconds as long as one of the webservers is up and running. If both webservers go down, Varnish will serve objects from the cache for up to 12 hours; this varies depending on how you configure it.

http://www.nicktailor.com/files/default.vcl

Now, what some people do is put a PHP script on the webservers, which Varnish runs; if everything passes, the web server stays in the pool. I didn't bother to do it this way. I set up a page that connected to a database and had the health probe just look for a status 200 code. If the page came up, the web server stayed in the pool; if it didn't, it was dropped.

# Define the list of backends (web servers).

# Port 80 Backend Servers

backend web1 { .host = "status.nicktailor.com"; .probe = { .url = "/"; .interval = 5s; .timeout = 1s; .window = 5; .threshold = 3; }}

backend web2 { .host = "status.nicktailor.com"; .probe = { .url = "/"; .interval = 5s; .timeout = 1s; .window = 5; .threshold = 3; }}
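To make the .window/.threshold pair concrete: out of the last 5 probes, at least 3 must have succeeded for the backend to count as healthy. Here is an illustrative sketch of that decision in shell (this is just to show the logic; varnishd computes this internally):

```shell
# Health decision over a probe history string, e.g. "HHFHH"
# (H = probe succeeded, F = probe failed).
# Mirrors .window = 5 and .threshold = 3 from the backend definitions above.
is_healthy() {
  history=$1; window=$2; threshold=$3
  # Only the last $window probes count.
  recent=$(printf '%s' "$history" | tail -c "$window")
  good=$(printf '%s' "$recent" | tr -cd 'H' | wc -c)
  [ "$good" -ge "$threshold" ] && echo healthy || echo sick
}

is_healthy "HHFHH" 5 3   # -> healthy (4 of the last 5 probes good)
is_healthy "FFHFF" 5 3   # -> sick    (1 of the last 5 probes good)
```

So a single failed probe does not eject a backend; it takes a sustained run of failures within the window.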

Caching Even if Apache Goes Down

So a few people have written articles about this and how to do it, but it took me a bit to get it working.

Even in an environment where everything has a redundant backup, it's possible for the entire site to go "down" due to any number of causes: a programming error, a database connection failure, or just plain excessive amounts of traffic. In such scenarios, the most likely outcome is that Apache will be overloaded and begin rejecting requests. In those situations, Varnish can save your bacon with the Grace period. Apache gives Varnish an expiration date for each piece of content it serves. Varnish automatically discards outdated content and retrieves a fresh copy when it hits the expiration time. However, if the web server is down it's impossible to retrieve the fresh copy. "Grace" is a setting that allows Varnish to serve up cached copies of the page even after the expiration period if Apache is down. Varnish will continue to serve up the outdated cached copies it has until Apache becomes available again.

To enable Grace, you just need to specify the setting in vcl_recv and in vcl_fetch:

# Respond to incoming requests.
sub vcl_recv {
# Allow the backend to serve up stale content if it is responding slowly.
set req.grace = 6h;
}

# Code determining what to do when serving items from the Apache servers.
sub vcl_fetch {
# Allow items to be stale if needed.
set beresp.grace = 6h;
}

Note: The missing piece is the most important one. Without the ttl below, if the webservers go down, after about two minutes your backend will show an error page, because by default the TTL for objects to stay in the cache when the backend goes down is set extremely low. Everyone seems to forget to mention this crucial piece of information.

# Code determining what to do when serving items from the Apache servers.

sub vcl_fetch {

# Allow items to be stale if needed.

set beresp.grace = 24h;
set beresp.ttl = 6h;
}
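To see how ttl and grace interact, it helps to think of an object's age driving a three-way decision. Here is a simplified sketch (my own illustration; real Varnish also weighs req.grace and can serve stale while refetching):

```shell
# Simplified object-state decision based on age, ttl, and grace (seconds).
# With the settings above: ttl = 6h (21600 s), grace = 24h (86400 s).
object_state() {
  age=$1; ttl=$2; grace=$3; backend_up=$4
  if [ "$age" -lt "$ttl" ]; then
    echo "fresh: serve from cache"
  elif [ "$age" -lt $(( ttl + grace )) ] && [ "$backend_up" = "no" ]; then
    echo "stale: serve graced copy"
  else
    echo "expired: fetch from backend"
  fi
}

object_state 3600  21600 86400 yes   # 1h old  -> fresh: serve from cache
object_state 30000 21600 86400 no    # ~8h old -> stale: serve graced copy
```

The second call is exactly the outage scenario: the object is past its 6-hour TTL, but because the backend is down and the object is still inside the 24-hour grace window, Varnish serves the stale copy instead of an error page.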

Just remember: while the powers of grace are awesome, Varnish can only serve up a page that it has already received a request for and cached. This can be a problem when you’re dealing with authenticated users, who are usually served customized versions of pages that are difficult to cache. If you’re serving uncached pages to authenticated users and all of your web servers die, the last thing you want is to present them with error messages. Instead, wouldn’t it be great if Varnish could “fall back” to the anonymous pages that it does have cached until the web servers came back? Fortunately, it can — and doing this is remarkably easy! Just add this extra bit of code into the vcl_recv sub-routine:

# Respond to incoming requests.
sub vcl_recv {
# …code from above.

# Use anonymous, cached pages if all backends are down.
if (!req.backend.healthy) {
unset req.http.Cookie;
}
}

Varnish sets the req.backend.healthy property if any web server is available. If all web servers go down, this flag becomes FALSE. Varnish will then strip the cookie that indicates a logged-in user from incoming requests, and attempt to retrieve an anonymous version of the page. As soon as one server becomes healthy again, Varnish will stop stripping the cookie from incoming requests and pass them along to Apache as normal.
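The fallback behaviour boils down to one decision per request: if no backend is healthy, drop the Cookie header so the request looks anonymous and can be answered from the anonymous page cache. A sketch of that logic (illustrative only, mirroring the vcl_recv snippet above):

```shell
# Decide what Cookie header survives, given backend health.
# Mirrors "if (!req.backend.healthy) { unset req.http.Cookie; }" above.
effective_cookie() {
  backend_healthy=$1; cookie=$2
  if [ "$backend_healthy" = "no" ]; then
    echo ""              # cookie stripped: request treated as anonymous
  else
    echo "$cookie"       # cookie passed through to Apache as normal
  fi
}

effective_cookie yes "SESSabc=123"   # -> SESSabc=123
effective_cookie no  "SESSabc=123"   # -> (empty: served from anonymous cache)
```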

Making Varnish Pass to Apache for Uncached Content

Often when configuring Varnish to work with an application like Drupal, you’ll have some pages that should absolutely never be cached. In those scenarios, you can easily tell Varnish to not cache those URLs by returning a “pass” statement.

# Do not cache these paths.
if (req.url ~ "^/status\.php$" ||
    req.url ~ "^/update\.php$" ||
    req.url ~ "^/ooyala/ping$" ||
    req.url ~ "^/admin/build/features" ||
    req.url ~ "^/info/.*$" ||
    req.url ~ "^/flag/.*$" ||
    req.url ~ "^.*/ajax/.*$" ||
    req.url ~ "^.*/ahah/.*$") {
  return (pass);
}
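Before reloading the VCL, I find it handy to check from the command line whether a given URL would hit one of these never-cache patterns. A sketch using grep -E with the same patterns:

```shell
# Return "pass" if a URL matches one of the never-cache patterns above,
# otherwise "lookup" (the Varnish default action).
cache_action() {
  url=$1
  if printf '%s\n' "$url" | grep -Eq \
    '^/(status|update)\.php$|^/ooyala/ping$|^/admin/build/features|^/info/|^/flag/|/ajax/|/ahah/'; then
    echo pass
  else
    echo lookup
  fi
}

cache_action /update.php           # -> pass   (never cached)
cache_action /node/42              # -> lookup (cacheable)
cache_action /some/ajax/callback   # -> pass   (matches /ajax/)
```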

Varnish will still act as an intermediary between requests from the outside world and your web server, but the “pass” command ensures that it will always retrieve a fresh copy of the page.

In some situations, though, you do need Varnish to give the outside world a direct connection to Apache. Why is it necessary? By default, Varnish will always respond to page requests with an explicitly specified “content-length”. This information allows web browsers to display progress indicators to users, but some types of files don’t have predictable lengths. Streaming audio and video, and any files that are being generated on the server and downloaded in real-time, are of unknown size, and Varnish can’t provide the content-length information. This is often encountered on Drupal sites when using the Backup and Migrate module, which creates a SQL dump of the database and sends it directly to the web browser of the user who requested the backup.

To keep Varnish working in these situations, it must be instructed to “pipe” those special request types directly to Apache.

# Pipe these paths directly to Apache for streaming.
if (req.url ~ "^/admin/content/backup_migrate/export") {
  return (pipe);
}

How to view the log and what to look for

varnishlog | grep -i -v ng (this outputs one page of the log at a time so you can see it without it going all over the place)

•       One of the key things to look for is whether your backend is healthy; it should show that in this log. If it does not, then something is still wrong. I have jotted down what it should look like below.

Every poll is recorded in the shared memory log as follows:


    0 Backend_health - b0 Still healthy 4--X-S-RH 9 8 10 0.029291 0.030875 HTTP/1.1 200 Ok

The fields are:

• 0 — Constant
• Backend_health — Log record tag
• - — client/backend indication (XXX: wrong! should be 'b')
• b0 — Name of backend (XXX: needs qualifier)
• two words indicating state:
• “Still healthy”
• “Still sick”
• “Back healthy”
• “Went sick”

Notice that the second word indicates present state, and the first word == “Still” indicates unchanged state.

• 4--X-S-RH — Flags indicating how the latest poll went
• 4 — IPv4 connection established
• 6 — IPv6 connection established
• x — Request transmit failed
• X — Request transmit succeeded
• s — TCP socket shutdown failed
• S — TCP socket shutdown succeeded
• r — Read response failed
• R — Read response succeeded
• H — Happy with result
• 9 — Number of good polls in the last .window polls
• 8 — .threshold (see above)
• 10 — .window (see above)
• 0.029291 — Response time this poll or zero if it failed
• 0.030875 — Exponential average (r=4) of response time for good polls.
• HTTP/1.1 200 Ok — The HTTP response from the backend.
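Those fields can be pulled out of a raw log line with awk, which is handy if you want to script a quick health check instead of eyeballing varnishlog (a sketch against the sample line above):

```shell
# Parse a Backend_health log line into the fields described above.
line='0 Backend_health - b0 Still healthy 4--X-S-RH 9 8 10 0.029291 0.030875 HTTP/1.1 200 Ok'

echo "$line" | awk '{
  # $4 = backend name, $5-$6 = state words, $7 = poll flags,
  # $8 = good polls, $10 = window, $11 = response time this poll
  printf "backend=%s state=%s %s flags=%s good=%s/%s resp=%ss\n",
         $4, $5, $6, $7, $8, $10, $11
}'
# -> backend=b0 state=Still healthy flags=4--X-S-RH good=9/10 resp=0.029291s
```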
Some important tools for Varnish

• Varnishhist – The varnishhist utility reads varnishd(1) shared memory logs and presents a continuously updated histogram showing the distribution of the last N requests by their processing time. The value of N and the vertical scale are displayed in the top left corner. The horizontal scale is logarithmic. Hits are marked with a pipe character ("|"), and misses are marked with a hash character ("#").
• Varnishtop – The varnishtop utility reads varnishd(1) shared memory logs and presents a continuously updated list of the most commonly occurring log entries. With suitable filtering using the -I, -i, -X and -x options, it can be used to display a ranking of requested documents, clients, user agents, or any other information which is recorded in the log.

Warming up the Varnish Cache

Note – You will run into this as well. In order for your cache to start working, you will need to warm it up. The utility below is what you should use. I went about it a different way while I was testing because I did not know about the tool: I used a wget script that deleted the pages it downloaded after it was done, to warm up my cache.

Example:
wget --mirror -r -N -D http://www.nicktailor.com – you will need to check the wget flags, I did this from memory
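A slightly more controlled alternative to the wget crawl is to loop over a list of known URLs and request each one once so it lands in the cache. This is a hypothetical sketch (the urls.txt file and the host name are my assumptions, not part of the original setup); the fetch command is kept in a variable so you can swap curl for wget:

```shell
# Warm the cache from a list of URLs, one per line (hypothetical urls.txt).
# FETCH can be overridden; it defaults to a quiet curl that discards the body.
FETCH=${FETCH:-"curl -s -o /dev/null"}

warm_cache() {
  count=0
  while IFS= read -r url; do
    [ -z "$url" ] && continue          # skip blank lines
    $FETCH "$url" && count=$((count + 1))
  done < "$1"
  echo "warmed $count URLs"
}

# Usage (assumed host):
#   printf '%s\n' http://www.nicktailor.com/ http://www.nicktailor.com/about > urls.txt
#   warm_cache urls.txt
```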
• Varnishreplay – The varnishreplay utility parses varnish logs and attempts to reproduce the traffic. It is typically used to warm up caches or for various forms of testing. The following options are available:
-a backend – Send the traffic over TCP to this server, specified by an address and a port. This option is mandatory. Only IPv4 is supported at this time.
-D – Turn on debugging mode.
-r file – Parse logs from this file. The input file has to be from a varnishlog of the same version as the varnishreplay binary. This option is mandatory.