Friday, 23 August 2013

EC2 to the rescue

In recent days we had a problem with connectivity between our main data centre in the US and APAC-based customers. That's a pretty big problem...

While the connectivity problem was being resolved we stood up an instance in EC2 running the following:
  •     Squid, for proxying requests from customers to our UI platform.
  •     Apache Traffic Server, for reverse proxying requests from customers to our search platform.
  •     vsftpd, for handling some content delivery.
Using existing documentation from previous projects we managed to stand this up in 45 minutes. While not perfect, it acted as a bridge between APAC and the US, pleasing our affected customers no end.
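
The Squid side needed very little configuration. Assuming it was run as a straightforward forward proxy that the affected customers pointed their clients at (the customer CIDR and port below are placeholders, not our real values), it boils down to something like this:

http_port 3128
# only the affected APAC customers are allowed to relay through this box
acl apac_customers src 203.0.113.0/24
http_access allow apac_customers
http_access deny all
# pass-through only, no caching
cache deny all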

Apache Traffic Server was as easy as:

yum install tcl-devel pcre-devel
cd /tmp
wget http://mirror.lividpenguin.com/pub/apache//trafficserver/trafficserver-*.tar.bz2
bunzip2 trafficserver-*.tar.bz2 
tar -xf trafficserver-*.tar
cd trafficserver-*
./configure
make
make install
trafficserver start


Now I added this to the bottom of /usr/local/etc/trafficserver/remap.config:

map http://ec2_external_instance_name/ http://original_url/

and changed this setting in /usr/local/etc/trafficserver/records.config:

CONFIG proxy.config.http.server_port INT 80

and then issued a

trafficserver restart

Job done. And don't forget to update the EC2 security group.
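
For the record, opening up port 80 from the command line is a one-liner with the AWS CLI - the group name below is a placeholder, and ideally you'd lock the CIDR down to the customers' ranges rather than 0.0.0.0/0:

aws ec2 authorize-security-group-ingress --group-name apac-bridge --protocol tcp --port 80 --cidr 0.0.0.0/0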

Tuesday, 20 August 2013

Using a log file to check an application is running on schedule

Sometimes it's overlooked during the writing of an application that sys guys might need to monitor it. A lot of the time this leads to sys guys having to write a few lines of bash to get something that resembles monitoring in place while the devs go back and improve the built-in support for monitoring.

I've removed a few sensitive bits, but here is a really easy way to check if:

A) A log file has been touched in the last 10 minutes
B) Some expected text has been logged in the last 30 lines of the file

It's not great, but it's better than nothing:


#!/bin/bash
# most recently written log file (written to in the last 30 minutes)
LOG_FILE=$(find /shared/logs -name "*.log" -mmin -30 | head -1)
# count of log files touched in the last 10 minutes
LOG_FILE_CHECK=$(find /shared/logs -name "*.log" -mmin -10 | wc -l)

# bail out early if nothing has been written recently
if [ -z "$LOG_FILE" ]; then
  echo "CRIT - not running"
  exit 2
fi

# look for the expected text in the last 30 lines of the log
LOG_FILE_RET=$(tail -n 30 "$LOG_FILE" | grep -c "retrieved")
LOG_FILE_FINISH=$(tail -n 30 "$LOG_FILE" | grep -c "job finished")

if [ "$LOG_FILE_CHECK" -ne 0 ] && [ "$LOG_FILE_RET" -ne 0 ] && [ "$LOG_FILE_FINISH" -ne 0 ]; then
  echo "OK - running"
  exit 0
else
  echo "CRIT - not running"
  exit 2
fi
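
Because it prints a Nagios-style OK/CRIT and exits 0 or 2, it drops straight into NRPE. A sketch of the wiring, assuming the script is saved as /usr/local/bin/check_app_log.sh (the path and command name here are just examples):

# on the monitored host, in nrpe.cfg:
command[check_app_log]=/usr/local/bin/check_app_log.sh

# from the Nagios server:
check_nrpe -H app-server -c check_app_log

Failing that, a cron entry that mails the output on CRIT works too.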

EC2 spot instance prices in August

Not sure what happened with EC2 spot instance prices in US East this month, but we saw prices rocket, with 2x c1.xlarge spot instances running Linux costing us double what they would have done as 'regular' on-demand instances.



Fortunately it was noticed before we racked up too much of a bill (although we've already run up double the bill we'd typically see for August, and it's only the 20th!), so we took snapshots of the instances before terminating them - we then launched regular instances using the snapshots.
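
For anyone doing the same from the command line, baking a running instance into an image and relaunching it on-demand is a couple of commands with the AWS CLI (the instance and AMI IDs below are placeholders):

aws ec2 create-image --instance-id i-12345678 --name "spot-rescue-20130820"
aws ec2 run-instances --image-id ami-12345678 --instance-type c1.xlarge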

Made me wonder how many people have been caught out and will get an August bill full of max-priced hatred from Amazon. The problem seems to be people's approach to spot instances - they're typically cheap, so let's set the max price to $5.00 (because the systems aren't throwaway, aka not suitable for running as spot instances) and profit. In theory, and in practice, that does work (it worked for us for nearly a year), but just a few bad months would undo all that.

I've tried to explain this before: when it comes to using spot instances for platforms that aren't suited to them (and therefore setting the max price high to keep them running), the house - Amazon - always wins.
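
If you are going to run long-lived systems on spot anyway, at least keep an eye on the price history so a spike like this doesn't go unnoticed for weeks - something along these lines with the AWS CLI (region and dates are just an example):

aws ec2 describe-spot-price-history --region us-east-1 --instance-types c1.xlarge --product-descriptions "Linux/UNIX" --start-time 2013-08-01T00:00:00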

Using Glacier for long-term backups

Recently I've been trying to find a new home for 3TB of old EBS data to save on cost. S3 was a consideration, although it's nearly as expensive as EBS, and my past experiences using FUSE haven't always been great. Another consideration was to 'bring the data home', but putting 3TB of old data on expensive HW didn't seem worth it.
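
To put rough numbers on it (these are from memory of the 2013 us-east-1 price list, so treat them as ballpark figures): 3TB is about 3,000GB, so standard EBS at roughly $0.10/GB-month is around $300/month, S3 at roughly $0.095/GB-month is barely any better, while Glacier at $0.01/GB-month works out at about $30/month - an order of magnitude cheaper for data we hardly ever expect to touch.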

Amazon Glacier is something we'd been meaning to look at for a while, so it seemed the perfect time to give it a go.

I found a super guide to using Glacier on blog.tkassembled.com, although for our RHEL-based Linux instances it required some tweaking....

To install Glacier-Cmd on a RHEL-based Linux instance:

# yum install python-setuptools git
# pip-2.6 install boto --upgrade
# git clone git://github.com/uskudnik/amazon-glacier-cmd-interface.git
# cd amazon-glacier-cmd-interface
# python setup.py install

I also found that splitting the data up into 200MB chunks, as advised by Amazon, took an age with gzip doing the compression, so I ended up using pigz (a parallel gzip) instead:

# BACKUP_TIME="$(date +%Y%m%d%H%M%S)"
# tar cvf - /mnt/s10_1 | pigz -4 | split -a 4 --bytes=200M - "s10_1.$BACKUP_TIME.tar.gz."

The format of the ~/.glacier-cmd config file has now changed too - here is mine:

[aws]
access_key=REMOVED
secret_key=REMOVED

[glacier]
region=us-east-1
logfile=~/.glacier-cmd.log
loglevel=INFO
output=print
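
With the credentials in place, the upload itself was just a loop over the split chunks. From memory the command looked roughly like this - the vault name is made up, and glacier-cmd --help has the exact syntax:

# for CHUNK in s10_1.$BACKUP_TIME.tar.gz.*; do glacier-cmd upload ebs-archive "$CHUNK" --description "$CHUNK"; done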


Generally I've been impressed with Glacier, and Glacier-Cmd is a great little tool. Others I've heard about but not used yet are MT-AWS-Glacier and Glaciate Archive.

Should note it took me a good week and a half to split, zip and upload 3TB of data to Glacier using a c1.medium instance in EC2. Glacier certainly isn't 'fast archive', and if you are looking to upload/download content fast then this isn't for you - a simple listing of the information in a Glacier vault can take 4+ hours.
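
Worth knowing that the listing is itself an asynchronous job: you request an inventory, then come back hours later and collect the result. With Glacier-Cmd that was roughly the following (vault name made up again, and double-check the subcommands against glacier-cmd --help):

# glacier-cmd inventory ebs-archive
# glacier-cmd listjobs ebs-archive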