Dropsource is a polyglot, microservices-based platform, and as such we have spent a lot of time testing, tweaking, evaluating, and just plain thinking about versioning. Versioning forms the basis of our deployment and integration strategies, so we try to understand, at a fundamental level, how best to apply it to our environments and to the integration workflows and deployments of our geographically distributed developers. (No surprise, right?)
Most of our projects consist of a typical <pick the best software language for the project> stack (hence polyglot) that is compiled or assembled by Jenkins into an asset package and backed up to S3 under a build ID. These assets are then deployed per environment by Ansible, based on a Git environment package manifest that triggers Jenkins. (Obviously there’s more to it, but that’s the gist.) Simple. Well… very, very functional, at least.
Recently, I ran into the unique challenge of trying to provide a versioning strategy for our WordPress software implementation used for our main website and documentation portal.
WordPress is designed to have its content authored directly in production. So asking our WordPress users and admins to build in one place and migrate the content to production when they are ready (as in our normal software deployment workflow) is an unnatural path for them.
A dev-> staging-> production workflow really doesn’t fit their needs.
WordPress is more of a production:dev:content –> production:production:content workflow, with the “draft” serving as the dev copy.
Regardless of the workflow, as an overly paranoid Infrastructure Architect, I need to be able to:
- ensure we can easily roll back the code and content (create restorable backups)
- keep all our server nodes in sync
- provide the ability to test plugins and custom code changes safely in a place that does not impact production users
- remain able to save the content users are actively creating in the production environment
These challenges essentially translate into the following requirements:
- backup production content changes by date/version
- backup production core updates / plugin updates by date/version
- enable a deploy to a “point in time” for rollbacks of these stored changes
- be able to test from a clone of production elsewhere (staging)
Normally, this would simply mean performing a database backup, and putting the core/plugin files into Git.
Unfortunately, WordPress’s design creates a large, ever-growing wp-content directory that can easily exceed 1 GB with just normal theme files, plugins, and content uploads (not counting cached items, which would be gitignored). Creating a backup in Git with every build would put a lot of stress on Git, which does not usually handle very large repositories well. It would also burden the users who have to clone it (theme and plugin developers, automated internal staging deployments). Finally, distributing potentially sensitive user content could eventually become an issue.
My first attempt at solving these challenges leveraged Amazon S3’s bucket versioning, the AWS Elastic File System (EFS), a couple of Python scripts, and the AWS command-line sync utility in lieu of the normal Git workflow. I reversed the usual direction so that instead of pushing to production, we pull from it (except for point-in-time restores).
I developed a mechanism for managing the S3 versioning, which I will explain and share. Combined, these components give us a backup of the content folder in S3 and of the configs and core in Git, all marked to a specific point in time managed by Jenkins, with deployment tasks handled by Ansible. This also lets us stand up a test server, based on current production code and content, for evaluating changes before applying them to production.
The ideal backup system, for me (and part of what makes Git so wonderful), is one where you don’t need to back up every file every time you run the script. You really just want the deltas.
Managing a distributed WordPress cluster means any change on one node should be reflected across the cluster, so I put our wp-content folder on an Elastic File System share, which can be mounted across the multitude of servers handling it. Unlike S3, however, files on an EFS share cannot be versioned, so to fix this we set up a Jenkins job that runs at regular intervals and syncs the file differences to our versioned S3 bucket.
So the first thing we need is an S3 bucket with versioning enabled. You can enable versioning very easily by going into the S3 console, selecting your bucket, selecting the “Properties” tab, and choosing “Versioning.” Once versioning is enabled, your files will automatically be stored with a version ID that you can access. This is how we will keep track of our “point in time” file assets.
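To make concrete what versioning buys us, here is a toy, in-memory sketch (pure Python, not the real S3 API): every overwrite of a key produces a new version ID, and the old bytes remain retrievable by that ID, which is exactly the property the rest of this workflow relies on.

```python
import itertools
from collections import defaultdict

class ToyVersionedBucket:
    """A tiny in-memory stand-in for an S3 bucket with versioning enabled:
    every put() of the same key keeps the old bytes and returns a new
    version ID, mimicking (not implementing) S3's behavior."""

    _ids = itertools.count(1)  # fake version-ID generator

    def __init__(self):
        self._versions = defaultdict(dict)  # key -> {version_id: body}
        self._latest = {}                   # key -> latest version_id

    def put(self, key, body):
        version_id = "v%d" % next(self._ids)
        self._versions[key][version_id] = body
        self._latest[key] = version_id
        return version_id

    def get(self, key, version_id=None):
        # No version ID means "give me the latest", as with a plain S3 GET.
        vid = version_id or self._latest[key]
        return self._versions[key][vid]

bucket = ToyVersionedBucket()
v1 = bucket.put("wp-content/uploads/logo.png", b"old bytes")
v2 = bucket.put("wp-content/uploads/logo.png", b"new bytes")
assert bucket.get("wp-content/uploads/logo.png") == b"new bytes"      # latest wins
assert bucket.get("wp-content/uploads/logo.png", v1) == b"old bytes"  # old version survives
```

The real bucket does this transparently; the point is that an overwrite is never destructive once versioning is on.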
Being able to restore the whole system to a “point in time” requires us to back up the wp-content folder, the core system files, the configs, and the database. Therefore, I set up a Jenkins job with three sections:
The Content section of the Jenkins job backs up the wp-content folder with the AWS S3 sync tools. To do this, I created a temporary mount script that mounts the production EFS share on the staging machine, creates a backup to the versioned S3 bucket, and unmounts again. This is where EFS comes in handy: it lets us run the backup without impacting the production systems.
*Note* This script actually grew out of testing the AWS Data Pipeline temporary-job execution system, which others had recommended as a solution; the overhead of spinning up temporary resources simply took too long when we already had to have a staging machine running. EFS would also allow you to do this from a non-AWS location with a local mount (not officially supported, but apparently it works) if you were so inclined and had the bandwidth to spare.
The next part of the job runs the AWS CLI with the S3 sync arguments to send just the changed files from the EFS production wp-content share to the versioned backup S3 bucket. These files are automatically versioned when added to the bucket.
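For intuition, here is a minimal sketch of the delta check that `aws s3 sync` performs: by default it uploads a file when it is missing on the destination or when its size or last-modified time differs. This stub works on plain dictionaries rather than a real bucket, so the paths and timestamps are purely illustrative.

```python
def files_to_sync(local, remote):
    """Rough sketch of the delta check `aws s3 sync` performs: pick a file
    when it is missing remotely, or when its size or mtime differs.
    `local` and `remote` map relative path -> (size, mtime)."""
    changed = []
    for path, (size, mtime) in sorted(local.items()):
        if path not in remote or remote[path] != (size, mtime):
            changed.append(path)
    return changed

local = {"themes/site/style.css": (2048, 1700000100),
         "uploads/2017/01/logo.png": (9931, 1690000000)}
remote = {"uploads/2017/01/logo.png": (9931, 1690000000)}
print(files_to_sync(local, remote))  # ['themes/site/style.css']
```

This is why the regular sync job stays cheap: on a quiet day it transfers almost nothing.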
Unfortunately, today there is no built-in way to download everything from a versioned S3 bucket as it existed on a specific date; there is no “point in time” view. To get this functionality, I created a Python script that generates a manifest of the files currently in our versioned bucket, along with each file’s version ID.
The scripts, and documentation, are available here: https://github.com/vile8/S3-Version-Utilities
Once downloaded, you can run it with:
python s3_make_manifest.py -m <manifest csv file name> -b <s3 bucket name>
This will generate a manifest in CSV format from your versioned bucket (assuming your permissions are correct and the bucket has versioning enabled).
The created manifest file should be named after the Jenkins Build ID. This gives us the first of our required components: the ability to identify all the files for a given point in time of the deployed WordPress solution, for the wp-content section.
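The manifest itself is just CSV. Here is a minimal sketch of writing one; the two-column key,version layout and the version IDs below are my illustration of the idea, not necessarily the exact output format of s3_make_manifest.py.

```python
import csv
import io

def write_manifest(entries, fh):
    """Write (key, version_id) pairs as CSV rows. A version ID of "null"
    is what S3 reports for objects uploaded before versioning was enabled."""
    writer = csv.writer(fh)
    for key, version_id in entries:
        writer.writerow([key, version_id])

# Illustrative entries; real version IDs are long opaque strings from S3.
entries = [("wp-content/uploads/2017/01/logo.png", "3sL4kqtJlcpXroDTDmJ"),
           ("wp-content/themes/site/style.css", "null")]

buf = io.StringIO()
write_manifest(entries, buf)
print(buf.getvalue())
```

Naming the file after the Jenkins Build ID (e.g. manifest.JenkinsBID7.csv) is what ties the snapshot of version IDs back to a build.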
The Web section of the Jenkins job checks out the Git repository for the core files into the Jenkins local working directory (without the wp-content subdirectories). It then backs up the configs on the web servers by rsyncing from each server into the checked-out repo. This captures just the changes between the web servers’ files and the current Git repository, without Git credentials being stored remotely. The changes are added, committed, pushed, and tagged with the Jenkins Build ID.
The Jenkins Build ID tag can then give us the Git repo for core and config files at a point in time (the tag), while my Python utility covers the wp-content folder. Almost there.
The Database section of the Jenkins job creates two assets. First, it creates a snapshot of the production database (the last piece of the “point in time” we need to recreate the system entirely with that moment’s data). The snapshot is named with the Jenkins Build ID so that we have a correlated database dump for our “point in time.” RDS makes this easy to manage with the AWS command-line tools, for both creation and restoration of snapshots as necessary.
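Naming the snapshot is mostly string hygiene. RDS snapshot identifiers must contain only letters, digits, and hyphens, start with a letter, and avoid consecutive or trailing hyphens, so a small helper like this (the `wp-prod` prefix is my invention, not our actual naming scheme) keeps Build-ID-derived names valid:

```python
import re

def snapshot_id(build_id, prefix="wp-prod"):
    """Build an RDS-legal snapshot identifier from a Jenkins Build ID.
    RDS identifiers allow only letters, digits, and hyphens, must start
    with a letter, and may not contain consecutive or trailing hyphens."""
    raw = "%s-bid-%s" % (prefix, build_id)
    cleaned = re.sub(r"[^A-Za-z0-9-]+", "-", raw)   # replace illegal chars
    cleaned = re.sub(r"-{2,}", "-", cleaned)        # collapse hyphen runs
    return cleaned.strip("-")

print(snapshot_id(42))          # wp-prod-bid-42
print(snapshot_id("7_hotfix"))  # wp-prod-bid-7-hotfix
```

The resulting name is what you would pass to the AWS CLI when creating or restoring the snapshot.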
With just this first part, we can now restore the production server to a “point in time” by Jenkins Build ID: the Python restore utility for wp-content, Git for core and configs, and the AWS CLI for restoring the database snapshot.
But this doesn’t meet all our goals; we still want a staging server. So, the second asset we create is a SQL dump, which makes mass regex replacement of domain names very simple. This makes it easier to stand up a real staging site to test potentially harmful or dangerous packages or operations before performing them on the production site. (For this step, I created a dedicated SQL backup user with read-only privileges on the production environment.)
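The domain rewrite itself is one regex pass over the dump. A sketch (the domains are placeholders), with the usual WordPress caveat: URLs also live inside PHP-serialized option values, whose embedded string lengths must be fixed up separately (tools like wp-cli’s search-replace handle that for you).

```python
import re

def retarget_dump(sql, prod_domain, staging_domain):
    """Replace the production domain with the staging one across a SQL dump.
    NOTE: plain text replacement only. WordPress stores some URLs inside
    PHP-serialized strings whose byte lengths would also need updating."""
    return re.sub(re.escape(prod_domain), staging_domain, sql)

dump = "INSERT INTO wp_options VALUES ('siteurl','https://www.example.com');"
print(retarget_dump(dump, "www.example.com", "staging.example.com"))
```

Escaping the production domain (`re.escape`) matters because the dots in a hostname are regex metacharacters.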
Any user can back up the site simply by logging in to Jenkins and clicking the “Build Now” link on the OPS Production Web Backup job. Of course, I would recommend scheduling the Jenkins backup job to auto-run at least once per day. That way you can always restore to the last point at which the system automatically stored everything.
So what happens when you want to restore everything?
I mentioned before that the current AWS S3 sync tool doesn’t let you specify either a date or a manifest of files, which is why I built a second Python tool that consumes the file-version manifest and downloads the files from the versioned S3 bucket at the versions it lists.
The tool runs multiple threads, with a configurable chunk of files per thread, and puts the files in a configurable download root. (This also gives us the ability to quickly check out a set of versions for testing/staging.) So if you wanted to see what was in wp-content as of Jenkins Build ID 7 (say, the build from two months ago), you could just run:
python s3_get_versioned_files_by_manifest.py -b <aws_bucket_name> -r /opt/myfiles/ -m manifest.JenkinsBID7.csv
This will open the manifest, read in 100 lines, and parse them into <filepath><file>:<version> entries. Once the 100 entries are parsed, it spins up the first thread. Each thread opens a connection to S3 and GETs the files one at a time, by their <version>, into the download root.
*Remember* in the example above, “/opt/myfiles” was specified as the root directory to download the tree into, via the command-line option “-r”.
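The batching step can be sketched like this; the comma-separated row format and the chunking details are assumptions based on the description above, not the script’s exact code, and the real tool hands each yielded chunk to a download thread.

```python
def parse_manifest_lines(lines, chunk_size=100):
    """Parse 'key,version_id' rows and yield them in chunks of `chunk_size`,
    one chunk per download thread. (Row format is assumed from the
    description of the restore tool, not copied from it.)"""
    batch = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the manifest
        key, version_id = line.rsplit(",", 1)
        batch.append((key, version_id))
        if len(batch) == chunk_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly short, chunk

rows = ["wp-content/uploads/a.png,ver111",
        "wp-content/uploads/b.png,ver222"]
print(list(parse_manifest_lines(rows, chunk_size=100)))
```

Splitting on the last comma (`rsplit`) keeps the key intact even if it were ever to contain commas itself.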
That’s it. It’s really that simple. Also… if you just want to download a single file, create a simple CSV containing the AWS path and the version you want, and the tool will fetch it. Useful for pulling all kinds of subsets.
Again, the tools are available here: https://github.com/vile8/S3-Version-Utilities
With this, you can imagine it’s pretty simple to set up restore jobs using the bucket-version download script, the database assets (dump or snapshot, depending on whether you want to replicate or replace the database), and the core files checked into Git by tag.
Well, there you have it. When versioning is fundamental to your sanity (because you are a hardcore microservices shop, or because you love being able to see what changed and how it correlates with other changes), check out my S3 versioning scripts when you get a chance. With these simple tools, otherwise hard-to-manage large changes become relatively easy to keep track of, at least in a rudimentary way.
Of course, don’t forget you can also set up lifecycle rules to transition backups to Glacier, so super-cheap long-term backups are even possible; the AWS documentation covers setting up lifecycle rules for just that purpose.
One last note: this system could easily be adapted to Elastic Beanstalk to manage scalable deployments based on versioned subsets. That would be pretty cool for A/B environments (provided you can channel your traffic appropriately), and in theory branching should be possible too, though it would require a special tag structure when you move the files, plus some logic to merge them. Might be worth a look!
Thanks for reading, I hope it helps!