A Vagrant and the Puppet Master: Part 2

Having a development environment set up with a proper provisioning tool is
crucial to improving your workflow. Once you’ve got your virtual machine up
and running, you need some way of ensuring that it’s set up with the software
you need.

(If you’d like, you can go and clone the
companion repository and
play along as we go.)

For this, my tool of choice is Puppet. Puppet is a bit different from other
provisioning systems in that it’s declarative rather than imperative. What do I
mean by that?

Declarative vs Imperative

Let’s say you’re writing your own provisioning tool from scratch. Most likely,
you’re going to be installing packages such as nginx. With your own provisioning
tool, you might just run apt-get (or your package manager of choice) to
install it:

apt-get install nginx

But wait, you don’t want to run this if you’ve already got it set up, so you’re
going to need to check that it’s not already installed, and upgrade it instead
if so.

if ! which nginx > /dev/null; then
    apt-get install nginx
else
    apt-get install --only-upgrade nginx
fi

This is relatively easy for a basic package, but for more complicated tools,
you may have to work all of this logic out yourself.

This is an example of an imperative tool: you spell out the commands to run,
and the tool runs them for you. There is a problem though: to be thorough, you
also need to check the existing state yourself and verify that each step has
actually been done.

However, with a declarative tool like Puppet, you simply describe how you want
your system to look, and Puppet works out what to do and how to transition
between states. This means that you can avoid a lot of boilerplate and
checking, and leave those details to Puppet.

For the above example, we’d instead have something like the following:

package {'nginx':
    ensure => latest
}

This says to Puppet: make sure the nginx package is installed and up-to-date. It
knows how to handle any transitions between states rather than requiring you to
work this out. I personally prefer Puppet because it makes sense to me to
describe how your system should look rather than writing separate
installation/upgrading/etc routines.

(To WordPress plugin developers, this is also the same approach that WordPress
takes internally with database schema changes. It specifies what the database
should look like, and dbDelta() takes care of transitions.)

Getting It Working

So, now that we know what Puppet is going to give us, how do we get it set up?
Usually, you’d have to install Puppet on the machine yourself, but thankfully,
Vagrant makes it easy for us. Simply set your provisioning tool to Puppet and
point it at your main manifest file:

config.vm.provision :puppet do |puppet|
    puppet.manifests_path = "manifests"
    puppet.manifest_file  = "site.pp"
    puppet.module_path    = "modules"
    #puppet.options        = '--verbose --debug'
end

What exactly is a manifest? A manifest is a file that tells Puppet what you’d
like your system to look like. Puppet also has a feature called modules that add
functionality for your manifests to use, and I’ll touch on that in a bit, but
just trust this configuration for now.

I’m going to assume you’re using WordPress with nginx and PHP-FPM, but the
concepts here apply to any stack, so if you’re not, just follow along for now.

First off, we need to install the nginx and php5-fpm packages. The following
should be placed into manifests/site.pp:

package {'nginx':
    ensure => latest
}
package {'php5-fpm':
    ensure => latest
}

Each of these declarations is called a resource. Resources are the basic
building block of everything in Puppet, and they declare the state of a certain
object. In this case, we’ve declared that we want the state of the nginx and
php5-fpm packages to be ‘latest’ (that is, installed and up-to-date).

The part before the braces is called the “type”. There are a huge number of
built-in types in Puppet, and we’ll also add some of our own later. The first
part inside the braces is called the namevar and must be unique within the
type; that is, you can only have one package {'nginx': } in your entire
project. The parts after the colon are called the attributes of the resource.
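
To make that terminology concrete, here’s the nginx resource again with each
part labelled:

package {'nginx':        # 'package' is the type, 'nginx' is the namevar
    ensure => latest     # 'ensure => latest' is an attribute
}

Declaring a second package {'nginx': } anywhere else in the project would
cause a compile-time error.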

Next up, let’s set up your MySQL database. Setting up MySQL is a slightly more
complicated task, since it involves many steps (installing, setting
configuration, importing schemas, etc), so we’ll want to use a module instead.

Modules are reusable pieces for manifests. They’re more powerful than normal
manifests, as they can include custom Ruby code that interacts with Puppet, as
well as powerful templates. These can be complicated to create, but they’re
super simple to use.

Puppet Labs (the people behind Puppet itself) publish the canonical MySQL
module, which is what we’ll be working with here. We’ll want to clone this
into our modules directory, which we set previously in our Vagrantfile.

$ mkdir modules
$ cd modules
$ git clone git@github.com:puppetlabs/puppetlabs-mysql.git mysql

Now, to use the module, we can declare the classes it provides. I personally
don’t care about the MySQL client, so we’ll just install the server:

class { 'mysql::server':
    config_hash => { 'root_password' => 'password' }
}

(You’ll obviously want to change ‘password’ here to something slightly
more secure.)

MySQL isn’t much use to us without the PHP extensions, so we’ll go ahead and get
those as well.

class { 'mysql::php':
    require => Package['php5-fpm'],
}

Notice there’s a new parameter we’re using here, called require. This tells
Puppet that we’re going to need PHP installed first. Why do we need to do this?

Rearranging Puppets

Puppet is a big fan of being as efficient as possible, and it makes no
guarantees about the order in which resources are applied. For example, while
it’s still working on installing MySQL, it may happily go and start setting up
our nginx configuration.

To deal with this, Puppet has the concept of dependencies. If any step depends
on a previous one, you have to specify this dependency explicitly.[1] Puppet
splits its run into two parts: first it compiles the resources and works out
the dependency graph, then it applies the resources in an order that satisfies
the dependencies you’ve specified.

There are two ways of doing this in Puppet: you can specify require or
before on individual resources, or you can specify the dependencies all
at once.

# Individual style
class { 'mysql::php':
    require => Package['php5-fpm'],
}

# Waterfall style
Package['php5-fpm'] -> Class['mysql::php']

I personally find that the require style is nicer to maintain, since you can
see at a glance what each resource depends on. I avoid before for the same
reason, but these are stylistic choices and it’s entirely up to you as to which
you use.
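
For reference, the equivalent before form (which we won’t be using) declares
the same dependency from the other side:

package {'php5-fpm':
    ensure => latest,
    before => Class['mysql::php'],
}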

You may have noticed a small subtlety here: the dependencies use a capitalised
version of the original declaration, with the namevar in square brackets. For
example, if I declare package {'nginx': }, I refer to this later as
Package['nginx']. This looks strange when you’re starting out, but you’ll
quickly get used to it.

(We’ll get to namespaced resources soon such as mysql::db {'mydb': }, and the
same rule applies here to each part of the name, so this would become
Mysql::Db['mydb'].)

Important note: don’t declare your resources with capitals, as this actually
sets the default attributes for the type. Avoid this unless you’re sure you
know what you’re doing.
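
For reference, here’s what the capitalised form actually does; with no
namevar, it sets defaults for every resource of that type in the current
scope:

# Sets defaults for all package resources in scope
Package {
    ensure => latest,
}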

Setting Up Our Configuration

We’ve now got nginx, PHP, MySQL and the MySQL extensions installed, so we’re
ready to start configuring them to our liking. Now would be a great time to
try vagrant up and watch Puppet run for the first time!

Let’s now go and set up both our server directories and the nginx configuration
for them. We’ll use the file type for both of these.

file { '/var/www/vagrant.local':
    ensure => directory
}
file { '/etc/nginx/sites-available/vagrant.local':
    source => "file:///vagrant/vagrant.local.nginx.conf"
}
file { '/etc/nginx/sites-enabled/vagrant.local':
    ensure => link,
    target => '/etc/nginx/sites-available/vagrant.local'
}

And the nginx configuration for reference, which should be saved to
vagrant.local.nginx.conf next to your Vagrantfile:

server {
    listen 80;
    server_name vagrant.local;
    root /var/www/vagrant.local;

    location / {
        try_files $uri $uri/ /index.php$is_args$args;
    }

    location ~ \.php$ {
        try_files $uri =404;
        fastcgi_split_path_info ^(.+\.php)(/.+)$;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_pass unix:/var/run/php5-fpm.sock;
        fastcgi_index index.php;
        include /etc/nginx/fastcgi_params;
    }
}

(This is not the best way to do this in Puppet, but we’ll come back to that.)

Next up, let’s configure MySQL. There’s a mysql::db type provided by the MySQL
module we’re using, so we’ll use that. This works the same way as the file and
package types that we’ve already used, but obviously takes some different
parameters:

mysql::db {'wordpress':
    user     => 'root',
    password => 'password',
    host     => 'localhost',
    grant    => ['all'],
    require  => Class['mysql::server']
}

Let’s Talk About Types, Baby

You’ll notice that we’ve used two different syntaxes above for the MySQL parts:

class {'mysql::php': }
mysql::db {'wordpress': }

The differences here are in how these are defined in the module: mysql::php is
defined as a class, whereas mysql::db is a type. These reflect fundamental
differences in what you’re dealing with behind the resource. Things that you
have one of, like system-wide packages, are defined as classes. There’s only
one of these per system; you can only really install MySQL’s PHP bindings
once.[2]

On the other hand, types can be reused for many resources. You can have more
than one database, so this is set up as a reusable type. The same is true for
nginx sites, WordPress installations, and so on.

You’ll use both classes and types all the time, so understanding when each is
used is key.
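
As a quick sketch of the difference (the second database here is purely
hypothetical):

# Fine: a type can be declared any number of times with unique namevars
mysql::db {'wordpress':
    user     => 'root',
    password => 'password',
}
mysql::db {'wordpress_test':
    user     => 'root',
    password => 'password',
}

Declaring class { 'mysql::server': } twice, on the other hand, would fail at
compile time.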

Moving to Modules

nginx and MySQL are both set up with our settings now, but they’re not
arranged in a very reusable pattern yet. Our nginx configuration is completely
hardcoded for the site, which means we can’t duplicate it if we want to set up
another site (for example, a staging subdomain).

We’ve used the MySQL module already, but all of our resources are in our
manifests directory at the moment. The manifests directory is more for the
specific machine you’re working on, whereas the modules directory is where our
reusable components should live.

So how do we create a module? First up, we’ll need the right structure. Modules
are essentially self-contained reusable parts, so there’s a certain structure
we use:

  • modules/<name>/ – The module’s full directory
    • modules/<name>/manifests/ – Manifests for the module, basically the same
      as your normal manifests directory
    • modules/<name>/templates/ – Templates for the module, written in ERB
    • modules/<name>/lib/ – Ruby code to provide functionality for your
      manifests

(I’m going to use ‘myproject’ as the module’s name here, but replace that with
your own!)

First up, we’ll create our first module manifest. For this one, we’ll use the
special filename init.pp in the manifests directory. Before, we used the names
mysql::php and mysql::db, but the MySQL module also supplies a plain mysql
class. Puppet maps a::b to modules/a/manifests/b.pp, but a class called a maps
to modules/a/manifests/init.pp.
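
For our module, that mapping works out as:

modules/myproject/manifests/init.pp    # class myproject
modules/myproject/manifests/site.pp    # myproject::site (defined below)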

Here’s what our init.pp should look like:

class myproject {
    if ! defined(Package['nginx']) {
        package {'nginx':
            ensure => latest
        }
    }
    if ! defined(Package['php5-fpm']) {
        package {'php5-fpm':
            ensure => latest
        }
    }
}

(We’ve wrapped these in defined() calls. It’s important to note that while
Puppet is declarative, this is a compile-time check. If you’re making
redistributable modules, you’ll always want to use this, since you can’t
declare the same resource twice, and users should be able to declare these
themselves in their manifests.)

Next, we want to set up a reusable type for our site-specific resources. To do
this in a reusable way, we also need to take in some parameters. There’s one
special variable passed in automatically, the $title variable, which
represents the namevar. Try to avoid using it directly, but it makes a good
default for your other parameters.

Declaring the type looks the same as defining a function in most languages.
We’ll also update some of our definitions from before.[3]

define myproject::site (
    $name = $title,
    $location,
    $database = 'wordpress',
    $database_user = 'root',
    $database_password = 'password',
    $database_host = 'localhost'
) {
    file { $location:
        ensure => directory
    }
    file { "/etc/nginx/sites-available/$name":
        source => "file:///vagrant/vagrant.local.nginx.conf"
    }
    file { "/etc/nginx/sites-enabled/$name":
        ensure => link,
        target => "/etc/nginx/sites-available/$name"
    }

    mysql::db {$database:
        user     => $database_user,
        password => $database_password,
        host     => $database_host,
        grant    => ['all'],
    }
}

(This should live in modules/myproject/manifests/site.pp)

Now that we have the module set up, let’s go back to our manifest for Vagrant
(manifests/site.pp). We’re going to completely replace this now with
the following:

# Although this is declared in myproject, we can declare it here as well for
# clarity with dependencies
package {'php5-fpm':
    ensure => latest
}
class { 'mysql::php':
    require => [ Class['mysql::server'], Package['php5-fpm'] ],
}
class { 'mysql::server':
    config_hash => { 'root_password' => 'password' }
}

class {'myproject': }
myproject::site {'vagrant.local':
    location => '/var/www/vagrant.local',
    require  => [ Class['mysql::server'], Package['php5-fpm'], Class['mysql::php'] ]
}

Note that we still have the MySQL server setup in the Vagrant manifest, as we
might want to split the database off onto a separate server. It’s up to you to
decide how modular you want to be about this.

There’s one problem still in our site definition: we still have a hardcoded
source for our nginx configuration. Puppet offers a great solution to this in
the form of templates. Instead of pointing the file to a source, we can bring
in a template and substitute variables.

Puppet gives us the template() function to do just that, and it automatically
supplies all the variables in scope to be replaced. There’s a great guide and
tutorial that explain this further, but most of it is self-evident. The main
thing to note is that the template() function’s template location is in the
form <module>/<filename>, which maps to modules/<module>/templates/<filename>.

Our file resource should now look like this instead:

file { "/etc/nginx/sites-available/$name":
    content => template('myproject/site.nginx.conf.erb')
}

Now, we’ll create our template. Note the lack of hardcoded pieces.

server {
    listen 80;
    server_name <%= name %>;
    root <%= location %>;

    location / {
        try_files $uri $uri/ /index.php$is_args$args;
    }

    location ~ \.php$ {
        try_files $uri =404;
        fastcgi_split_path_info ^(.+\.php)(/.+)$;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_pass unix:/var/run/php5-fpm.sock;
        fastcgi_index index.php;
        include /etc/nginx/fastcgi_params;
    }
}

(This should be saved to modules/myproject/templates/site.nginx.conf.erb)

Our configuration will now be generated automatically, with the name and
location taken from the parameters of our type.
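
This is where the type starts to pay off: the staging subdomain mentioned
earlier is now a single extra resource (the names here are just examples):

myproject::site {'staging.vagrant.local':
    location => '/var/www/staging.vagrant.local',
    database => 'wordpress_staging',
}

Each site gets its own generated nginx configuration and database without
touching the template again.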

If you’d really like to go crazy with this, you can basically parameterise
everything you want to change. Here’s an example from one of mine:

server {
    listen <%= listen %>;
    server_name <% real_server_name.each do |s_n| -%><%= s_n %> <% end -%>;
    access_log <%= real_access_log %>;
    root <%= root %>;

<% if listen == '443' %>
    ssl on;
    ssl_certificate <%= real_ssl_certificate %>;
    ssl_certificate_key <%= real_ssl_certificate_key %>;

    ssl_session_timeout <%= ssl_session_timeout %>;

    ssl_protocols SSLv2 SSLv3 TLSv1;
    ssl_ciphers ALL:!ADH:!EXPORT56:RC4+RSA:+HIGH:+MEDIUM:+LOW:+SSLv2:+EXP;
    ssl_prefer_server_ciphers on;
<% end -%>

<% if front_controller %>
    location / {
        fastcgi_param SCRIPT_FILENAME $document_root/<%= front_controller %>;
<% else %>
    location / {
        try_files $uri $uri/ /index.php?$args;
        index <%= index %>;
    }

    location ~ \.php$ {
        try_files $uri =404;
        fastcgi_split_path_info ^(.+\.php)(/.+)$;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
<% end -%>
        fastcgi_pass <%= fastcgi_pass %>;
        fastcgi_index index.php;
        include /etc/nginx/fastcgi_params;
    }

    location ~ /.ht {
        deny all;
    }

<% if builds %>
    location /static/builds/ {
        internal;
        alias <%= root %>/data/builds/;
    }
<% end -%>

<% if include != '' %>
    <% include.each do |inc| %>
        include <%= inc %>;
    <% end -%>
<% end -%>
}

Notifications!

There’s one small problem with our nginx setup. At the moment, our sites won’t
be loaded by nginx until the next manual restart/reload. What we need instead
is a way to tell nginx to reload whenever the files are updated.

To do this, we’ll first define the nginx service in our init.pp manifest.

service { 'nginx':
    ensure     => running,
    enable     => true,
    hasrestart => true,
    restart    => '/etc/init.d/nginx reload',
    require    => Package['nginx']
}

Now, we’ll tell our site type to send a notification to the service when we
should reload. We use the notify metaparameter here, and we’ve already set the
service up above to recognise that as a “reload” command.

file { "/etc/nginx/sites-available/$name":
    content => template('myproject/site.nginx.conf.erb'),
    notify => Service['nginx']
}
file { "/etc/nginx/sites-enabled/$name":
    ensure => link,
    target => "/etc/nginx/sites-available/$name",
    notify => Service['nginx']
}

nginx will now be notified that it needs to reload both when we create or
update the config, and when we actually enable it.

(We need the notify on the config proper in case we update the configuration
in the future, since the symlink won’t change in that case. A notification
only fires when its own resource changes, even if that resource is just
the link.)

We should now have a full installation set up and ready to serve from your
Vagrant install. If you haven’t already, boot up your virtual machine:

$ vagrant up

If you change your Puppet manifests, you should reprovision:

$ vagrant provision

Machine vs Application Deployment

There can be a bit of a misunderstanding as to what should be in your Puppet
manifests, and I must admit that I was originally confused about this as well.

Puppet’s main job is to control machine deployment. This includes things like
installing software, setting up configuration, etc. There’s also the separate
issue of application deployment. Application deployment is all about deploying
new versions of your code.

The part where these two can get conflated is installing your application and
configuring it. For WordPress, you usually want to ensure that WordPress itself
is installed. This is something that is probably outside of your application,
since it’s fairly standard, and it only happens once. You should use Puppet here
for the database configuration, since it knows about the system-wide
configuration which is specific to the machine, not the application.

You probably also want to ensure that certain plugins and themes are enabled.
This is something that should not be handled in Puppet, since it’s part of
your application’s configuration. Instead, you should create a must-use plugin
that ensures these are set up correctly. This ensures that if your app is
updated and rolled out, you don’t have to use Puppet to reprovision your server.
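
As a rough sketch of that approach (the plugin path here is a placeholder for
your own), a must-use plugin dropped into wp-content/mu-plugins/ might look
something like this:

<?php
/**
 * Plugin Name: My Project Configuration
 */

function myproject_require_plugins() {
    // is_plugin_active() and activate_plugin() live in an admin include
    require_once ABSPATH . 'wp-admin/includes/plugin.php';

    // Path is an example; use your own plugin's basename here
    if (!is_plugin_active('my-plugin/my-plugin.php')) {
        activate_plugin('my-plugin/my-plugin.php');
    }
}
add_action('admin_init', 'myproject_require_plugins');

Since must-use plugins ship with your application’s code, this travels with
every deploy and never needs a reprovision.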

(If you do push this into your Puppet configuration, bear in mind that updating
your application will now involve both deploying the code and reprovisioning the
server.)

Wrapping Up

If you’d like, you can now go and clone the
companion repository and
try running it to test it out.

Hopefully by now you should have a good understanding both of Vagrant and
Puppet. It’s time to start applying these tools to your workflow and adjusting
them to how you want to use them. Keep in mind that rules are made to be broken,
so you don’t have to follow my advice to the letter. Experiment, and have fun!

  1. There are a few cases where this doesn’t apply, but you should be
    explicit anyway. For example, files will autodepend on their directory’s
    resource if it exists.
  2. Yes, I realise you can do per-user installation, but a) that’s an insane
    setup; and b) you’ll need to handle package management yourself this way.
  3. This previously used hardcoded database credentials. Thanks to James
    Collins for catching this!

Introducing WP API

As many of you are aware, I was accepted into Google’s Summer of Code program this year to work on a JSON REST API for WordPress. WordPress already has internal APIs for manipulating data via the admin-ajax.php handler in addition to the XML-RPC API. However, XML can be a huge pain to both safely create and parse, and the existing admin API is locked down to authenticated users and is also tailored to the admin interface. The goal of this project is to create a general data API that speaks the common language of the web and uses easily parsable data.

I’d now like to introduce the official repository and issue tracker. There’s also the SVN repository which is kept in sync.

For the next few months, my schedule will be busy implementing the API. Each week from now through the final submission has an individual plan, presented below.

May 27: Acceptance of Project, ensure up-to-speed on existing code
June 3: Work on design documents (response types/collections) and ensure agreement with mentors and interested parties (#264)
June 10: Complete core post type serialisation/deserialisation (basic reading/writing of raw data complete) (#265)
June 17: Work on collection pagination and metadata infrastructure (the full collection of posts can now be accessed and is correctly paginated, allowing for browsing via the API) (#266)
June 24: Creation of main collection views (main post archive, per date, search) (#267)
July 1: Further work on indexes and browsability (#268)
July 8: Create (independent) REST API unit tests for all endpoints covered so far (#269)
July 15: Creation of a Backbone.js example client for testing (#270)
July 22: Spare week to act as a buffer, since some tasks may take longer than expected
July 29: Midterm evaluation!
August 5: Creation/porting of existing generic post type API with page-specific data (#271)
August 12: Creation of attachment-related API (uploading and management) (#272)
August 19: Creation of revision API, and extending the post API to expose revisions (#273, #274)
August 26: Creation of term and taxonomy API (#275)
September 2: Finalisation of term and taxonomy API, and updating of test clients (#276)
September 9: Final testing with example clients (especially with various proxies and in live environments) and security review (#277)
September 15: Spare week for buffer
September 22: Final checking for bugs and preparation for final submission

At the end of each week throughout development, I’ll post a weekly update and tag a new release version, in a manner similar to the release process of MP6. The first release of the API will be posted shortly.

For those looking to keep track of development, I’ll be posting about the API here, which you can follow via the feed. A GSoC P2 is on its way and will be the official place to post comments and feedback (I’ll be crossposting back to here once that’s up). In the meantime, I’ll be posting on this blog and accepting comments here, which is a great way to ask questions and post feedback.

A Vagrant and the Puppet Master: Part 1

In my development workflow, my tools are the thing I deliberate over most. As anyone who follows me on Twitter can attest to, I’m a huge fan of Git and Sublime Text, and conversely I hate Subversion and PhpStorm. I genuinely believe that my tools can make or break how I work and I’m always looking to improve this, constantly searching out for new tools.

By far and away, the tool that has changed how I work the most in the past year is Vagrant with the Puppet configuration tool. For those who don’t know, Vagrant is a tool to create and manage virtual machines, while Puppet is a tool to configure and manage server configuration. The two work extremely well together as a tool for developing in a clean, reproducible environment. Plus, it also provides an easy way to replicate server configuration between your development and production servers.

So, how does it work? Let’s walk through how to set up a development server, plus using that configuration in production!

The first step to getting started is to work out what operating system you want to use. Personally, I’m a fan of Ubuntu Server (12.04 LTS, Precise Pangolin, to be specific), so I’ll be using that in examples, but you can use whatever you like. Some official boxes are available for Ubuntu, while others are available on VagrantBox.es (you can also build your own, but until you’re familiar with Vagrant, I’d recommend using a premade box).

To start off with, you’ll want to download your chosen box to avoid having to redownload it every time you recreate your box.

$ vagrant box add precise32 http://files.vagrantup.com/precise32.box

Next up, you want to create your Vagrant configuration and get ready to boot it up. This will create a Vagrantfile in your current directory, so set yourself up a new directory where all your Vagrant-related stuff will live.

$ mkdir example-site/
$ cd example-site/
$ vagrant init precise32

Now, let’s boot up your new virtual machine and make sure it works. The up command will create a virtual machine from your base box if you don’t already have one, boot it, then provision it for usage. We’ll come back to that part in a bit.

$ vagrant up

To get access to your new (running) virtual machine, you’ll want SSH access. If you’re on Linux/Mac, vagrant ssh will work perfectly, but it’s a little harder on Windows. Vagrant tries to detect if your system supports command-line SSH, but doesn’t detect Cygwin environments. For cross-platform parity, I set up an alias called vsh that points to vagrant ssh on my Mac, and the SSH command on Windows, which looks something like:

# Mac/Linux
$ alias vsh='vagrant ssh'

# Windows
$ alias vsh='ssh -p 2222 -i "~/.vagrant.d/insecure_private_key" vagrant@127.0.0.1'

We’re done testing our basic setup, so we can shut our VM down and destroy it, since we’ll want to boot from scratch next time.

$ vagrant destroy

We’ve now verified that the virtual machine works nicely, so let’s bust open your Vagrantfile and get tweaking it. Networking is the first thing we’ll need to get set up, so that we can access our server. Vagrant automatically forwards port 22 from the VM to port 2222 locally so that we can connect, but we also need port 80 for nginx, and we might need more later. We can either set up separate forwarded ports, or enable private networking (my preferred option). Uncomment the private networking line to enable it:

config.vm.network :private_network, ip: "192.168.33.10"

This IP address can be whatever you want, but you need to make sure it’s not covered by your existing network’s subnet. This is usually fine unless you have a custom subnet, in which case 10.x.x.x might be a better choice.

You’ll also want to set up a hostname for this. In your /etc/hosts file (on Windows, C:\Windows\System32\drivers\etc\hosts), point vagrant.local to this IP, along with any subdomains you may want. (Note: I’ve seen people use other names here like wp.dev. Keep in mind that these may end up being actual domain names some day with ICANN’s new TLD policy, whereas .local is reserved for exactly this use.)

192.168.33.10 vagrant.local

We’ve now got a working Vagrant setup. In part 2, we’ll take a look at setting up Puppet to get your software working automatically.

Editing Commits With Git

As developers who use GitHub, we use and love pull requests. But what happens when you only want part of a pull request?

Let’s say, for example, that we have a pull request with three commits. Unfortunately, the pull request’s author accidentally used spaces instead of tabs in the second commit, so we want to fix this up before we pull it in.

The main tool we’re going to use here is interactive rebasing. Rebasing is a way to rewrite your git repository’s history, so you should never use it on code that you’ve pushed up. However, since we’re bringing in new code, it’s perfectly fine to do this. The PR’s author may need to reset their repository, but their changes should be on separate branches to avoid this.

So, let’s get started. First, follow GitHub’s first two instructions to merge (click the little i next to the Merge Pull Request button):

git checkout -b otheruser-master master
git pull git@github.com:otheruser/myproject.git master

Now that we have our repository up-to-date with theirs, it’s time to rewrite history. We want to rebase (rewrite) the last three commits, so we specify the range as HEAD~3:

git rebase -i HEAD~3

This puts us into the editor to change the rebase history. It should look like this:

pick c6ffde3 Point out how obvious it is
pick 9686795 We're proud of our little file!
pick a712c2c Add another file.

There are a number of commands we can pick here. For us, we want to leave the first and last unchanged, so we’ll keep those as pick. However, we want to edit the second commit, so we’ll change pick to edit there. Saving the file then spits us back out to the terminal. It’ll also tell us that when we’re done, we should run git commit --amend and git rebase --continue.
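
After saving, the list should look like this:

pick c6ffde3 Point out how obvious it is
edit 9686795 We're proud of our little file!
pick a712c2c Add another file.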

Internally, what this does is replay all the commits up until the one with edit. After it replays that commit, it pauses and waits for us to continue it.

Now, we can go and edit the file. Make the changes you need here, then as it told us, it’s time to make amends:

git commit --amend

This merges the changes you just made with the previous commit (the one with the whitespace problem) into a new commit. Once we’ve done that, we need to continue the rebase, as all the commits after this one have to be rewritten to point to our new one in the history.
git rebase --continue

Our history rewriting is now done! It’s now time to merge it back into master, push up our changes and close the pull request. First we’ll need to switch back to master, then we can merge and push.

git checkout master
git merge otheruser-master
git push

Congratulations, you just (re)wrote history!


For those of you who want to try this, I’ve made a test repository for you to work on. Try it out and see if you can fix it yourself!

Thanks to Japh for asking the question that inspired this post!

Drinking From the Firehose

For those who follow me on Twitter, you’d know that I recently attended the WordPress Community Summit. (If you’re expecting a long blog post on the summit, you must be new to my blog. 😉 ) One of the suggestions that came up was to subscribe to the WP-Trac mailing list. This list gets a copy of every comment and change (except attachments) on Trac tickets, and is a great way to follow activity on WordPress, since you get to see every issue change that happens.

However, following this activity comes with a giant downside: there is a lot of activity on Trac, so it can be hard to keep track of all the things going on; for this reason, it’s commonly called drinking from the firehose.[1] CC-ing yourself on a ticket (or getting auto-CC’d when you comment) is a great way of getting informed of any changes to it, but if you’re drinking from the firehose, this becomes fairly useless: either you get two copies of each change, or you only get one and CC-ing does nothing.

Thankfully, with a little Thunderbird magic, I’ve come up with an optimal solution: CC’d emails come through to my main email address, while the mailing list is sent through to my firehose email account. I then set up a Thunderbird filter to move the message from my main email to the firehose account, and apply a “CC’d” label to it. Thunderbird merges the messages, since they have the same message ID, then applies the label to the merged message. I can check at a glance and see what’s important to me, while still retaining the ability to follow the project as a whole.

Screenshot showing the styling applied to the emails in my inbox
My inbox. Bold is unread, blue is CC’d. The arrow on the left indicates a thread.

For those of you who want to get involved more in WordPress, I’d definitely recommend this. Already I’ve noticed more stuff that interests me, and it’s fairly minimal effort to go through the messages.[2]

  1. I’m personally not sure where the phrase comes from. Wiktionary notes the usage of the phrase in 2004 in relation to technology, however I’m fairly certain the phrase itself comes from UHF.
  2. It might seem like a lot of messages, but most of them are triaging, or comments along the same thread. Once you’ve got the gist of the ticket, you can fairly safely delete the entire thread without needing to read the minutiae of the implementation.

Why WP_Error Sucks

Anyone who has seen me talk in the #wordpress-dev IRC room will know that I’m
not a huge fan of WP_Error. However, for some insane reason, some people are.
I figured it’s probably time to explain why WP_Error sucks, and what we can do
about it.

Conception of WP_Error

Back in the days of WordPress 2.0, errors were handled by returning false from
WordPress functions, or occasionally error strings. For 2.1, it was decided
to change this to returning an error object instead. This error object gave the
ability to indicate an error had occurred, but still include information with
the error that could be used programmatically, or as a user-friendly message.

Given the context of its conception, WP_Error was a great idea; it gave the
easy ability to pass data regarding errors around while still noting that it
was an error, rather than actual data.

State of the Error

Currently, most WordPress functions return a WP_Error object if something goes
wrong. Based on what I wrote just before, this might seem like an awesome idea.
However, imagine what happens if I have a helper function:

/**
 * Retrieve and decode JSON data from a URL
 */
function rmccue_my_http_helper($url) {
    $response = wp_remote_get($url);
    if (is_wp_error($response)) {
        return $response;
    }

    return json_decode($response['body'], true);
}

This might seem fine, but note that we have to handle WP_Error differently here.
Errors give no useful information to this function, so we could just return
false. However, this deprives the caller function of the ability to find out
about the error.

For an example of when this becomes unwieldy, let’s look at what happens when
the above function gets used:

/**
 * Get current message from API
 */
function rmccue_get_api_msg() {
    $data = rmccue_my_http_helper('http://api.example.com/');
    if (is_wp_error($data)) {
        return $data;
    }

    return $data['apidata']['messages']['latest'];
}

/**
 * Output result to header
 */
function rmccue_output_message() {
    $message = rmccue_get_api_msg();
    if (is_wp_error($message)) {
        echo $message->get_error_message();
    }
    else {
        echo $message;
    }
}

Note that we now have three places where we’re checking if we got an error
back, but only one place where that check is actually useful (i.e. when we
output it).

Even worse than this is when developers forget to check for errors (I’ll admit,
I’ve been guilty of this many times). Suddenly, they’re trying to use a
WP_Error object as an array or an integer, and PHP will fail, or worse, give
garbage output.
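
As a quick sketch of that failure mode (hypothetical code with no error
checking at all):

// If the request fails, $response is a WP_Error object, and the next line
// dies with "Cannot use object of type WP_Error as array".
$response = wp_remote_get('http://api.example.com/');
$data = json_decode($response['body'], true);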

I Take Exception to That!

As anyone who has worked with WordPress knows, WordPress supported PHP 4 for a
long time, even after many other projects had switched. The advantage of this
was supporting significantly more hosts, with most PHP 5-only features either
not being needed or being easy to reimplement.

One of the new features added to PHP in PHP 5 was exception handling. For those
who aren’t aware of it, exception handling is a way to indicate an error and
have it handled at an appropriate place without having to check values
constantly. This might sound familiar to you: isn’t this what WP_Error was
intended to solve?

The answer is yes, but not quite. WP_Error is essentially the poor man’s
exception. Unlike WP_Error, the basic idea of exceptions is that you only
worry about errors where they actually matter and lower-level functions can
forget about needing them. Exceptions continue up the call stack until they’ve
been caught, when they can then be handled as necessary.

This might seem a bit confusing, so here’s what our previous example would look
like if we used exceptions instead (assuming wp_remote_get() threw a
WP_Exception exception):

/**
 * Retrieve and decode JSON data from a URL
 */
function rmccue_my_http_helper($url) {
    $response = wp_remote_get($url);
    return json_decode($response['body'], true);
}

/**
 * Get current message from API
 */
function rmccue_get_api_msg() {
    $data = rmccue_my_http_helper('http://api.example.com/');
    return $data['apidata']['messages']['latest'];
}

/**
 * Output result to header
 */
function rmccue_output_message() {
    try {
        $message = rmccue_get_api_msg();
        echo $message;
    }
    catch (WP_Exception $exception) {
        echo $exception->getMessage();
    }
}

See the difference? Instead of having to check at every level for exceptions, we
can now just let the exception pass up to somewhere that matters.

How does this work? In this case: if wp_remote_get() throws an exception, this
is passed up to rmccue_my_http_helper(). There’s no try ... catch in this
function, so we continue up the callstack, through rmccue_get_api_msg() until
we hit the try ... catch in rmccue_output_message(). Here, we catch the
exception and handle it as appropriate.

Exceptions also provide valuable context for developers. Rather than having to
check all the places where the error could have occurred, every exception
includes a traceback; that is, the entire callstack up until when the exception
was thrown. This gives you an easy way to see where an error occurred and makes
debugging much easier.

How We Can Start Using Exceptions Now

Although WordPress doesn’t use exceptions internally, you can already start
using them. For example, Renku uses them internally to save on a lot of
code.

The basic concept of using exceptions in your code is simple: whenever you get
a WP_Error object, convert it to an exception. In our above example, this
would mean handling it in rmccue_my_http_helper() and
rmccue_output_message(), but we’d no longer have to handle it inbetween.
Here’s what the above would look like:

/**
 * Retrieve and decode JSON data from a URL
 */
function rmccue_my_http_helper($url) {
    $response = wp_remote_get($url);
    if (is_wp_error($response)) {
        throw new Exception($response->get_error_message(), $response->get_error_code());
    }
    return json_decode($response['body'], true);
}

Here, we convert the WP_Error to an exception as soon as possible, allowing us
to skip most of the extra handling in our code.

What About Core?

Unfortunately, exceptions don’t appear to be getting into core any time soon.
Some of the core developers are very against exceptions (for reasons I can’t
completely comprehend). One of the arguments made against using exceptions in
core was the possibility of confusing theme developers. I’d actually make the
counter-argument that WP_Error is more confusing to theme developers. Having
to check at every possible stage if a result is_wp_error() is much more
confusing and is not something that theme developers are necessarily going to
remember.

Another of the issues I’ve heard raised is that of the fatal nature of
exceptions. Any exceptions that haven’t been caught by the time they get to the
top-level are handled by a default exception handler, or failing that, cause a
fatal error. The solution for WordPress is easy: firstly, add a default
exception handler that uses wp_die(), much like the existing handling for
fatal errors; secondly, add a try ... catch inside
do_action()/apply_filters(). Most plugins run the majority of their code
inside actions/filters, so this would ensure that any exceptions would only
cause that specific callback to fail. This would keep WordPress running with
minimal interruptions to the existing workflow.
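
As a rough sketch of that first step (hypothetical code, not actual core):

// Hypothetical default handler: route any uncaught exception through
// wp_die(), much like the existing handling for fatal errors.
function wp_default_exception_handler($exception) {
    wp_die($exception->getMessage());
}
set_exception_handler('wp_default_exception_handler');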

The only issue that I can see is one of backwards compatibility. The best way to
deal with that would be to announce that exceptions will be used in two releases
time (for example), and to encourage developers to switch to it. WP_Error
could immediately be changed to extend WP_Exception (which would in turn
extend Exception). This would give the ability for proactive plugin developers
to switch easily. For example, our HTTP helper function:

/**
 * Retrieve and decode JSON data from a URL
 */
function rmccue_my_http_helper($url) {
    $response = wp_remote_get($url);
    if (is_wp_error($response)) {
        if (class_exists('WP_Exception') && $response instanceof WP_Exception) {
            throw $response;
        }
        else {
            throw new Exception($response->get_error_message(), $response->get_error_code());
        }
    }
    return json_decode($response['body'], true);
}

This would give complete forward and backward compatibility for these plugins
and enable a smooth transition to exceptions.


So, what are you waiting for? Get out there and use exceptions!

Optimising WP E-Commerce’s SQL

As part of my most recent project (which you’ll be hearing more about very soon), I’ve been working with WP e-Commerce and having a tonne of fun dealing with all the bits and pieces. In general, it has been quite handy, since it has meant I don’t have to deal with implementing all the payment handling and such. However, it does have its issues, including a fairly horrible API.

WPEC is also quite a bit inefficient, due in part to its customisability. However, it’s definitely nothing insurmountable with a bit of code and some clever tricks.

Note: I’ll be using code from 4.0-dev in examples, but it should all be the same for the latest stable version as well.

So, with all of that out of the way, let’s get started. The first step in optimising anything in WordPress is to turn WP_DEBUG on. We’ll also want to turn SAVEQUERIES on so that we can see exactly what is getting queried. The Debug Bar plugin will also help to view the results of these.
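
In wp-config.php, that’s just two constants:

define('WP_DEBUG', true);
define('SAVEQUERIES', true);

(Remember to turn SAVEQUERIES off again when you’re done; it keeps a copy of every query in memory on each page load.)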

To start off with, here’s the MySQL queries that were generated by WPEC for me on a non-WPEC page:

SELECT option_value FROM wpstore_options WHERE option_name = '_transient_timeout_wpsc_theme_path' LIMIT 1
SELECT option_value FROM wpstore_options WHERE option_name = '_transient_wpsc_theme_path' LIMIT 1
SELECT option_value FROM wpstore_options WHERE option_name = 'wpsc_replace_page_title' LIMIT 1
SELECT option_value FROM wpstore_options WHERE option_name = 'wpsc_hide_featured_products' LIMIT 1
SELECT option_value FROM wpstore_options WHERE option_name = 'base_zipcode' LIMIT 1
SELECT option_value FROM wpstore_options WHERE option_name = 'wpsc_ups_settings' LIMIT 1
SELECT post_name FROM `wpstore_posts` WHERE `post_content` LIKE '%[productspage]%'  AND `post_type` = 'page' LIMIT 1
SELECT post_name FROM `wpstore_posts` WHERE `post_content` LIKE '%[shoppingcart]%'  AND `post_type` = 'page' LIMIT 1
SELECT post_name FROM `wpstore_posts` WHERE `post_content` LIKE '%[transactionresults]%'  AND `post_type` = 'page' LIMIT 1
SELECT post_name FROM `wpstore_posts` WHERE `post_content` LIKE '%[userlog]%'  AND `post_type` = 'page' LIMIT 1
SELECT option_value FROM wpstore_options WHERE option_name = '_transient_timeout_wpsc_url_wpsc-default.css' LIMIT 1
SELECT option_value FROM wpstore_options WHERE option_name = '_transient_wpsc_url_wpsc-default.css' LIMIT 1
SELECT option_value FROM wpstore_options WHERE option_name = 'google_server_type' LIMIT 1
SELECT option_value FROM wpstore_options WHERE option_name = 'google_cur' LIMIT 1

That’s 14 queries for essentially nothing! Even worse are the four fulltext queries to find those shortcodes. Surely we can do better.

So, let’s start cutting pieces out. The first part that concerned me was the two google_ queries, as I’m not using Checkout. As it turns out, the Google Checkout plugin does all sorts of stuff even if it’s not loaded. This is not something we want. However, this is easy to fix. WPEC loads everything in the wpsc-merchants/ directory, but no other code relies on these merchants, so simply remove the ones you don’t need. We’re using Brent Shepherd’s PayPal Digital Goods payment gateway (which hopefully will make it into WPEC 4.0). This gateway uses the new 4.0 merchant gateway classes, so we don’t actually need anything in wpsc-merchants/. Before you remove all the files though, note that a blank directory will cause errors, so leave testmode.merchant.php to avoid this.

Right, we’re now down to 12 queries. Next job, cutting out the shipping information. Both base_zipcode and wpsc_ups_settings are being loaded, despite no shipping handlers being activated. As our store is purely virtual goods, we don’t need any of the shipping items, so we’ll do as before and remove them all. Be wary of the blank directory issue though, and leave at least one file in there (I chose flatrate.php).

OK, 10 queries! We’re making great progress. Next step is wpsc_replace_page_title and wpsc_hide_featured_products. Go into the presentation tab of your settings and resave, and this should save these to the database and set the autoload property, causing them to be loaded in the initial WordPress settings query. However, I noticed this was not happening on our server (I suspect that if they are set to off, they simply aren’t being saved), so I hardcoded them in the theme:

// pre_option_$x doesn't like false, so return 0 instead
add_filter('pre_option_wpsc_replace_page_title', '__return_zero');
add_filter('pre_option_wpsc_hide_featured_products', '__return_zero');

Of course, if you want to enable either of these, you should use '__return_true' here instead; although in that case, simply resaving the settings page should be enough.

By now, we should be down to the following 8 queries:

SELECT option_value FROM wpstore_options WHERE option_name = '_transient_timeout_wpsc_theme_path' LIMIT 1
SELECT option_value FROM wpstore_options WHERE option_name = '_transient_wpsc_theme_path' LIMIT 1
SELECT post_name FROM `wpstore_posts` WHERE `post_content` LIKE '%[productspage]%'  AND `post_type` = 'page' LIMIT 1
SELECT post_name FROM `wpstore_posts` WHERE `post_content` LIKE '%[shoppingcart]%'  AND `post_type` = 'page' LIMIT 1
SELECT post_name FROM `wpstore_posts` WHERE `post_content` LIKE '%[transactionresults]%'  AND `post_type` = 'page' LIMIT 1
SELECT post_name FROM `wpstore_posts` WHERE `post_content` LIKE '%[userlog]%'  AND `post_type` = 'page' LIMIT 1
SELECT option_value FROM wpstore_options WHERE option_name = '_transient_timeout_wpsc_url_wpsc-default.css' LIMIT 1
SELECT option_value FROM wpstore_options WHERE option_name = '_transient_wpsc_url_wpsc-default.css' LIMIT 1

So, first, let’s look at those transients. These transients work by caching where the WPEC theme files exist, to avoid having to check the stylesheet directory, then the template directory, then the default WPEC directory. There are two options to changing this: you can either head into your MySQL database and set the autoload value for these options to yes, or simply hardcode it. Personally, I know where these files are always going to live, so I went with hardcoding:

add_filter('pre_transient_wpsc_theme_path', 'rm_hardcode_wpsc_theme_path');
add_filter('pre_transient_wpsc_url_wpsc-default.css', 'rm_hardcode_wpsc_theme_url');

function rm_hardcode_wpsc_theme_path($value) {
	return WPSC_CORE_THEME_PATH;
}

function rm_hardcode_wpsc_theme_url($value) {
	return get_stylesheet_directory_uri() . '/wpsc-default.css';
}

We’ve now hardcoded most things and we’re down to four queries: the shortcode queries. Why does WPEC even need to look these up? Well, in order to create URLs for products, WPEC needs to know the base URL, which is set to the page where your productspage shortcode is set. There’s no easy way to get these, so it has to do a LIKE query across all of your pages. Doing this on each page load is a huge strain (there is a bug filed about this, so the developers are aware), especially given that we’re not going to be changing this often.

My favourite way to do this, as you may have noticed, is to hardcode it. Unfortunately, there are no filters on this, so you’ll need a custom patch to WPEC to add support for this. Essentially what the patch does is allow the page names to be set previously. I personally think that wp-config.php is the best place for these to live, but it’s your choice on where it is. Here’s what your code should look like:

global $wpsc_page_titles;
$wpsc_page_titles = array(
        'products' => 'store',
        'checkout' => 'checkout',
        'transaction_results' => 'transaction-results',
        'userlog' => 'your-account',
);

(The values should be set to the slug for each page respectively.)

Voilà, we’re down to zero queries from WPEC! This should minimise any extra stress on your MySQL server when it’s really not needed.

Sidenote: Some of these inefficiencies can be patched in WPEC, while others can’t be, due to the nature of hardcoding them. For those that can be patched, I’ll be attempting to work with the WPEC team to help them fix it. A quick site benefits everyone. 🙂

Edit: WordPress has __return_zero() built-in, thanks Rarst.