A Vagrant and the Puppet Master: Part 2

Having a development environment setup with a proper provisioning tool is
crucial to improving your workflow. Once you’ve got your virtual machine set
up and ready to
go, you need to have some way of ensuring that it’s set up with the software
you need.

(If you’d like, you can go and clone the
companion repository and
play along as we go.)

For this, my tool of choice is Puppet. Puppet is a bit different from other
provisioning systems in that it’s declarative rather than imperative. What do I
mean by that?

Declarative vs Imperative

Let’s say you’re writing your own provisioning tool from scratch. Most likely,
you’re going to be installing packages such as nginx. With your own provisioning
tool, you might just run apt-get (or your package manager of choice) to
install it:

apt-get install nginx

But wait, you don’t want to run this if you’ve already got it set up, so you’re
going to need to check that it’s not already installed, and upgrade it instead
if so.

if $( which nginx ) then
    apt-get install nginx
else
    apt-get update nginx
end

This is relatively easy for basic things like this, but for more complicated
tools, you may have to work this all out yourself.

This is an example of an imperative tool. You say what you want done, and the
tool goes and does it for you. There is a problem though: to be thorough, you
also need to check that it has actually been done.

However, with a declarative tool like Puppet, you simply say how you want your
system to look, and Puppet will work out what to do, and how to transition
between states. This means that you can avoid a lot of boilerplate and checking,
and instead Puppet can work it all out for you.

For the above example, we’d instead have something like the following:

package {'nginx':
    ensure => latest
}

This says to Puppet: make sure the nginx package is installed and up-to-date. It
knows how to handle any transitions between states rather than requiring you to
work this out. I personally prefer Puppet because it makes sense to me to
describe how your system should look rather than writing separate
installation/upgrading/etc routines.

(To WordPress plugin developers, this is also the same approach that WordPress
takes internally with database schema changes. It specifies what the database
should look like, and dbDelta() takes care of transitions.)

Getting It Working

So, now that we know what Puppet is going to give us, how do we get it set up?
Usually, you’d have to go and ensure that you install Puppet on your machine,
but thankfully, Vagrant makes it easy for us. Simply set your provisioning tool
to Puppet and point it at your main manifest file:

config.provision :puppet => {
    puppet.manifests_path = "manifests"
    puppet.manifest_file  = "site.pp"
    puppet.module_path    = "modules"
    #puppet.options        = '--verbose --debug'
}

What exactly is a manifest? A manifest is a file that tells Puppet what you’d
like your system to look like. Puppet also has a feature called modules that add
functionality for your manifests to use, and I’ll touch on that in a bit, but
just trust this configuration for now.

I’m going to assume you’re using WordPress with nginx and PHP-FPM. These
concepts are applicable to everyone, so if you’re not, just follow along
for now.

First off, we need to install the nginx and php5-fpm packages. The following
should be placed into manifests/site.pp:

package {'nginx':
    ensure => latest
}
package {'php5-fpm':
    ensure => latest
}

Each of these declarations is called a resource. Resources are the basic
building block of everything in Puppet, and they declare the state of a certain
object. In this case, we’ve declared that we want the state of the nginx and
php5-fpm packages to be ‘latest’ (that is, installed and up-to-date).

The part before the braces is called the “type”. There are a huge number of
built-in types in
Puppet and we’ll also add some of our own later. The first part inside the
braces is called the namevar and must be unique with the type; that is, you can
only have one package {'nginx': } in your entire project. The part after the
colon is called the attributes of the resource.

Next up, let’s set up your MySQL database. Setting up MySQL is a slightly more
complicated task, since it involves many steps (installing, setting
configuration, importing schemas, etc), so we’ll want to use a module instead.

Modules are reusable pieces for manifests. They’re more powerful than normal
manifests, as they can include custom Ruby code that interacts with Puppet, as
well as powerful templates. These can be complicated to create, but they’re
super simple to use.

Puppet Labs (the people behind Puppet itself) publish the canonical MySQL
module, which is what we’ll be
working with here. We’ll want to clone this into our modules directory, which we
set previously in our Vagrantfile.

$ mkdir modules
$ cd modules
$ git clone git@github.com:puppetlabs/puppetlabs-mysql.git mysql

Now, to use the module, we can go ahead and use the class. I personally don’t
care about the client, so we’ll just install the server:

class { 'mysql::server':
    config_hash => { 'root_password' => 'password' }
}

(You’ll obviously want to change ‘password’ here to something slightly
more secure.)

MySQL isn’t much use to us without the PHP extensions, so we’ll go ahead and get
those as well.

class { 'mysql::php':
    require => Package['php5-fpm'],
}

Notice there’s a new parameter we’re using here, called require. This tells
Puppet that we’re going to need PHP installed first. Why do we need to do this?

Rearranging Puppets

Puppet is a big fan of being as efficient as possible. For example, while we’re
working on installing MySQL, we can go and start setting up our
nginx configuration.

To solve this, Puppet has the concept of dependencies. If any step depends on a
previous one, you have to specify this dependency explicitly¹. Puppet
splits running into two parts: first, it does compilation of the resources to
work out your dependencies, then it executes the resources in the order
you’ve specified.

There are two ways of doing this in Puppet: you can specify require or
before on individual resources, or you can specify the dependencies all
at once.

# Individual style
class { 'mysql::php':
    require => Package['php5-fpm'],
}

# Waterfall style
Package['php5-fpm'] -> Class['mysql::php']

I personally find that the require style is nicer to maintain, since you can
see at a glance what each resource depends on. I avoid before for the same
reason, but these are stylistic choices and it’s entirely up to you as to which
you use.

You may have noticed a small subtlety here: the dependencies use a different
cased version of the original, with the namevar in square brackets. For example,
if I declare package {'nginx': }, I refer to this later as Package['nginx'].
This is a somewhat strange thing to get used to when starting out, but you’ll
quickly get used to it.

(We’ll get to namespaced resources soon such as mysql::db {'mydb': }, and the
same rule applies here to each part of the name, so this would become
Mysql::Db['mydb'].)

Important note: It’s important not to declare your resources with capitals,
as this actually sets the default attributes. Avoid this unless you’re sure you
know what you’re doing.

Setting Up Our Configuration

We’ve now got nginx, PHP, MySQL and the MySQL extensions installed, so we’re now
ready to start configuring it for our liking. Now would be a great time to try
vagrant up and watch Puppet run for the first time!

Let’s now go and set up both our server directories and the nginx configuration
for them. We’ll use the file type for both of these.

file { '/var/www/vagrant.local':
    ensure => directory
}
file { '/etc/nginx/sites-available/vagrant.local':
    source => "file:///vagrant/vagrant.local.nginx.conf"
}
file { '/etc/nginx/sites-enabled/vagrant.local':
    ensure => link,
    target => '/etc/nginx/sites-available/vagrant.local'
}

And the nginx configuration for reference, which should be saved to
vagrant.local.nginx.conf next to your Vagrantfile:

server {
    listen 80;
    server_name vagrant.local;
    root /var/www/vagrant.local;

    location / {
        try_files $uri $uri/ /index.php$is_args$args;
    }

    location ~ .php {
        try_files $uri =404;
        fastcgi_split_path_info ^(.+.php)(/.+)$;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_pass unix:/var/run/php5-fpm.sock;
        fastcgi_index index.php;
        include /etc/nginx/fastcgi_params;
    }
}

(This is not the best way to do this in Puppet, but we’ll come back to that.)

Next up, let’s configure MySQL. There’s a mysql::db type provided by the MySQL
module we’re using, so we’ll use that. This works the same way as the file and
package types that we’ve already used, but obviously takes some different
parameters:

mysql::db {'wordpress':
    user     => 'root',
    password => 'password',
    host     => 'localhost',
    grant    => ['all'],
    require  => Class['mysql::server']
}

Let’s Talk About Types, Baby

You’ll notice that we’ve used two different syntaxes above for the MySQL parts:

class {'mysql::php': }
mysql::db {'wordpress': }

The differences here are in how these are defined in the module: mysql::php is
defined as a class, whereas mysql::db is a type. These reflect fundamental
differences in what you’re dealing with behind the resource. Things that you
have one of, like system-wide packages, are defined as classes. There’s only one
of these per-system; you can only really install MySQL’s PHP bindings once.²

On the other hand, types can be reused for many resources. You can have more
than one database, so this is set up as a reusable type. The same is true for
nginx sites, WordPress installations, and so on.

You’ll use both classes and types all the time, so understanding when each is
used is key.

Moving to Modules

nginx and MySQL are both set up with our settings now, but it’s not really in a
very reusable pattern yet. Our nginx configuration is completely hardcoded for
the site, which means we can’t duplicate this if we want to set up another site
(for example, a staging subdomain).

We’ve used the MySQL module already, but all of our resources are in our
manifests directory at the moment. The manifests directory is more for the
specific machine you’re working on, whereas the modules directory is where our
reusable components should live.

So how do we create a module? First up, we’ll need the right structure. Modules
are essentially self-contained reusable parts, so there’s a certain structure
we use:

modules/<name>/ – The module’s full directory
- modules/<name>/manifests/ – Manifests for the module, basically the same
  as your normal manifests directory
- modules/<name>/templates/ – Templates for the module, written in Erb
- modules/<name>/lib/ – Ruby code to provide functionality for your
  manifests

(I’m going to use ‘myproject’ as the module’s name here, but replace that with
your own!)

First up, we’ll create our first module manifest. For this first one, we’ll use
the special filename init.pp in the manifests directory. Before, we used
the names mysql::php and mysql::db, but the MySQL module also supplies a
mysql type. Puppet maps a::b to modules/a/manifests/b.pp, but a class
called a maps to modules/a/manifests/init.pp.

Here’s what our init.pp should look like:

class myproject {
    if ! defined(Package['nginx']) {
        package {'nginx':
            ensure => latest
        }
    }
    if ! defined(Package['php5-fpm']) {
        package {'php5-fpm':
            ensure => latest
        }
    }
}

(We’ve wrapped these in defined() calls. It’s important to note that while
Puppet is declarative, this is a compile-time check. If you’re making
redistributable modules, you’ll always want to use this, as you can’t declare
types twice, and users should be able to redefine these in their manifests.)

Next, we want to set up a reusable type for our site-specific resources. To do
this in a reusable way, we also need to take in some parameters. There’s one
special variable passed in automatically, the $title variable, which
represents the namevar. Try to avoid using this directly, but you can use this
as a default for your other variables.

Declaring the type looks the same as defining a function in most languages.
We’ll also update some of our definitions from before.³

define myproject::site (
    $name = $title,
    $location,
    $database = 'wordpress',
    $database_user = 'root',
    $database_password = 'password',
    $database_host = 'localhost'
) {
    file { $location:
        ensure => directory
    }
    file { "/etc/nginx/sites-available/$name":
        source => "file:///vagrant/vagrant.local.nginx.conf"
    }
    file { "/etc/nginx/sites-enabled/$name":
        ensure => link,
        target => "/etc/nginx/sites-available/$name"
    }

    mysql::db {$database:
        user     => $database_user,
        password => $database_password,
        host     => $database_host,
        grant    => ['all'],
    }
}

(This should live in modules/myproject/manifests/site.pp)

Now that we have the module set up, let’s go back to our manifest for Vagrant
(manifests/site.pp). We’re going to completely replace this now with
the following:

# Although this is declared in myproject, we can declare it here as well for
# clarity with dependencies
package {'php5-fpm':
    ensure => latest
}
class { 'mysql::php':
    require => [ Class['mysql::server'], Package['php5-fpm'] ],
}
class { 'mysql::server':
    config_hash => { 'root_password' => 'password' }
}

class {'myproject': }
myproject::site {'vagrant.local':
    location => '/var/www/vagrant.local',
    require  => [ Class['mysql::server'], Package['php5-fpm'], Class['mysql::php'] ]
}

Note that we still have the MySQL server setup in the Vagrant manifest, as we
might want to split the database off onto a separate server. It’s up to you to
decide how modular you want to be about this.

There’s one problem still in our site definition: we still have a hardcoded
source for our nginx configuration. Puppet offers a great solution to this in
the form of templates. Instead of pointing the file to a source, we can bring
in a template and substitute variables.

Puppet gives us the template() function to do just that, and automatically
supplies all the variables in scope to be replaced. There’s a great
guide and
tutorial that explain this
further, but most of it is self-evident. The main thing to note is that
template() function’s template location is in the form <module>/<filename>,
which maps to modules/<module>/templates/<filename>.

Our file resource should now look like this instead:

file { "/etc/nginx/sites-available/$name":
    content => template('myproject/site.nginx.conf.erb')
}

Now, we’ll create our template. Note the lack of hardcoded pieces.

server {
    listen 80;
    server_name <%= name %>;
    root <%= location %>;

    location / {
        try_files $uri $uri/ /index.php$is_args$args;
    }

    location ~ .php {
        try_files $uri =404;
        fastcgi_split_path_info ^(.+.php)(/.+)$;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_pass unix:/var/run/php5-fpm.sock;
        fastcgi_index index.php;
        include /etc/nginx/fastcgi_params;
    }
}

(This should be saved to modules/myproject/templates/site.nginx.conf.erb)

Our configuration will now be automatically generated, and the name and location
will be imported from the parameters to the typedef.

If you’d really like to go crazy with this, you can basically parameterise
everything you want to change. Here’s an example from one of mine:

server {
    listen <%= listen %>;
    server_name <% real_server_name.each do |s_n| -%><%= s_n %> <% end -%>;
    access_log <%= real_access_log %>;
    root <%= root %>;

<% if listen == '443' %>
    ssl on;
    ssl_certificate <%= real_ssl_certificate %>;
    ssl_certificate_key <%= real_ssl_certificate_key %>;

    ssl_session_timeout <%= ssl_session_timeout %>;

    ssl_protocols SSLv2 SSLv3 TLSv1;
    ssl_ciphers ALL:!ADH:!EXPORT56:RC4+RSA:+HIGH:+MEDIUM:+LOW:+SSLv2:+EXP;
    ssl_prefer_server_ciphers on;
<% end -%>

<% if $front_controller %>
    location / {
        fastcgi_param SCRIPT_FILENAME $document_root/<%= front_controller %>;
<% else %>
    location / {
        try_files $uri $uri/ /index.php?$args;
        index <%= index %>;
    }

    location ~ .php$ {
        try_files $uri =404;
        fastcgi_split_path_info ^(.+.php)(/.+)$;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
<% end -%>
        fastcgi_pass <%= fastcgi_pass %>;
        fastcgi_index index.php;
        include /etc/nginx/fastcgi_params;
    }

    location ~ /.ht {
        deny all;
    }

<% if $builds %>
    location /static/builds/ {
        internal;
        alias <%= root %>/data/builds/;
    }
<% end -%>

<% if include  != '' %>
    <%include.each do |inc| %>
        include <%= inc %>;
    <% end -%>
<% end -%>
}

Notifications!

There’s one small problem with our nginx setup. At the moment, our sites won’t
be loaded in by nginx until the next manual restart/reload. Instead, what we
need is a way to tell nginx that we need to reload when the files are updated.

To do this, we’ll first define the nginx service in our init.pp manifest.

service { 'nginx':
    ensure     => running,
    enable     => true,
    hasrestart => true,
    restart    => '/etc/init.d/nginx reload',
    require    => Package['nginx']
}

Now, we’ll tell our site type to send a notification to the service when we
should reload. We use the notify metaparameter here, and we’ve already set the
service up above to recognise that as a “reload” command.

file { "/etc/nginx/sites-available/$name":
    content => template('myproject/site.nginx.conf.erb'),
    notify => Service['nginx']
}
file { "/etc/nginx/sites-enabled/$name":
    ensure => link,
    target => "/etc/nginx/sites-available/$name",
    notify => Service['nginx']
}

nginx will now be notified that it needs to reload when we both create/update
the config, as well as when we actually enable it.

(We need it on the config proper in case we update the configuration in the
future, since the symlink won’t change in that case. The notification relates
specifically to the resource, even if said resource is the link itself.)

We should now have a full installation set up and ready to serve from your
Vagrant install. If you haven’t already, boot up your virtual machine:

$ vagrant up

If you change your Puppet manifests, you should reprovision:

$ vagrant provision

Machine vs Application Deployment

There can be a bit of a misunderstanding as to what should be in your Puppet
manifests. This is something that can be a bit confusing, and I must admit that
I was originally confused as well.

Puppet’s main job is to control machine deployment. This includes things like
installing software, setting up configuration, etc. There’s also the separate
issue of application deployment. Application deployment is all about deploying
new versions of your code.

The part where these two can get conflated is installing your application and
configuring it. For WordPress, you usually want to ensure that WordPress itself
is installed. This is something that is probably outside of your application,
since it’s fairly standard, and it only happens once. You should use Puppet here
for the database configuration, since it knows about the system-wide
configuration which is specific to the machine, not the application.

You probably also want to ensure that certain plugins and themes are enabled.
This is something that should not be handled in Puppet, since it’s part of
your application’s configuration. Instead, you should create a must-use plugin
that ensures these are set up correctly. This ensures that if your app is
updated and rolled out, you don’t have to use Puppet to reprovision your server.

(If you do push this into your Puppet configuration, bear in mind that updating
your application will now involve both deploying the code and reprovisioning the
server.)

Wrapping Up

If you’d like, you can now go and clone the
companion repository and
try running it to test it out.

Hopefully by now you should have a good understanding both of Vagrant and
Puppet. It’s time to start applying these tools to your workflow and adjusting
them to how you want to use them. Keep in mind that rules are made to be broken,
so you don’t have to follow my advice to the letter. Experiment, and have fun!

There are a few
cases where this doesn’t apply, but you should be explicit anyway. For example,
files will autodepend on their directory’s resource if it exists. [↩]
Yes,
I realise you can do per-user installation, but a) that’s an insane setup; and
b) you’ll need to handle package management yourself this way. [↩]
This previously used
hardcoded database credentials. Thanks to James Collins for catching this! [↩]