How I Would Solve Plugin Dependencies

One of the longest-standing issues with the plugin system in WordPress is how to handle dependencies. Plugins and themes want to bring in libraries, other plugins, or parent themes, but right now, the solutions are somewhat terrible. I thought it was time to get my thoughts down on (virtual) paper.

What’s the problem?

Software is never built in isolation (“no man is an island”), so developers are naturally drawn to using external libraries. Extending an existing system is also extremely useful; we can see that from the plugin ecosystem in WordPress itself.

However, right now, there’s no good way to do either of these in a way that interoperates with other plugins and sites. There are various third-party solutions, but they often require code duplication or offer a substandard user experience.

The Jetpack Problem

This lack of proper dependencies is one of the key reasons behind the system of ever-growing codebases, and is exactly why Jetpack is a gigantic plugin rather than being split out. In an ecosystem with a proper dependency system, Jetpack would simply be the “core” of other plugins, being depended on for core functionality, and offering UI to tie it all together.

One of my personal key problems with Jetpack is that it duplicates the plugin functionality in WordPress (poorly, at times), and hence doesn’t work with standard tooling. Real dependencies would help to solve this. A future Jetpack with a plugin dependency system shouldn’t look any different to the current UI, but would use real plugins internally. This would ensure that the Jetpack core stays lightweight while still offering all the functionality.

Changing this to use a real dependency system would have benefits both for developers and users. The install process of Jetpack could be improved by allowing the core of the plugin to be downloaded first, letting the user set up and configure Jetpack while the rest of the plugin downloads in the background. Users and developers concerned about the size of the plugin could install only the parts they need, reducing file size and potential attack surface across the plugin.

User Experience

In the wider ecosystem, we can see other plugins running into the same issue. The largest plugins, including WooCommerce, EDD and Yoast SEO, have some form of an extension list to attempt to solve this, but invariably end up offering a poorer user experience, sending users off to other sites.

Without creating a full library to handle this for a plugin, invariably we end up with terrible UX. I’ve seen plugins do everything from pop up a message on install saying “search for X, and install it”, to straight up installing plugins and breaking a site completely. This run-time verification also breaks workflow for version-controlled sites, as plugin installation and upgrading is typically done independently of the site itself.

Products vs Services

On a more selfish note, plugins like the REST API would see increased adoption from plugin and theme developers if they could use a unified, simple system to require it. For developers who actually care about user experience, giving terrible messages to users or including a complex library just for dependencies isn’t something they want to handle.

This has partially stymied adoption of the API, as “product” developers (theme and plugin developers) don’t want to offer a substandard experience. Worse, it has skewed our development pattern towards “service” developers (agencies doing work for clients, and teams running SaaS platforms), who have the ability to run anything they like without running into these issues. This means that very real issues that we need to tackle in order to scale to the long tail may be deprioritised in favour of those affecting services.

How do we solve it?

This is one of those ideas that I’ve had floating around in my head for a while, basically fully-formed, but with no time to execute. I’m writing this as a guide to how I see the problem being solved, with the hopes that someone has the ability to execute this the way it should be done. Imagine this as a blueprint for a successful project, albeit not the final design.

(Note that whenever I say a plugin, I actually mean plugins or themes, as behaviour should be the same for both.)

Internal Workings, ft. Composer

Any PHP developer who has worked outside of WordPress recently will know Composer. Composer, for those who aren’t aware, is a command-line tool for managing dependencies in PHP. Composer is also not a good solution to the dependency problem for WordPress plugins: it requires CLI access and knowledge, it has a somewhat clunky interface and user experience (edit a JSON file, then generate a lock file and a vendor directory, then maybe commit one or more of those), and it also requires PHP 5.3+ (a non-starter for core integration, currently).

However, one of the key parts of Composer is the dependency solver, which is a port of the libzypp solver. This is a “SAT solver”: it takes note of what’s available and of what something requires, then it works out whether it can install the software (it solves the satisfiability problem). This solver is the key to working out the dependency chain for openSUSE packages (where libzypp is originally from), and the same system is used by Composer. This system would be a fantastic base for a plugin dependency system.

Developer User Experience (DUX)

The experience for developers needs to be a familiar one. Plugin headers are a great place to start, but they quickly become untenable in their current state, as they’re not built for complexity (check any theme with more than a few tags to see what I mean). It’s possible that with some tweaking they could be used, but this may be hard to achieve.

Ideally, we’d want the dependencies to be declarative, since this would help out a bunch of automated tooling. However, we can’t solve every problem at once. For bootstrapping this project off the ground, procedural code will work just fine.

I have a semi-working proof of concept that looks something like this:
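(As a hedged illustration only — the header format and the function name here are my assumptions, not the actual prototype code:)

```php
<?php
/*
 * Plugin Name: My Plugin
 * Depends: rest-api, some-library-plugin
 */

// The three lines below are the entire developer-facing check: bail
// out until the dependency system reports everything is active.
if ( ! my_plugin_dependencies_met( __FILE__ ) ) {
	return;
}

// Normal plugin code runs only once the dependencies are satisfied.
```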

The top three lines of code are all that’s required to check if your dependencies exist. We can automatically detect which plugin called the function, and parsing it out is relatively simple; we just then need to pass it to WP.org to see if we can get it working.

I’ve also written up some more complex usage patterns for the system for developers doing more advanced usage. (Note that the documents linked here relate to an early prototype I was working on, so not everything there matches this document; notably, allowing Composer dependencies isn’t something I’d suggest for right now.)

End-User Experience (EUX)

The end-user experience is key to gaining adoption. You need to offer an experience that users are familiar with, and that doesn’t require a bunch of manual steps. We are working on computers, after all, which are meant to automate the dumb tasks for us.

The EUX starts before the user even installs a plugin or theme. The information screen needs to show them what the plugin needs (the full dependency tree, not just direct dependencies), as well as any potential conflicts with existing plugins. Installing that plugin should then also ensure that the dependencies are installed, failing if any of the dependencies fails to install correctly. All of this needs to occur before the plugin is actually run, ensuring that the plugin doesn’t have to worry about double-checking everything before it can actually run any code. (This tends to overcomplicate a codebase with no gain.)

Once a plugin and its dependencies are installed, they then need to be maintained. Plugins should receive regular updates as usual, but the end user needs to at least be warned if an update will break compatibility with another. To accommodate urgent, breaking changes, users must be allowed to update plugins even if it would cause incompatibility, and the dependency system should ensure that the other plugins are disabled as needed. (If autoupdates for plugins are added to core, this would still be a manual process.) Trust the user to do the right thing, but ensure they cannot break their own system.

On the other end, uninstalling a plugin should correspondingly offer to remove anything it depends on if not being used by anything else. This again should always be the user’s choice, as depended-on plugins may have use apart from just being a dependency.

Distribution

Getting these plugin dependencies available is the hardest part of the equation. Developers need to be able to depend on (ha ha) the system being available to them, otherwise it’s not going to get adoption regardless of how great it is. This is true for any core feature (like a REST API), but especially so for something that needs to essentially be hidden from the user.

The end goal here is core integration. If the solution doesn’t end up in core at the end, the project has failed, as it’s not ubiquitous. If this happens, throw out what you need and try again, but it must be in core to be a viable solution for many users.

The best alternative, and the best way to bootstrap in the meantime, is to aim for integration into Jetpack. Jetpack is one of the most widely used plugins, giving you a huge userbase straight out of the gate. This solution would also be incredibly valuable to Jetpack in making it more modular, allowing it to shed some of the weight it currently has. Obviously, no one except the Jetpack team has a say over this, but it’s a good way to get your foot in the door. (Plus, it gets the Jetpack team potential extra lock-in benefits, as everyone would need to require Jetpack, albeit temporarily.)

There’s precedent in WordPress’ past for this too. Sidebar widgets were originally developed as a plugin by Automattic, then eventually integrated into WordPress core. Widgets used WordPress.com to bootstrap their development process, and in a modern WordPress, would likely piggy-back on Jetpack as well.

Potential Issues

One key potential issue I see is dependency versions. By allowing plugins to require certain versions, it’s possible to end up in situations where unrelated plugins cannot both be installed due to a mutual incompatibility with a library. This could be caused by a plugin requiring too specific a version (“only version 1.2.5, please!”) or an actual incompatibility between major branches. In order to balance these concerns, it may be wise to only allow requiring major versions, with the responsibility on plugin developers to stick to this system.

We also need to be careful to avoid situations like DLL Hell, where mutual incompatibilities between plugins cause installs and upgrades to be impossible without breaking something else. Encouraging plugins to maintain full compatibility is a top priority, which removing the ability to depend on specific versions may help with.

Distribution will be the biggest issue. It may be tempting to bundle with another large plugin (Yoast SEO, WooCommerce, etc), but you risk fragmentation by allowing bundling with more than one plugin, and no one’s going to want to be left without it if it’s that good. We can already see this problem with some of the libraries out there now, where mutually incompatible versions are used by different plugins.

Finally

I’m desperately hoping this post serves as inspiration for someone to create a proper solution to this. I don’t care if it gets solved the way I’ve thought of, there are plenty of other ways to skin this particular cat, and none of them is the “right” way.

(I started on a solution, but truly don’t have the time to dedicate to this. However, I’m willing to offer every piece of code I wrote for the prototype right now to kickstart this.)

What we need is something better than the current solutions. And not just better, but radically better.

Will you be the one to create it?

Beginnings

I announced a little while ago that I was making a change in my life. Over the past month-and-a-bit, I’ve been talking with many people and deciding where I want to spend the next stage of my career.

I’m delighted to announce that I’ve accepted a position working at Human Made. I’ve been hearing great things about Human Made for a long time, and, after talking to Tom, Joe and Noel, decided they’d be a fantastic fit.

In my day-to-day work at Human Made, I’ll be working on both client work as well as products, such as happytables. In fact, I’ve already begun shipping code, and had my first deploy last week (along with my first broken deploy, and my first scramble-to-fix-the-fatal-errors). I also shipped a cool little timezone widget that shows exactly what time of day it is for the humans that compose Human Made:

Timezone widget screenshot, showing avatars with their associated current time

I’m looking forward to seeing where this change takes me. If the first week is any indication, I definitely made the right choice.

See also: my post on the Human Made blog.


In other news, I can’t resist linking to a great piece of music by a famous French duo, that seems at least somewhat relevant:

Using Custom Authentication Tokens in WordPress

Much has been written about the ability in WordPress to replace the authentication handlers. Essentially, this involves replacing WordPress’ built-in system of username and password combinations with a custom handler, such as Facebook Connect or LDAP.

However, basically nothing appears to have been written on the other side of authentication: replacing WordPress’ cookie-based authentication tokens. The process of authentication in WordPress is simple and looks something like this:

  1. Check the client’s cookies – If we have valid cookies, skip to step 6
  2. Redirect the user to the login page
  3. Show the user a login form
  4. Check the submitted data against the database
  5. Issue cookies for the now-authenticated user
  6. Proceed to the admin panel

The existing authenticate hook allows users to swap out step 4 reasonably easily, and existing hooks allow replacing steps 2 and 3. The problem, however, is swapping out cookies in steps 1 and 5.
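For comparison, swapping out step 4 really is straightforward; a sketch using the real authenticate filter (the LDAP helper is an assumed placeholder):

```php
// Replace the username/password check (step 4), e.g. with LDAP.
// my_check_ldap_credentials() is a hypothetical helper, not a real API.
function my_ldap_authenticate( $user, $username, $password ) {
	if ( $user instanceof WP_User ) {
		return $user; // Already authenticated by an earlier handler.
	}

	$user_id = my_check_ldap_credentials( $username, $password );
	if ( $user_id ) {
		return new WP_User( $user_id );
	}

	return new WP_Error( 'invalid_credentials', 'Invalid credentials.' );
}
// Priority 30 runs after core's built-in handlers at 20.
add_filter( 'authenticate', 'my_ldap_authenticate', 30, 3 );
```

No equivalent hook exists for the cookie side, which is the problem described below.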

There are a few reasons you might want to swap out the existing cookie handling: you’re passing data over something that’s not HTTP (e.g. a CLI interface); you’re using a custom authentication method (e.g. OAuth); or, as with anything in WordPress plugins, some far-out idea that I can’t even fathom. Any of these requires swapping out cookies for your custom system; however, there’s no good way to do so.

The existing solution is to hook into something like plugins_loaded and check there; however, this occurs on every request, even when authentication isn’t actually needed. This makes it hard to issue error responses (such as HTTP 401/403 codes) without also denying access to non-authenticated requests.1

The correct way to do this really would be to use a late-check system the same way WordPress itself does. All WordPress core functions eventually filter down to get_currentuserinfo()2, which in turn calls wp_validate_auth_cookie(). It’s worth mentioning at this point that all of is_user_logged_in(), wp_get_current_user() and get_currentuserinfo() contain a total of zero hooks. We get our first respite in wp_validate_auth_cookie() with the auth_cookie_malformed action, however setting a user here is then overridden straight afterwards by wp_set_current_user( 0 ).

*sigh*

So, here’s the workaround solution. Hopefully this helps someone else out.
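(The embedded code is missing from this archive; the following is a hedged sketch of one workaround along these lines, not necessarily the original — my_check_custom_token() is an assumed placeholder:)

```php
// Sketch only: record our custom-authenticated user when core's cookie
// validation fails, then reinstate it after core overrides the current
// user with 0. my_check_custom_token() is a hypothetical helper.
function my_record_token_user( $cookie, $scheme ) {
	$GLOBALS['my_token_user_id'] = my_check_custom_token();
}
add_action( 'auth_cookie_malformed', 'my_record_token_user', 10, 2 );

function my_reinstate_token_user() {
	// Guard prevents recursion: after we set our user, the ID is non-zero.
	if ( ! empty( $GLOBALS['my_token_user_id'] ) && 0 === get_current_user_id() ) {
		wp_set_current_user( $GLOBALS['my_token_user_id'] );
	}
}
add_action( 'set_current_user', 'my_reinstate_token_user' );
```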

(This is also filed as ticket #26706.)

  1. This is less of an issue if you can detect whether a client is passing authentication, such as checking for existence of a header, but some naive clients send authentication headers with every request anyway. This happens to be the scenario I find myself in. []
  2. wp_get_current_user() e.g. calls it, is_user_logged_in() calls wp_get_current_user(), etc []

Change

It’s time for a change.

Fifteen years ago, I started school. Seven years ago, I finished primary school and started high school. Three years ago, I finished school. And two years ago, I started university.

For me, attending university was a natural course of action. The question was always one of what I would study at university, not whether I would. In my last years at school, I stressed over this question, as most graduating high-schoolers do.

My choices became clearer to me as I came closer to the end of my schooling. I decided that engineering was where my talents were.

In the meantime, I was approaching having spent nine years of my life writing software. Programming had always been something that I’d enjoyed, and I’d become relatively good at it. It was natural that I could take my talent and apply it to a career.

However, I didn’t.

I was afraid that doing something I loved as a career would cause me to eventually become sick of it, and that wasn’t something I wanted.

So I chose electrical engineering, in the hope that it’d be similar enough to what I’d done and enjoyed previously, but different enough to prove a challenge.


I’ve just completed the second year of my five-year degree. This year, I failed four subjects: three math subjects and an electrical engineering subject.

It’s not that I found the subjects particularly hard. They were challenging, certainly, but it wasn’t impossible to overcome that.

No, instead, it’s that I stopped caring. I stopped caring about my grades. I stopped caring about what I was learning. I just didn’t care.


For the most part, I’ve found my subjects to be quite similar to subjects at school. You put in enough effort, and you do well. The material you learn is sometimes interesting, but mostly you just learn it to learn it.

But I have noticed one important thing. The subjects that I care about the most, the subjects that I enjoy, and consequently, the subjects that I do best in are the computing systems subjects.

For me, the logical puzzles and strange syntax just click. When given problems to solve, it’s intuitive for me to look at them and immediately have the outline of a solution in my head. I can see the solution to problems before other people have worked out where to start.

I look at a problem and my immediate thought is to work out how to solve it. I love the challenge presented, and I love making things that solve it.

And yet, I continue studying the other subjects in the vain hope that I’ll learn to enjoy them just as much. Someday, I think to myself, it will start being enjoyable.


I’ve changed immensely in the past two years.

After leaving home, mainly for practical reasons, I’ve become a different person entirely. Although I love my parents immensely, I could never really become an adult until I’d moved out. I didn’t know this until after the fact, of course.

However, there was one significant part of me that didn’t change: my plan in life. Up until I left home, I’d stayed the course. I’d moved from school into university without a second thought, just because it never really occurred to me to do otherwise. I hadn’t really considered my choice, if you could even call it that.

But as I grew as a person, I realised that I needed to reevaluate. While continuing on the path had a familiarity to it, I couldn’t ignore the other possibilities staring me in the face.


I’ve always said to myself that I’d rather do something I loved and earn a pittance than do something I hated and be rich.

However, I don’t think I’d ever really thought about just what that means. It was a set of empty words to me, not something I truly lived my life by.

I think it’s time that I stopped repeating hollow phrases to myself and actually did something about it.


I’m dropping out of university to follow my passion.

It’s a decision that I should have made a long time ago, and I regret not making it earlier.

I still have concerns that I’ll end up hating what I do, but change is something I have to accept and deal with if it happens.

Maybe this is change for the worse, and I end up deciding this isn’t what I want to do. I’m okay with that now, because at least I will have tried.

But maybe, just maybe, this is the best decision I’ll make in a long time.


I’m now taking serious offers for full-time work. If you’re hiring a WordPress developer, or know someone who is, contact me at r@rotorised.com

You’re Using Transients Wrong

The Transients API is an incredibly useful API in WordPress, and unfortunately
one of the most misunderstood. I’ve seen almost everyone misunderstand what
transients are used for and how they work, so I’m here to dispel those myths.

For those not familiar with transients, here’s a quick rundown of how they work.
You use a very simple API in WordPress that acts basically as a key-value store,
with an expiration. After the expiration time, the entry will be invalidated and
the transient will be deleted from the transient store. Transients essentially
operate the same as options, but with an additional expiration field.
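The API described above, for reference (the key and duration are arbitrary examples):

```php
// Store a value for up to twelve hours.
set_transient( 'my_plugin_data', $data, 12 * HOUR_IN_SECONDS );

// Returns the stored value, or false if it's missing or expired.
$data = get_transient( 'my_plugin_data' );

// Remove it explicitly.
delete_transient( 'my_plugin_data' );
```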

By default in WordPress, transients are actually powered by the same backend as
options. Internally, when you set a transient (say foo), it gets transparently
translated to two options: one for the transient data (_transient_foo) and an
additional one for the expiration (_transient_timeout_foo). Once requested,
this will then be stored in the internal object cache and subsequent accesses in
the same request will reuse the value, in much the same way options are cached.
One of the most powerful parts of the transient API is that it uses the object
cache, allowing a full key-value store to be used in the backend. However the
default implementation, and how the object cache can change this, is where two
major incorrect assumptions come from.
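Concretely, with the default database backend, the mapping looks like this:

```php
// Setting a transient with the default (options-table) backend...
set_transient( 'foo', 'bar', HOUR_IN_SECONDS );

// ...is stored as two ordinary options under the hood:
//   _transient_foo         => 'bar'
//   _transient_timeout_foo => time() + HOUR_IN_SECONDS
```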

Object Caching and the Database

The first incorrect assumption that developers make is to assume the database
will always be the canonical store of transient data. One big issue here is
attempting to directly manipulate transient data via the option API; after all,
transients are just a special type of option, right?

In the real world however, anything past your basic site will use an object
cache backend. Popular choices here include APC (including the new APCu) and
Memcache, which both cache objects in memory, not the database. With these
backends, using the option API will return invalid or no data, as the data
is never stored in the database.1

I’ve seen this used in real world plugins to determine if a transient is about
to expire by directly reading _transient_timeout_foo. This will break and
cause the transient to always be counted as expired with a non-default cache.
Before you think about how to do this in a cross-backend compatible way: you
can’t. Some backends simply can’t do this, and until WordPress decides to
provide an API for this, you can’t predict internal behaviour of the backends.
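To make the breakage concrete (the regeneration step is a placeholder):

```php
// Fragile: assumes transients live in the options table. With an APC or
// Memcache object cache, this option never exists, so the transient
// looks permanently expired.
$timeout = get_option( '_transient_timeout_foo' );

// Portable: go through the Transients API and treat any miss as expired.
$value = get_transient( 'foo' );
if ( false === $value ) {
	// Regenerate and re-save the value here.
}
```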

Expiration

The second incorrect assumption that most developers make is that the expiration
date is when the transient will expire. In fact, the inline documentation even
states that the parameter specifies the “time until expiration in seconds”.
This assumption is correct for the built-in data store: WordPress only
invalidates transients when attempting to read them (which has led to
garbage collection problems in the past). However, this is not guaranteed
for other backends.

As I noted previously, transients use the object cache for non-default
implementations. The really important part to note here is that the object cache
is a cache, and absolutely not a data store. What this means is that the
expiration is a maximum age, not a minimum or set point.

One place this can happen easily is with Memcache set in Least Recently Used
(LRU) mode. In this mode, Memcache will automatically discard entries that
haven’t been accessed recently when it needs room for new entries. This means
less frequently accessed data (such as that used by cron tasks) can be discarded
before it expires.

What the transient API does guarantee is that the data will not exist past the
expiration time. If I set a transient to expire in 24 hours, and then attempt to
access it in 25 hours’ time, I know that it will have expired. On the other hand,
I could access it in 5 seconds in a different request and find that it has
already expired.

Real world issues are common with the misunderstanding of expiration times. For
WordPress 3.7, it was proposed to wipe all transients on upgrade for
performance reasons. Although this eventually was changed to just expired
transients, it revealed that many developers expect that data will exist until
the expiration. As a concrete example of this,
WooCommerce Subscriptions originally used transients for
payment-related locking. Eventually, Brent (the lead developer) found that these
locks were being silently dropped and users could in fact be double-billed in
some cases. This is not a theoretical issue, but a real-world instance of the
expiration age issue. The solution to this particular issue was to swap it out
for options, which are guaranteed to not be dropped.
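A sketch of the option-based locking approach (names are illustrative, not the actual WooCommerce Subscriptions code):

```php
// Locks must not vanish before release, so use an option (guaranteed
// storage), not a transient (a cache that may be evicted at any time).
// add_option() returns false if the key already exists, giving a
// simple mutual-exclusion check.
if ( ! add_option( 'my_payment_lock_' . $order_id, time(), '', 'no' ) ) {
	return; // Another process holds the lock.
}

// ...perform the payment work safely...

delete_option( 'my_payment_lock_' . $order_id );
```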

When Should I Use Transients?

“This all sounds pretty doom and gloom, Ryan, but surely transients have a valid
use?”, you say. Correct, astute reader, they’re a powerful tool in the right
circumstances and a much simpler API than others.

Transients are perfect for caching any sort of data that should be persisted
across requests. By default, WordPress’ built-in object cache uses global state
in the request to cache any data, making it useless for caching persistent data.
Transients fill the gap here, by using the object cache if available and falling
back to database storage if you have a non-persistent cache.
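The typical caching pattern this enables (the fetch helper is an assumed stand-in for any expensive operation):

```php
function my_get_remote_posts() {
	$posts = get_transient( 'my_remote_posts' );

	if ( false === $posts ) {
		// Cache miss (or eviction): regenerate and store again.
		$posts = my_fetch_remote_posts(); // assumed expensive remote call
		set_transient( 'my_remote_posts', $posts, 12 * HOUR_IN_SECONDS );
	}

	return $posts;
}
```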

One application of this persistence caching that fits perfectly is fragment
caching. Fragment caching applies full page caching techniques (like object
caching) to individual components of your page, such as a sidebar or a specific
post’s content. Mark Jaquith’s popular implementation previously
eschewed transients due to the lack of garbage collection combined with
key-based caching, however this is not
a concern with the upcoming WordPress 3.7.

Another useful application of transient storage is for caching long-running
tasks. Tasks like update checking involve remote server calls, which can be
costly both in terms of time and bandwidth, so caching these makes sense.
WordPress internally caches the result from update checking, ensuring that
excess calls to the internal update check procedures don’t cause excessive load
on the WordPress.org server. While the object caching API would work here, the
default implementation would never cache the result persistently.

Summary

Transients are awesome, but there are some important things to watch out for:

  • Transients are a type of cache, not data storage
  • Transients aren’t always stored in the database, nor as options
  • Transients have a maximum age, not a guaranteed expiration
  • Transients can disappear at any time, and you cannot predict when this will
    occur

Now, go out and start caching your transient data!

  1. The reason I say invalid or no data here is
    because it’s possible for a transient to be stored in the database before
    enabling an object cache, so that would be read directly. []

Requests for PHP: Version 1.6

It’s been a while since I released Requests 1.5 two years ago (!), and I’m
trying to get back on top of managing all my projects. The code in Requests has
been sitting there working perfectly for a long time, so it’s about time to
release a new version.

Announcing Requests 1.6! This release brings a chunk of changes,
including:

  • Multiple request support – Send multiple HTTP requests with both
    fsockopen and cURL, transparently falling back to synchronous when
    not supported. Simply call Requests::request_multiple(), and servers with
    cURL installed will automatically upgrade to parallel requests.

  • Proxy support – HTTP proxies are now natively supported via a
    high-level API. Major props to Ozh for his fantastic work
    on this.

  • Verify host name for SSL requests – Requests is now the first and
    only PHP standalone HTTP library to fully verify SSL hostnames even with
    socket connections. This includes both SNI support and common name checking.

  • Cookie and session support – Adds built-in support for cookies
(built entirely as a high-level API). To complement cookies,
    sessions can be created with a base URL and default
    options, plus a shared cookie jar.

  • Opt-in exceptions on errors: You can now call $response->throw_for_status()
    and a Requests_Exception_HTTP exception will be thrown for non-2xx
    status codes.

  • PUT, DELETE, and PATCH requests are now all fully supported
    and fully tested.

  • Add Composer support – You can now install Requests via the
    rmccue/requests package on Composer
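A quick taste of the new APIs described above (URLs are examples):

```php
require_once 'library/Requests.php';
Requests::register_autoloader();

// Multiple requests, parallelised automatically when cURL is available.
$responses = Requests::request_multiple( array(
	array( 'url' => 'http://example.com/' ),
	array( 'url' => 'http://example.org/' ),
) );

// Opt-in exceptions for non-2xx responses.
$response = Requests::get( 'http://example.com/' );
$response->throw_for_status();

// Sessions: a base URL, default options, and a shared cookie jar.
$session  = new Requests_Session( 'http://example.com/api/' );
$profile  = $session->get( 'user/1' );
```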

So, how do you upgrade? If you’re using Composer, you can bump your minimum
version to 1.6 and then update. (Note that you should remove minimum-stability
if you previously set it for Requests.) Otherwise, you can drop the new version
in over the top and it will work out of the box. (Version 1.6 is completely
backwards compatible with 1.5.)

What about installing for the first time? Just add this to your composer.json:

{
    "require": {
        "rmccue/requests": ">=1.6"
    }
}

Alternatively, you can now install via PEAR:

$ pear channel-discover pear.ryanmccue.info
$ pear install rmccue/Requests

Alternatively, head along to the
release page and
download a zip or tarball directly.

Along with 1.6, I’ve also created a
fancy new site, now powered by Jekyll. This
is hopefully a nicer place to read the documentation than on GitHub itself, and
should be especially handy to new users.

This release includes a lot of new changes, as is expected for such a long
release cycle (although hopefully a little shorter next time). One of the big
ones here is the significantly improved SSL support, which should guarantee
completely secure connections on all platforms. This involved a lot of learning
about how the internals of SSL certificates work, along with working with
OpenSSL. Getting them working in a compatible manner was also not particularly
easy; I spent about an hour tracking back through PHP’s source code to ensure
that stream_socket_client had the same level of availability as fsockopen
(it does) all the way back to PHP 5.2.0 (it did).

In all, 19 fantastic third-party contributors helped out with this release, and
I’d like to acknowledge those people here:

Feedback on this release would be much appreciated, as always. I look forward
to hearing from you and working with you to improve Requests even further!

The Next Stage of WP API

As you may have seen, my Summer of Code project is now over with the release of version 0.6. It’s been a fun time developing it, and an exceptionally stressful time as I tried to balance it with uni work, but worth it nonetheless. Fear not however, as I plan to continue working on the project moving forward. A team has been assembled and we’re about to start work on the API, now in the form of a Feature as a Plugin. To power the team discussions, we’ve also been given access to Automattic’s new o2 project (which I believe is the first public installation).

Throughout the project, I’ve been trying to break new ground in terms of development processes. Although MP6 was the first Feature as a Plugin (and the inspiration for the API’s development style), the API is the first major piece of functionality developed this way and both MP6 and the API will shape how we consider and develop Features as Plugins in the future. However, while MP6 has been developed using the FP model, the development process itself has been less than open, with a more dictatorial style of project management. This works for a design project where a tight level of control needs to be kept, but is less than ideal for larger development projects.

I’ve been critical, both publicly and privately, of some of WordPress’ development processes in the past; in particular, the original form of team-based development was, in my opinion, completely broken. Joining an existing team was near impossible, starting new discussions within a team was hard, and meetings were inevitably tailored to the team lead’s timezone. The Make blogs also fill an important role as a place for teams to organise, but are more focused towards summarising team efforts and planning future efforts than hosting the discussion itself.

At the other end of the spectrum is Trac, which is mainly for discussing specifics. For larger, more conceptual discussions, developers are told to avoid Trac and use a more appropriate medium. This usually comes in the form of “there’s a P2 post coming soon, comment there”, which is not a great way to hold discussion; it means waiting for a core developer to start the discussion, and your topic might not be at the top of their mind. In addition, Make blogs aren’t really for facilitating discussion, but are more of an announcement blog with incidental discussion.

Since the first iteration of teams, we’ve gotten better at organisation, but I think there’s more we can do.

This is where our team o2 installation comes in. I’ve been very careful to not refer to it as a blog, because it’s not intended as such. Instead, the aim is to bring discussions back from live IRC chats to a semi-live discussion area. Think of it as a middle ground between live dialogue on IRC and weekly updates on a make blog. The idea is that we’ll make frequent posts for both planning and specifics of the project, and hold discussion there rather than on IRC. It’s intended to be a relatively fast-moving site, unlike the existing Make blogs. In addition, o2 should be able to streamline the discussion thanks to the live comment reloading and fluid interface.

Understandably for an experiment like this, there are many questions about how it will work. Some of the questions that have been asked are:

  • Why is this necessary? As I mentioned above, I believe this fits a middle ground between live discussion and weekly updates. The hope is for this to make it easier for everyone to participate.
  • Why isn’t this a Make blog? The Make blogs are great for longer news about projects, but not really for the discussion itself. They’re relatively low traffic blogs for long term planning and discussion rather than places where specifics can be discussed.
  • Why is it hosted on WordPress.com rather than Make.WordPress.org? Two main reasons: I wanted to try o2 for this form of discussion; and there’s a certain level of bureaucracy to deal with for Make, whereas setting up a new blog on WP.com was basically instant. The plan is to migrate this to Make if the experiment works, of course.
  • If you want to increase participation, why is discussion closed to the team only? Having closed discussion is a temporary measure while the team is getting up to speed and we work out exactly how this experiment will work. Comments will be opened to all after this initial period.

Fingers crossed, this works. We’re off to somewhat of a slow start at the moment, which is to be expected when starting up a large team from scratch on what is essentially an existing project. There’s a lot of work to do here, and we’ve got to keep cracking at the project to keep the momentum going. With any luck, we can start building up steam and forge a new form of organisation for the projects.

A Vagrant and the Puppet Master: Part 2

Having a development environment set up with a proper provisioning tool is
crucial to improving your workflow. Once you’ve got your virtual machine set up
and ready to go, you need some way of ensuring that it’s set up with the
software you need.

(If you’d like, you can go and clone the companion repository and play along as
we go.)

For this, my tool of choice is Puppet. Puppet is a bit different from other
provisioning systems in that it’s declarative rather than imperative. What do I
mean by that?

Declarative vs Imperative

Let’s say you’re writing your own provisioning tool from scratch. Most likely,
you’re going to be installing packages such as nginx. With your own provisioning
tool, you might just run apt-get (or your package manager of choice) to
install it:

apt-get install nginx

But wait, you don’t want to run this if you’ve already got it set up, so you’re
going to need to check that it’s not already installed, and upgrade it instead
if so.

if ! which nginx > /dev/null; then
    apt-get install nginx
else
    apt-get install --only-upgrade nginx
fi

This is relatively easy for basic things like this, but for more complicated
tools, you may have to work this all out yourself.

This is an example of an imperative tool: you spell out each step, and the tool
runs the commands you give it. There is a problem though: to be thorough, you
also need to check whether each step is actually necessary, and whether it has
actually worked.

However, with a declarative tool like Puppet, you simply say how you want your
system to look, and Puppet will work out what to do, and how to transition
between states. This means that you can avoid a lot of boilerplate and checking,
and instead Puppet can work it all out for you.

For the above example, we’d instead have something like the following:

package {'nginx':
    ensure => latest
}

This says to Puppet: make sure the nginx package is installed and up-to-date. It
knows how to handle any transitions between states rather than requiring you to
work this out. I personally prefer Puppet because it makes sense to me to
describe how your system should look rather than writing separate
installation/upgrading/etc routines.

(To WordPress plugin developers, this is also the same approach that WordPress
takes internally with database schema changes. It specifies what the database
should look like, and dbDelta() takes care of transitions.)

Getting It Working

So, now that we know what Puppet is going to give us, how do we get it set up?
Usually, you’d have to go and ensure that you install Puppet on your machine,
but thankfully, Vagrant makes it easy for us. Simply set your provisioning tool
to Puppet and point it at your main manifest file:

config.vm.provision :puppet do |puppet|
    puppet.manifests_path = "manifests"
    puppet.manifest_file  = "site.pp"
    puppet.module_path    = "modules"
    # puppet.options      = "--verbose --debug"
end

What exactly is a manifest? A manifest is a file that tells Puppet what you’d
like your system to look like. Puppet also has a feature called modules that add
functionality for your manifests to use, and I’ll touch on that in a bit, but
just trust this configuration for now.

I’m going to assume you’re using WordPress with nginx and PHP-FPM. These
concepts are applicable to everyone, so if you’re not, just follow along
for now.

First off, we need to install the nginx and php5-fpm packages. The following
should be placed into manifests/site.pp:

package {'nginx':
    ensure => latest
}
package {'php5-fpm':
    ensure => latest
}

Each of these declarations is called a resource. Resources are the basic
building block of everything in Puppet, and they declare the state of a certain
object. In this case, we’ve declared that we want the state of the nginx and
php5-fpm packages to be ‘latest’ (that is, installed and up-to-date).

The part before the braces is called the “type”. There are a huge number of
built-in types in Puppet, and we’ll also add some of our own later. The first
part inside the braces is called the namevar and must be unique within the
type; that is, you can only have one package {'nginx': } in your entire
project. The parts after the colon are called the attributes of the resource.
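As a quick illustration, here’s how those parts line up on a hypothetical resource (not part of our setup):

```puppet
# "file" is the type; '/tmp/motd' is the namevar;
# everything after the colon is an attribute.
file { '/tmp/motd':
    ensure  => file,
    content => "Managed by Puppet\n",
}
```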

Next up, let’s set up your MySQL database. Setting up MySQL is a slightly more
complicated task, since it involves many steps (installing, setting
configuration, importing schemas, etc), so we’ll want to use a module instead.

Modules are reusable pieces for manifests. They’re more powerful than normal
manifests, as they can include custom Ruby code that interacts with Puppet, as
well as powerful templates. These can be complicated to create, but they’re
super simple to use.

Puppet Labs (the people behind Puppet itself) publish the canonical MySQL
module, which is what we’ll be working with here. We’ll want to clone this into
our modules directory, which we set previously in our Vagrantfile.

$ mkdir modules
$ cd modules
$ git clone https://github.com/puppetlabs/puppetlabs-mysql.git mysql

Now, to make use of the module, we can go ahead and declare its classes. I
personally don’t care about the client, so we’ll just install the server:

class { 'mysql::server':
    config_hash => { 'root_password' => 'password' }
}

(You’ll obviously want to change ‘password’ here to something slightly
more secure.)

MySQL isn’t much use to us without the PHP extensions, so we’ll go ahead and get
those as well.

class { 'mysql::php':
    require => Package['php5-fpm'],
}

Notice there’s a new parameter we’re using here, called require. This tells
Puppet that we’re going to need PHP installed first. Why do we need to do this?

Rearranging Puppets

Puppet is a big fan of being as efficient as possible, and makes no guarantees
about the order in which it applies unrelated resources. For example, while
it’s busy installing MySQL, it might also start setting up our
nginx configuration.

To keep the steps that do matter in order, Puppet has the concept of
dependencies. If any step depends on a previous one, you have to specify this
dependency explicitly1. Puppet splits running into two parts: first, it
compiles the resources and works out your dependencies, then it executes the
resources in the order you’ve specified.

There are two ways of doing this in Puppet: you can specify require or
before on individual resources, or you can specify the dependencies all
at once.

# Individual style
class { 'mysql::php':
    require => Package['php5-fpm'],
}

# Waterfall style
Package['php5-fpm'] -> Class['mysql::php']

I personally find that the require style is nicer to maintain, since you can
see at a glance what each resource depends on. I avoid before for the same
reason, but these are stylistic choices and it’s entirely up to you as to which
you use.

You may have noticed a small subtlety here: the dependencies use a capitalised
version of the original, with the namevar in square brackets. For example, if I
declare package {'nginx': }, I refer to it later as Package['nginx']. This
looks a little strange when starting out, but you’ll quickly get used to it.

(We’ll get to namespaced resources soon such as mysql::db {'mydb': }, and the
same rule applies here to each part of the name, so this would become
Mysql::Db['mydb'].)

Important note: don’t declare your resources with capitals, as this actually
sets default attributes for the type rather than declaring a resource. Avoid
this unless you’re sure you know what you’re doing.
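To make the distinction concrete, here’s a hypothetical side-by-side (the second form is the one to be careful with):

```puppet
# Capitalised name with square brackets: a *reference* to a
# resource declared elsewhere. Safe and common.
service { 'nginx':
    require => Package['nginx'],
}

# Capitalised name with braces and no title: sets *defaults* for
# every package in the current scope. Rarely what you meant.
Package {
    ensure => latest,
}
```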

Setting Up Our Configuration

We’ve now got nginx, PHP, MySQL and the MySQL extensions installed, so we’re
ready to start configuring things to our liking. Now would be a great time to
try vagrant up and watch Puppet run for the first time!

Let’s now go and set up both our server directories and the nginx configuration
for them. We’ll use the file type for both of these.

file { '/var/www/vagrant.local':
    ensure => directory
}
file { '/etc/nginx/sites-available/vagrant.local':
    source => "file:///vagrant/vagrant.local.nginx.conf"
}
file { '/etc/nginx/sites-enabled/vagrant.local':
    ensure => link,
    target => '/etc/nginx/sites-available/vagrant.local'
}

And the nginx configuration for reference, which should be saved to
vagrant.local.nginx.conf next to your Vagrantfile:

server {
    listen 80;
    server_name vagrant.local;
    root /var/www/vagrant.local;

    location / {
        try_files $uri $uri/ /index.php$is_args$args;
    }

    location ~ \.php {
        try_files $uri =404;
        fastcgi_split_path_info ^(.+\.php)(/.+)$;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_pass unix:/var/run/php5-fpm.sock;
        fastcgi_index index.php;
        include /etc/nginx/fastcgi_params;
    }
}

(This is not the best way to do this in Puppet, but we’ll come back to that.)

Next up, let’s configure MySQL. There’s a mysql::db type provided by the MySQL
module we’re using, so we’ll use that. This works the same way as the file and
package types that we’ve already used, but obviously takes some different
parameters:

mysql::db {'wordpress':
    user     => 'root',
    password => 'password',
    host     => 'localhost',
    grant    => ['all'],
    require  => Class['mysql::server']
}

Let’s Talk About Types, Baby

You’ll notice that we’ve used two different syntaxes above for the MySQL parts:

class {'mysql::php': }
mysql::db {'wordpress': }

The differences here are in how these are defined in the module: mysql::php is
defined as a class, whereas mysql::db is a type. These reflect fundamental
differences in what you’re dealing with behind the resource. Things that you
have one of, like system-wide packages, are defined as classes. There’s only one
of these per-system; you can only really install MySQL’s PHP bindings once.2

On the other hand, types can be reused for many resources. You can have more
than one database, so this is set up as a reusable type. The same is true for
nginx sites, WordPress installations, and so on.

You’ll use both classes and types all the time, so understanding when each is
used is key.
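As a standalone sketch (don’t add this to our manifest; the names and passwords are made up), the difference looks like this:

```puppet
# A type can be declared many times, each with a unique title...
mysql::db { 'blog':
    user     => 'blog',
    password => 'secret1',
}
mysql::db { 'staging':
    user     => 'staging',
    password => 'secret2',
}

# ...but a class may only be declared once per system.
class { 'mysql::server':
    config_hash => { 'root_password' => 'password' },
}
# Declaring class { 'mysql::server': } a second time is a compile error.
```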

Moving to Modules

nginx and MySQL are both set up with our settings now, but it’s not really in a
very reusable pattern yet. Our nginx configuration is completely hardcoded for
the site, which means we can’t duplicate this if we want to set up another site
(for example, a staging subdomain).

We’ve used the MySQL module already, but all of our resources are in our
manifests directory at the moment. The manifests directory is more for the
specific machine you’re working on, whereas the modules directory is where our
reusable components should live.

So how do we create a module? First up, we’ll need the right structure. Modules
are essentially self-contained reusable parts, so there’s a certain structure
we use:

  • modules/<name>/ – The module’s full directory
    • modules/<name>/manifests/ – Manifests for the module, basically the same
      as your normal manifests directory
    • modules/<name>/templates/ – Templates for the module, written in ERB
    • modules/<name>/lib/ – Ruby code to provide functionality for your
      manifests

(I’m going to use ‘myproject’ as the module’s name here, but replace that with
your own!)

First up, we’ll create the module’s main manifest, using the special filename
init.pp in the manifests directory. Before, we used the names mysql::php and
mysql::db, but the MySQL module also supplies a top-level mysql class. Puppet
maps a::b to modules/a/manifests/b.pp, but a class called a maps to
modules/a/manifests/init.pp.
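To make the mapping concrete, here’s a sketch of how names resolve to files (the myproject entries assume the module name we chose for this project):

```puppet
# mysql            -> modules/mysql/manifests/init.pp
# mysql::db        -> modules/mysql/manifests/db.pp
# myproject        -> modules/myproject/manifests/init.pp
# myproject::site  -> modules/myproject/manifests/site.pp

# So declaring this loads modules/myproject/manifests/init.pp:
class { 'myproject': }
```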

Here’s what our init.pp should look like:

class myproject {
    if ! defined(Package['nginx']) {
        package {'nginx':
            ensure => latest
        }
    }
    if ! defined(Package['php5-fpm']) {
        package {'php5-fpm':
            ensure => latest
        }
    }
}

(We’ve wrapped these in defined() calls. It’s important to note that while
Puppet is declarative, this is a compile-time check. If you’re making
redistributable modules, you’ll always want to use this, as the same resource
can’t be declared twice, and users may well have declared these packages in
their own manifests.)

Next, we want to set up a reusable type for our site-specific resources. To do
this in a reusable way, we also need to take in some parameters. There’s one
special variable passed in automatically, the $title variable, which
represents the namevar. Try to avoid using this directly, but you can use this
as a default for your other variables.

Declaring the type looks the same as defining a function in most languages.
We’ll also update some of our definitions from before.3

define myproject::site (
    $location,
    $database = 'wordpress',
    $database_user = 'root',
    $database_password = 'password',
    $database_host = 'localhost'
) {
    file { $location:
        ensure => directory
    }
    file { "/etc/nginx/sites-available/$name":
        source => "file:///vagrant/vagrant.local.nginx.conf"
    }
    file { "/etc/nginx/sites-enabled/$name":
        ensure => link,
        target => "/etc/nginx/sites-available/$name"
    }

    mysql::db {$database:
        user     => $database_user,
        password => $database_password,
        host     => $database_host,
        grant    => ['all'],
    }
}

(This should live in modules/myproject/manifests/site.pp)

Now that we have the module set up, let’s go back to our manifest for Vagrant
(manifests/site.pp). We’re going to completely replace this now with
the following:

# Although this is declared in myproject, we can declare it here as well for
# clarity with dependencies
package {'php5-fpm':
    ensure => latest
}
class { 'mysql::php':
    require => [ Class['mysql::server'], Package['php5-fpm'] ],
}
class { 'mysql::server':
    config_hash => { 'root_password' => 'password' }
}

class {'myproject': }
myproject::site {'vagrant.local':
    location => '/var/www/vagrant.local',
    require  => [ Class['mysql::server'], Package['php5-fpm'], Class['mysql::php'] ]
}

Note that we still have the MySQL server setup in the Vagrant manifest, as we
might want to split the database off onto a separate server. It’s up to you to
decide how modular you want to be about this.

There’s one problem still in our site definition: we still have a hardcoded
source for our nginx configuration. Puppet offers a great solution to this in
the form of templates. Instead of pointing the file to a source, we can bring
in a template and substitute variables.

Puppet gives us the template() function to do just that, and it automatically
supplies all the variables in scope for substitution. There’s a great guide and
tutorial that explain this further, but most of it is self-evident. The main
thing to note is that the template() function takes its template location in
the form <module>/<filename>, which maps to
modules/<module>/templates/<filename>.
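Under the hood, template() simply renders an ERB file with the variables in scope substituted in. Here’s a minimal Ruby sketch of that substitution (the values are hypothetical stand-ins for what Puppet would supply):

```ruby
require 'erb'

# Hypothetical stand-ins for the Puppet variables in scope.
name     = 'vagrant.local'
location = '/var/www/vagrant.local'

# A stripped-down version of our nginx template.
template = "server_name <%= name %>;\nroot <%= location %>;\n"

# ERB substitutes the <%= ... %> tags using the local variables.
rendered = ERB.new(template).result(binding)
puts rendered
```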

Our file resource should now look like this instead:

file { "/etc/nginx/sites-available/$name":
    content => template('myproject/site.nginx.conf.erb')
}

Now, we’ll create our template. Note the lack of hardcoded pieces.

server {
    listen 80;
    server_name <%= name %>;
    root <%= location %>;

    location / {
        try_files $uri $uri/ /index.php$is_args$args;
    }

    location ~ \.php {
        try_files $uri =404;
        fastcgi_split_path_info ^(.+\.php)(/.+)$;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_pass unix:/var/run/php5-fpm.sock;
        fastcgi_index index.php;
        include /etc/nginx/fastcgi_params;
    }
}

(This should be saved to modules/myproject/templates/site.nginx.conf.erb)

Our configuration will now be automatically generated, with the name and
location filled in from the parameters to the type definition.

If you’d really like to go crazy with this, you can basically parameterise
everything you want to change. Here’s an example from one of mine:

server {
    listen <%= listen %>;
    server_name <% real_server_name.each do |s_n| -%><%= s_n %> <% end -%>;
    access_log <%= real_access_log %>;
    root <%= root %>;

<% if listen == '443' %>
    ssl on;
    ssl_certificate <%= real_ssl_certificate %>;
    ssl_certificate_key <%= real_ssl_certificate_key %>;

    ssl_session_timeout <%= ssl_session_timeout %>;

    ssl_protocols SSLv2 SSLv3 TLSv1;
    ssl_ciphers ALL:!ADH:!EXPORT56:RC4+RSA:+HIGH:+MEDIUM:+LOW:+SSLv2:+EXP;
    ssl_prefer_server_ciphers on;
<% end -%>

<% if front_controller %>
    location / {
        fastcgi_param SCRIPT_FILENAME $document_root/<%= front_controller %>;
<% else %>
    location / {
        try_files $uri $uri/ /index.php?$args;
        index <%= index %>;
    }

    location ~ \.php$ {
        try_files $uri =404;
        fastcgi_split_path_info ^(.+\.php)(/.+)$;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
<% end -%>
        fastcgi_pass <%= fastcgi_pass %>;
        fastcgi_index index.php;
        include /etc/nginx/fastcgi_params;
    }

    location ~ /.ht {
        deny all;
    }

<% if builds %>
    location /static/builds/ {
        internal;
        alias <%= root %>/data/builds/;
    }
<% end -%>

<% if include != '' %>
    <% include.each do |inc| -%>
        include <%= inc %>;
    <% end -%>
<% end -%>
}

Notifications!

There’s one small problem with our nginx setup. At the moment, our sites won’t
be loaded in by nginx until the next manual restart/reload. Instead, what we
need is a way to tell nginx that we need to reload when the files are updated.

To do this, we’ll first define the nginx service in our init.pp manifest.

service { 'nginx':
    ensure     => running,
    enable     => true,
    hasrestart => true,
    restart    => '/etc/init.d/nginx reload',
    require    => Package['nginx']
}

Now, we’ll tell our site type to send a notification to the service when we
should reload. We use the notify metaparameter here, and we’ve already set the
service up above to recognise that as a “reload” command.

file { "/etc/nginx/sites-available/$name":
    content => template('myproject/site.nginx.conf.erb'),
    notify => Service['nginx']
}
file { "/etc/nginx/sites-enabled/$name":
    ensure => link,
    target => "/etc/nginx/sites-available/$name",
    notify => Service['nginx']
}

nginx will now be notified that it needs to reload both when we create or
update the config, and when we actually enable the site.

(We need the notification on the config file proper because future
configuration updates only change that file; the symlink itself won’t change,
so a notification on the link alone would never fire again.)

We should now have a full installation set up and ready to serve from your
Vagrant install. If you haven’t already, boot up your virtual machine:

$ vagrant up

If you change your Puppet manifests, you should reprovision:

$ vagrant provision

Machine vs Application Deployment

There can be a bit of a misunderstanding as to what should be in your Puppet
manifests. This can be confusing, and I must admit that I originally got it
wrong as well.

Puppet’s main job is to control machine deployment. This includes things like
installing software, setting up configuration, etc. There’s also the separate
issue of application deployment. Application deployment is all about deploying
new versions of your code.

The part where these two can get conflated is installing your application and
configuring it. For WordPress, you usually want to ensure that WordPress itself
is installed. This is something that is probably outside of your application,
since it’s fairly standard, and it only happens once. You should use Puppet here
for the database configuration, since it knows about the system-wide
configuration which is specific to the machine, not the application.

You probably also want to ensure that certain plugins and themes are enabled.
This is something that should not be handled in Puppet, since it’s part of
your application’s configuration. Instead, you should create a must-use plugin
that ensures these are set up correctly. This ensures that if your app is
updated and rolled out, you don’t have to use Puppet to reprovision your server.

(If you do push this into your Puppet configuration, bear in mind that updating
your application will now involve both deploying the code and reprovisioning the
server.)

Wrapping Up

If you’d like, you can now go and clone the companion repository and try
running it to test it out.

Hopefully by now you should have a good understanding both of Vagrant and
Puppet. It’s time to start applying these tools to your workflow and adjusting
them to how you want to use them. Keep in mind that rules are made to be broken,
so you don’t have to follow my advice to the letter. Experiment, and have fun!

  1. There are a few cases where this doesn’t apply, but you should be explicit
    anyway. For example, files will autodepend on their directory’s resource if
    it exists. []
  2. Yes, I realise you can do per-user installation, but a) that’s an insane
    setup; and b) you’ll need to handle package management yourself this
    way. []
  3. This previously used hardcoded database credentials. Thanks to James
    Collins for catching this! []