Pre-Thought Listen – The personal weblog of Ryan McCue.

On Contribution

20th of December, 2024 (Updated 20th of December, 2024)

Every day, I open my computer to Slack. The first thing I see, every day, is the WordPress Slack icon reporting a problem.

Every day, I click the icon and log in through WordPress.org, to be told that my Slack account is still disabled.

Although I can no longer access Slack, I am still contributing as best as I can. I’m still working on GitHub issues, Trac tickets, comments on the Make P2s, and through chatting directly with other contributors.

I am trying as best as I can not to let it affect my work. Contributing without using Slack is difficult – back-and-forth over an issue that could be a fluid dialogue becomes long paragraphs over days instead. Attending meetings is impossible – I am reduced to reading the notes – and helping others is much more difficult than it should be.¹

But no matter what I do, it’s intensely demotivating to be reminded, every day, that I have been declared persona non grata.

I’ve still received no communication from anyone in the project as to why I can no longer access Slack or why I am blocked on Twitter.

I’ve spent more than 20 years – more than two-thirds of my life – contributing to WordPress. I have poured much of my life into this project, spent sleepless nights worrying about it, and dealt with the stress and burnout caused by the politics, personalities, and personal attacks.

Although I’ve been exiled by the project officially, I don’t feel like I’m “on the outside” because of the community. Many, many people have reached out, and I thank them for that.

Automattic’s response to the injunction this week says that they’re “continuing to protect the open source ecosystem”, but one thing is clear: WP Engine did not block me.

I like to think I can shrug all of this off, that I don’t really care that much, but I can’t. Because I do care. Because I believe – I still believe – that this thing matters.

I am trying the best that I can.

But, it’s hard to imagine wanting to continue to work on WordPress after this.

Edit: A few hours since publishing this, I have been blocked from WordPress.org, and hence from contributing on Trac as well.

¹ I am lucky. My job does not depend on contributing.

A Stronger Foundation for the Ecosystem

1st of October, 2024 (Updated 1st of October, 2024)

The feud between Automattic and WP Engine has continued, with WordPress.org blocking access by WP Engine’s servers.

In WP Engine Must Win, I wrote about my thoughts on the legal argument on this battle, and why it is important that WP Engine win the trademark case in order to protect the ecosystem. I also touched on the moral argument:

The case that companies should contribute certain amounts (for example, 5% of time or resources) is one that reasonable people can argue over and disagree about – and we see other cases of this across the open source community. Raising whether certain companies are meaningfully contributing is the sort of advocacy befitting the Foundation, whether you agree with the specifics or not.
However, we should not confuse this worthwhile advocacy with the stunning claims that Matt and Automattic are making, and the impact this would cause upon the industry and the project itself.

The confusion between these arguments has clouded much of this discussion, and I’ve had both public and private responses to my post which have expressed gratitude for helping to clarify these.¹ Richard Best’s excellent WP and Legal Stuff has also broken these arguments out, and I strongly encourage reading everything he has written on this topic.

Setting aside the legal argument, I want to address the broader points that Matt has made about the sustainability of the ecosystem, and about what companies should contribute.

I agree with Matt that for the strength of the ecosystem, we must defend the ideals of the project.

Protecting open source

Freedom Zero is the freedom to run the program for any purpose, and it is a foundational ideal of free software. The GPL license is clear that there is no obligation for users to contribute back to the software.

For a long time, this was enough. However, I have personally changed my view on these obligations over the years, and this latest case brings it further into focus.

It’s clear that we live in a world where open source and free software won. The challenges these licenses were created to face have been defeated. However, these licenses are ill-equipped to deal with the tragedy of the commons that is modern exploitation of open source.

Matt is right when he speaks about the ethos of open source being what makes it work, and we need to step up to reinforce this. There is a strong case to be made that bad actors are plundering the plentiful fields of open source and exploiting the spirit of the ecosystems, and if we do not act, the commons may crumble.

It’s also clear that we don’t have the right tools to deal with these problems right now. The WordPress trademark is being used in this legal battle since it is one of the only tools available, but doing so has created negative consequences we now must all live with.

We need something better than this.

Empowering the WordPress Foundation

In order to meet these challenges, we need new tools.

The best way we can do this is to empower the WordPress Foundation, whose mission is:

To ensure free access, in perpetuity, to the software projects we support. People and businesses come and go, so it is important to ensure that the source code for these projects will survive beyond the current contributor base, that we may create a stable platform for web publishing for generations to come.

Currently, the Foundation is underequipped to achieve this mission, and does so only indirectly.

WordPress needs a strong Foundation to ensure its longevity into the future, one which is capable of fighting for the spirit of the ecosystem.

We can take inspiration from other open source ecosystems about this, including from the Drupal Association, the Python Software Foundation, and the Linux Foundation. We can follow their models by empowering the WordPress Foundation in three key areas.

The WordPress Foundation must be enabled and responsible

The Foundation needs to be stronger than it is today, and enabled to achieve its goals. It must also be trusted by the ecosystem in its role.

Currently, the Foundation plays a minor role in the operations and financial backing of the project. Its primary roles traditionally have been stewardship of the trademark, and operation of WordCamps – the latter of which is now run by a subsidiary public benefit corporation.

The largest costs for community services – employing contributors and running WordPress.org – are borne primarily by generous contributions directly from Matt and Automattic, along with many other contributors from other companies to the project. Consequently, there’s little perceived benefit to direct donations to the Foundation, and the burden continues to fall to Matt and Automattic. These donations are truly commendable², but we need to build a system that does not rely on them alone.

By acting transparently and being more active, the Foundation could build trust and earn the ability to solicit more support.

A clear start to achieve this is to empower the Foundation with a steering committee or board comprised of active community members, which can join Matt in actively driving the mission. At a minimum, this committee should be responsible for the Foundation’s use of trademarks and its policies – it may also make sense for it to have a say in the project’s direction and roadmap, as in Joost’s proposal.³

The Foundation could take a further step towards ensuring the continuity of the project by directly employing key contributors, following the model of the Linux Foundation, which employs key contributors like Linus Torvalds, Greg Kroah-Hartman, and Shuah Khan.⁴

With trust built in the Foundation, it could solicit memberships/sponsorships more strongly from companies, following the model of many other successful foundations, letting those companies benefit from the goodwill it creates (as Automattic does with their donations). In doing so, it could become financially independent, enabling WordPress to truly survive in perpetuity.

This ensures both that the Foundation can meet its goals of ensuring access even as people and businesses come and go, and also ensures the Foundation itself survives any changes – creating a virtuous cycle.

The WordPress Foundation must be clear

To ensure it can be trusted by the community, the Foundation needs to be clear on its policies, especially on the trademark policy and on community involvement.

Until the talk at WordCamp US, it was widely understood that the Five for the Future program was a suggested, voluntary program, encouraging companies to contribute 5% of their time or resources to the project. However, it is clear from Automattic’s actions that some level of contribution is now a requirement in the community.

The Foundation (not Automattic) should publish clear guidance about the expectations on the ecosystem. These expectations should be published in a contribution agreement, which should be enforced (more on that in a moment) via contractual obligations – rather than by tenuous trademark enforcement.

If policies change, or a suggestion moves to a requirement, this must be clearly, openly communicated with appropriate timelines, and not retrospective. This ensures the Foundation can be trusted, and allows the ecosystem to act confidently, avoiding the chilling effects of uncertainty.

The WordPress Foundation must have teeth

In order to place pressure on the ecosystem to act well, the Foundation must have teeth. It must be prepared and equipped to vigorously defend the community and ecosystem.

WordCamps and community events are a vital part of the ecosystem, and many companies derive value from both sponsorships and attendance. In the same way that a code of conduct for individuals is enforced, the Foundation should be unafraid to require the contribution agreement for participation.

The central services provided by WordPress.org should also be part of the Foundation’s tools. Whilst the implementation and communication around the block on WP Engine left much to be desired, the sentiment that companies are exploiting a free service is right, and the Foundation should be equipped to act on it. This includes limiting the use of WordPress.org’s APIs as well as listing in the WordPress.org directory.

In order to enable the Foundation to use it as a tool, WordPress.org must be under the Foundation’s direct control.

In addition to these tools, the Foundation also controls the trademark. While I believe the specific case against WP Engine is overextended and dangerous to the community, the Foundation should defend its trademark in legitimate cases involving market confusion.

This includes enforcement of any licensed usage of the trademark. It is clear that the primary confusion is between Automattic’s WordPress.com product and the WordPress open source project – so much so that Automattic itself has to clarify to consumers. The Foundation can continue to act as a guard against intentional confusion and check that its licensees correct and clarify these cases.⁵

Moving forward

Putting the Foundation at the heart of defending the ethos and freedoms gives the whole community the ability to work together.

Matt put it best:

I believe that software, and in fact entire companies, should be run in a way that assumes that the sum of the talent of people outside your walls is greater than the sum of the few you have inside. None of us are as smart as all of us. Given the right environment — one that leverages the marginal cost of distributing software and ideas — independent actors can work toward something that benefits them, while also increasing the capability of the entire community.

Matt and Automattic have given immense amounts to the project, and a stronger Foundation gives us all the capability to share the burden. It provides a path forward towards a community that is sustainable in the long term, which encourages and creates good actors, and guides others to the right path.

This fight is bigger than any two companies going head to head, and this specific legal battle obscures the big picture of what we need to achieve.

WP Engine must win the trademark battle, but the open source ecosystem must win this war.

¹ To be clear, I have not and will not be speaking privately to anyone directly involved in the legal dispute. Apologies if I have not replied to your messages, but I welcome public replies to anything you disagree with.
² I have worked on or with WordPress for more than 20 years, and not once have I doubted Matt’s belief in and commitment to open source.
³ This steering committee need not be the Foundation’s board directly. Joost would also make an excellent member of this committee.
⁴ Historically, Audrey fulfilled a similar role for WordPress, however this role has been absorbed into Automattic.
⁵ While I don’t think Matt or Automattic leadership wilfully confuse these, it is important that an independent group keeps this in check, especially as Automattic continues to grow.

WP Engine Must Win

26th of September, 2024 (Updated 26th of September, 2024)

On stage at WordCamp US last week, Matt Mullenweg gave a keynote presentation which made a wide range of points about contribution, the ethics of open source, and the commitments various companies make to contributing. In particular, he called out WP Engine in what was a fairly clear direction to the community to stop using them. This was then followed with a post on WordPress.org.

Since then, further details have emerged about the conversations happening behind the scenes, as a result of WP Engine’s cease and desist, and Matt’s live Twitter Spaces (thanks to Courtney Robertson for her notes). It has also emerged that the WordPress Foundation has filed trademarks for “managed WordPress” and “hosted WordPress”.

In particular, the following details from WP Engine’s letter stand out:

Automattic CFO Mark Davies told a WP Engine board member that Automattic would “go to war” if WP Engine did not agree to pay its competitor Automattic a significant percentage of its gross revenues – tens of millions of dollars in fact – on an ongoing basis. Mr. Davies suggested the payment ostensibly would be for a “license” to use certain trademarks like WordPress, even though WP Engine needs no such license. WP Engine’s uses of those marks to describe its services – as all companies in this space do – are fair uses under settled trademark law and consistent with WordPress’ own guidelines.

These have been confirmed by Automattic’s counter letter, which also states Automattic is asking WP Engine to pay 8% of their revenue.

Until yesterday, the stated policy of the WordPress Foundation was:

All other WordPress-related businesses or projects can use the WordPress name and logo to refer to and explain their services

And:

The abbreviation “WP” is not covered by the WordPress trademarks and you are free to use it in any way you see fit.

(As of writing, the page was last updated at 2024-09-24T16:45:36; the prior version recorded by the Internet Archive was active at 2024-09-24T02:45:55.)

As WP Engine’s filing notes, it is long established trademark case law that trademarks may be used descriptively under fair use. In the phrases “hosted WordPress”, “headless WordPress”, “WordPress platform” (etc), the term “WordPress” is clearly being used descriptively – it is website hosting for the WordPress open source software.

There are no other terms that can substitute, and a reasonable person who understands that WordPress is an open source, installable project can clearly make this distinction. In the same manner, seeing other hosts offering “Apache & PHP hosting” is clearly descriptive, and not an attempt to pass off as officially licensed products of the respective trademark holders. (“Managed WordPress”, a term the Foundation has now filed trademarks for, has been used by the community since at least 2010, when it was popularised by Pagely.)

The first statement may have been the WordPress Foundation’s policy, but it is also clearly explaining cases of fair use. The second statement is a matter of fact: “WP” is not trademarked.

The trademark policy now states (2024-09-24 22:21):

The abbreviation “WP” is not covered by the WordPress trademarks, but please don’t use it in a way that confuses people. For example, many people think WP Engine is “WordPress Engine” and officially associated with WordPress, which it’s not. They have never once even donated to the WordPress Foundation, despite making billions of revenue on top of WordPress.
If you would like to use the WordPress trademark commercially, please contact Automattic, they have the exclusive license. Their only sub-licensee is Newfold.

(This policy page was authored by Matt, who is also CEO of Automattic. Automattic and Newfold also have a business relationship beyond trademark licensing, with Bluehost Cloud using Automattic’s WP Cloud infrastructure-as-a-service product. Newfold is also an investor in Automattic.)

Across these conversations, there is a clear letter of intent from Matt, Automattic, and the WordPress Foundation: if you use the term “WordPress” commercially in any way, Automattic may dictate the terms under which you may use it.

If Automattic were to win this legal argument, this would mean it is no longer possible for “WordPress agencies” to use the term, nor for hosts to offer “WordPress hosting”, nor for “WordPress plugins” to be commercially available. These would, under their argument, not be fair use, but rather an attempt to pass off your products as officially sanctioned by the WordPress Foundation and Automattic.

The only way any of these businesses would be able to operate is under the terms that Automattic chooses – in WP Engine’s case, that was 8% of their revenue. Any company could be subject to a shakedown for an arbitrary amount, or face ruinous legal action and intimidation in the public space.

If Automattic had the right to dictate any use of the trademark, this would be a severe net-negative for the WordPress project, the WordPress Foundation, and for open source projects in general. It would severely encumber any company merely seeking to describe the products and services they offer.

It would also have a chilling effect upon any commercial activity using WordPress, as any business could be targeted by Automattic for licensing fees, even those using the trademark descriptively, fairly, and in good faith.

This would directly work against the WordPress Foundation’s non-profit goal of serving the public good.

The case that companies should contribute certain amounts (for example, 5% of time or resources) is one that reasonable people can argue over and disagree about – and we see other cases of this across the open source community. Raising whether certain companies are meaningfully contributing is the sort of advocacy befitting the Foundation, whether you agree with the specifics or not.

However, we should not confuse this worthwhile advocacy with the stunning claims that Matt and Automattic are making, and the impact this would cause upon the industry and the project itself.

WP Engine must win this legal battle for the continued health and vibrancy of the WordPress project.

Seamless Webviews in Electron

28th of March, 2020

Electron has a few different ways to embed web content safely into an existing window. The standard technique is to use a regular iframe, but this doesn’t give you all the power you might need over user content; notably, you don’t get full control over the will-navigate event for user navigation.

The documentation notes that you can use WebViews or regular BrowserWindows. WebViews have the distinct advantage that they appear in the regular DOM, and hence are flowed as part of the document’s content. They’re also underdocumented and not entirely recommended, but they’re the best balance between an iframe and a BrowserWindow that Electron offers right now.

For my particular use case, I want to render user content with the following key requirements:

User content is supplied as a string of HTML.
User content should be restricted as much as possible. Notably, no user-supplied JavaScript should be run.
Navigation events must be caught and sent to the system; i.e. clicking a link in user content should open the system browser.
User content should be seamlessly displayed as part of the parent layout flow, without additional scrolling.

The solution I settled on for this was a webview with JavaScript disabled, encoding the content into a data URL:

<webview
    enableremotemodule="false"
    src="data:text/html,..."
    webpreferences="javascript=no"
/>
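As a rough sketch of the "encoding the content into a data URL" step, the helper below percent-encodes an HTML string so that characters like # and % don't truncate or corrupt the URL. The function name toDataUrl and the charset=utf-8 parameter are assumptions for illustration; the original only shows the data:text/html,... prefix.

```javascript
// Hypothetical helper: turn a string of user HTML into a data URL
// suitable for the webview's src attribute.
// The charset parameter is an assumption, not something the post specifies.
function toDataUrl( html ) {
    // encodeURIComponent escapes '#', '%', '&' and friends, which would
    // otherwise be interpreted as URL syntax rather than content.
    return 'data:text/html;charset=utf-8,' + encodeURIComponent( html );
}

// Usage (in the renderer): webview.setAttribute( 'src', toDataUrl( userHtml ) );
```

Very large documents may run into URL length limits, so this is best suited to modest amounts of content.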

This works great and solves the first two items neatly. The third item is also easily solved by attaching events in the main process:

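const { app, shell } = require( 'electron' );

app.on( 'web-contents-created', function ( event, contents ) {
    // Only intercept navigation for webviews, not regular windows.
    if ( contents.getType() !== 'webview' ) {
        return;
    }

    // Handle browsing inside a webview: cancel the in-app navigation
    // and hand the URL to the system browser instead.
    const handleLink = function ( event, url ) {
        event.preventDefault();
        shell.openExternal( url );
    };
    contents.on( 'will-navigate', handleLink );
    contents.on( 'new-window', handleLink );
} );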

However, the fourth item is where it gets tricky. We need to detect the inherent size of the user content, and then change the height of the webview to match this. In a traditional iframe setting, something like iframe-resizer could be used, but we’ve disabled JavaScript. Additionally, we can’t access the contentDocument of the iframe (OOP iframes are always treated as cross-origin iframes, and there’s a shadow root involved too).

The naïve approach is to try using webview.executeJavaScript() directly, however this will give errors from Chrome, as we’ve just told it to disable JavaScript. Makes sense, but also annoying.

Preload scripts do run though. After experimentation with this however, it appears that messaging doesn’t fully work, and DOM events don’t persist after preloading. This means it’s functionally useless for this purpose.

After a tonne of experimentation, it turns out that using contents.executeJavaScriptInIsolatedWorld() works, provided you supply a non-zero world ID. This also has access to the DOM. While you can’t use IPC or messaging from this world, your script can return a value. So, to get the inherent size of the user content, we can run the following in the main process:

app.on( 'web-contents-created', function ( event, contents ) {
    // Calculate height in an isolated world.
    contents.executeJavaScriptInIsolatedWorld( 240, [
        {
            code: 'document.documentElement.scrollHeight',
        }
    ] ).then( function ( height ) {
        // We now have the height.
    } );
} );

As a slight wrinkle to this, you cannot access this from the renderer, only from the main process. Thankfully, you can access the webcontents ID from the renderer, so you can use IPC to link these two together:

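// In main:
win.webContents.send( 'webview-height', {
    id: contents.id,
    height,
} );

// In renderer:
ipcRenderer.on( 'webview-height', function ( e, data ) {
    // Compare against the webview's webContents ID, not its DOM id attribute.
    if ( data.id === webview.getWebContentsId() ) {
        // The height is a unitless number, so add a CSS unit.
        webview.style.height = data.height + 'px';
    }
} );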

Your code in practice will likely need to be more complex than this to properly handle timing; you could also use ipcRenderer.invoke with webContents.fromId() to drive this from the renderer instead.
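For the renderer-driven variant, one possible shape is a handler factory like the following. This is only a sketch: createHeightHandler and the 'get-webview-height' channel name are made up for illustration, and the webContents module is passed in as a parameter so the logic can be tried without a running Electron app.

```javascript
// Hypothetical factory for an ipcMain.handle() callback.
// `webContents` is expected to be Electron's webContents module
// (or anything with a compatible fromId() method).
function createHeightHandler( webContents, worldId = 240 ) {
    return function ( event, id ) {
        const contents = webContents.fromId( id );

        // Unknown ID, or not a webview: nothing to measure.
        if ( ! contents || contents.getType() !== 'webview' ) {
            return Promise.resolve( null );
        }

        // Same isolated-world trick as above; resolves to the height.
        return contents.executeJavaScriptInIsolatedWorld( worldId, [
            { code: 'document.documentElement.scrollHeight' },
        ] );
    };
}

// Assumed wiring in main:
//   const { ipcMain, webContents } = require( 'electron' );
//   ipcMain.handle( 'get-webview-height', createHeightHandler( webContents ) );
//
// Assumed usage in the renderer, once the webview has loaded:
//   const height = await ipcRenderer.invoke( 'get-webview-height', webview.getWebContentsId() );
//   if ( height !== null ) webview.style.height = height + 'px';
```

Pulling from the renderer avoids some of the timing problems, since the renderer can ask for the height only after the webview reports that its content has loaded.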

How WordPress Knows What Page You’re On

4th of January, 2019

In the spirit of Dan Abramov’s Overreacted blog, where he deep-dives into React on his personal blog, I thought I’d do the same for WordPress. If there’s something you’d like to see, let me know!

Since WordPress 1.0, WordPress has supported “pretty permalinks”; that is, human-readable permalinks. This system is built for a lot of flexibility, and allows users to customise the format to their liking, using “rewrite tags”.

Screenshot of the WordPress permalinks screen, showing the presets as well as custom input with "available tags" buttons

Pretty permalinks are implemented through the Rewrite system, but how that works can be a bit obscure, even if you’re familiar with it.

“Rewrites”, for those who aren’t familiar, are how WordPress maps a pretty permalink to something it can use internally. This internal representation is called a “query” (which is a bit of an overloaded term), and is eventually used to build a MySQL query which fetches the requested posts from the database.

This “query” is not exactly the same as what you might think of as a query in WordPress. It’s a mix of parameters used in WP_Query to get posts (called “query arguments” or “query args”) as well as information about the request itself (called “query variables” or “query vars”). Query vars are typically only useful for the main query, and include routing information like whether a 404 error occurred. This will hopefully be clearer later.

Let’s step through, chronologically, how WordPress turns your request into this query.

Aside: WP_Rewrite?

If you’re a seasoned WordPress developer, you might know Rewrites through the WP_Rewrite class. But perhaps surprisingly (or not, if you know how WordPress has evolved), rewrites are actually handled in the little-known WP class instead. Additionally, some (in fact, many) URLs and patterns are routed outside of regular rewrites.

We’re going to take a look at the whole process from where it starts, not just WP_Rewrite. The rewrite process really begins as soon as WordPress starts handling the request.

Bootstrapping

Before WordPress can get started with anything, it needs to first bootstrap everything. How this general process works is a topic for a different day, so I’ll just talk about the relevant bits here.

The key steps in the bootstrap process are:

WordPress normalises the incoming request URI across all servers and (internal/CGI) protocols
- This step also normalises “almost pretty” permalinks (i.e. permalinks prefixed with /index.php/) to be handled the same as regular “pretty” permalinks
WordPress checks if the request is for /favicon.ico and shortcircuits the request if so
If “advanced” (full-page) caching is enabled, WordPress loads it in
- Typically, full-page caches will serve cached requests at this point to avoid the performance hit of loading the rest
WordPress handles multisite if enabled
If WordPress is not installed, it redirects to the installer (or throws an error on multisite)

Already during the bootstrap process, there are a few places where redirects or full requests can be served back. The most common case with full-page caching enabled is that the cache will serve back a request using its own routing. The other cases are mostly error cases, with the exception of multisite, which I’ll cover later.

Note that all of these cases happen before the Rewrite system is started, so it’s not possible to use rewrites to handle favicons, multisite routing, or caching. This is all by design, as these checks have to run early either for performance or to check for basic bootstrapping errors.

You can however use the various hooks provided in the bootstrapping process to handle these requests, if you register your callbacks before wp-settings.php is loaded. You can also handle it in your wp-config.php; don’t forget that’s just PHP, so you can run whatever code you want there.

Initialising the Routing

After the basic bootstrapping in WordPress is done, we get into the actual routing instantiation. Firstly, WordPress instantiates the critical routing classes (WP_Rewrite and WP).

Instantiating WP_Rewrite fires off rewrite initialisation. This loads in all the various settings and sets properties that can later be used for rewrite generation. This also includes setting the “verbose page rules” flag, which is used when your permalink structure contains one of a few specific tags: those which start with slugs, and would potentially cause pages and posts to have conflicting permalinks. Verbose rules change how routing happens later, causing WordPress to “double-check” the URL during routing.

Before WordPress 3.3 (specifically, #16687), verbose page rules caused one-rule-per-page to be generated, which (needless to say) wasn’t great for performance on large sites. This was changed to instead check only when necessary.

Once this is done, our oft-forgotten friend wp-blog-header.php kicks off the actual routing. This runs WP::parse_request which is where the actual routing in WordPress is (generally) done. Basically the first thing this does is to load in the “rewrite rules”.

Generating the Rules

Before we can start doing any routing, we need to convert the user settings to something we can actually work with. Specifically, we need to generate the rewrite rules.

Rewrite rules are essentially a gigantic map of regular expression to “query”. For example, the basic rule for categories looks like:

'category/(.+?)/?$' => 'index.php?category_name=$matches[1]'

If you’ve ever used any other routing in pretty much any web framework, you might wonder what the hell the thing on the right is. This is a WordPress “query string” (which is not the same thing as WP_Query). Essentially, all “pretty” permalinks in WordPress map to this intermediate “ugly” form first, before then being mapped into a WordPress query. This ensures compatibility with sites that don’t support pretty permalinks, but means that WordPress doesn’t directly support “rich” routing (such as callbacks, complex queries, etc).
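To make the two-step mapping concrete, here is a small JavaScript model of it. WordPress itself does this in PHP; the matchRequest function below is a hypothetical illustration, not core's implementation, and it skips details such as verbose page rules.

```javascript
// A one-rule version of the rewrite map from the example above.
const exampleRules = {
    'category/(.+?)/?$': 'index.php?category_name=$matches[1]',
};

// Hypothetical model: find the first rule matching the request, fill in
// the $matches[n] placeholders, and parse the "ugly" query string.
function matchRequest( request, rules ) {
    for ( const [ pattern, query ] of Object.entries( rules ) ) {
        // Rules are anchored to the start of the request.
        const matches = request.match( new RegExp( '^' + pattern ) );
        if ( ! matches ) {
            continue;
        }

        // Substitute captured groups into the query string, URL-encoded.
        const resolved = query.replace( /\$matches\[(\d+)\]/g, function ( _, index ) {
            return encodeURIComponent( matches[ index ] || '' );
        } );

        // Everything after "index.php?" becomes the query variables.
        const queryString = resolved.slice( resolved.indexOf( '?' ) + 1 );
        return Object.fromEntries( new URLSearchParams( queryString ) );
    }

    return null;
}

// matchRequest( 'category/news/', exampleRules ) gives { category_name: 'news' }
```

The intermediate string is what makes the "ugly" and "pretty" forms interchangeable: both end up as the same set of query variables.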

To generate these rules, we go back to the WP_Rewrite class, which attempts to load cached rewrites from the rewrite_rules option, and generates it if it is not available.

Building a Set of Rules

There are many sets of rewrite rules that are generated, and each is generated from a “permastruct” (for “permalink structure”) and an “endpoint mask”. The permastruct specifies the general format of the set of rules to generate, and the “endpoint mask” controls which suffixes (“endpoints”) are added to the permastruct.

A permastruct is a string with static parts and “rewrite tags”. Rewrite tags look like %post_id% and represent a dynamic part of the rewrite rule. WordPress contains a few built-in permastructs: “date”, “year”, and “month” for date archives; “category” and “tag” for the built-in terms; “author” for author archives; “search” for search results pages; “page” for static pages; and “feed” and “comment feed” for RSS/Atom feeds. It also has the main permastruct for single post pages, and “extra” permastructs as registered by plugins or themes.

The permastruct is combined with an endpoint mask, which is a bitmask specifying which additional rules to add to the main endpoint. WordPress includes 13 endpoint masks, plus 3 helper masks (EP_NONE, EP_ALL, and EP_ALL_ARCHIVES). These can be combined with bitwise operators (|, &, ~) to activate multiple endpoint masks at once.

Endpoint masks are very confusing for those unfamiliar with bitwise operations, so you typically don’t see them used much outside of WordPress core’s routes. Also, they’re not very extensible, as custom endpoint masks will conflict with each other. Avoid doing anything special with these, and generally follow existing guides on how to use them. Jon Cave’s post on Make/Plugins is the best way to understand them if you really want to get into it.
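If bitmasks are unfamiliar, this short JavaScript sketch shows what the three operators do. The constant names echo WordPress's, but the numeric values here are illustrative powers of two and should not be assumed to match core's definitions.

```javascript
// Illustrative values only: each mask occupies its own bit.
const EP_NONE = 0;
const EP_PERMALINK = 1;
const EP_DATE = 4;
const EP_PAGES = 8;

// | combines masks: "permalinks and pages".
const combined = EP_PERMALINK | EP_PAGES;

// & tests whether a given mask is present in a combination.
function hasMask( value, mask ) {
    return ( value & mask ) !== 0;
}

// ~ inverts a mask, so "& ~" removes it from a combination.
const withoutPages = combined & ~EP_PAGES;

// hasMask( combined, EP_PAGES ) is true, hasMask( combined, EP_DATE ) is false,
// and withoutPages is back to just EP_PERMALINK.
```

This also shows why custom masks conflict: two plugins that both pick the same unused bit have no way to tell their endpoints apart.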

The permastruct and endpoint mask are passed to WP_Rewrite::generate_rewrite_rules(), which replaces the rewrite tags with their regular expression equivalents. It does additional parsing to then generate additional rules based on which rewrite tags were used, and using the endpoint mask. I won’t go into the specifics of this, as this is optimised code with lots of weirdness, but suffice to say it converts the parameters into an array of rules.

For example, the main post rewrite rules are generated using the user-set permastruct with the EP_PERMALINK endpoint mask. This takes the permalink_structure setting as the permastruct (which looks like /%post_id%/%postname%/). generate_rewrite_rules() turns this into rewrite rules to match things like post attachments, feeds, pages (as in, paged posts), comment pages, embeds, and the combination of all of these.

Collecting all the Sets

WordPress repeats the rewrite generation for each set of permastructs it knows about (plus the “extra” permastructs added by plugins or themes), and it then combines them into a single set of rules. It also adds in some additional static rules (for things like deprecated feeds and robots.txt). It runs a couple of filters to allow plugins and themes to add other static rules as well.

Extra permastructs are typically generated by core helpers like register_post_type() or register_taxonomy(). Plugins don’t typically add permastructs manually, as the generation makes a lot of assumptions about things you want.

Once all of this is done, WordPress saves the rules into the rewrite_rules option to avoid having to regenerate them on the next request. However, if a plugin has flushed the rewrite rules before wp_loaded, this saving is deferred to wp_loaded to ensure plugins don’t break the whole system.

Now that we know we have rewrite rules (whether loaded from the option or generated fresh), we can finally get around to routing our requests.

Matching the Rules

Back in WP::parse_request(), we now have the full rewrite rule array ready to use. First, we set up and normalise the incoming request on top of the stuff already done during bootstrapping. This includes removing any path prefixes if WordPress is installed in a subdirectory (or if we’re on a subdirectory site in multisite).

Root requests (i.e. for /) are normalised to the empty string (''), and matched directly to the '$' rule, which improves performance for one of the most commonly-requested pages on the site. (As '$' is also (typically) the last rule in the rewrite array, this also saves us running potentially hundreds of regular expression checks that will never match.)

All other requests go into the main matching loop. This loop takes every rewrite rule and attempts to match the regular expression against the requested path (twice, in case the URL needs decoding). If the rewrite rule matches, the “query” for the rule is stored, and the loop breaks (as only one rule can match). If no matches are found, $wp->matched_rule remains unset.
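As a rough sketch of that loop (not the code in WP::parse_request() itself), the following walks a rule array and stops at the first match. The rules and the demo_match_request() helper are hypothetical, and the verbose page rule check is left out.

```php
<?php
// Hypothetical rule set: regular expression => "query". Real rules are
// generated by WP_Rewrite and are far more numerous.
$rewrite_rules = array(
    'category/(.+?)/?$'   => 'index.php?category_name=$matches[1]',
    '([0-9]+)/([^/]+)/?$' => 'index.php?p=$matches[1]&name=$matches[2]',
);

/**
 * Return the first rule that matches the request path, or null if none does.
 */
function demo_match_request( $path, array $rules ) {
    foreach ( $rules as $regex => $query ) {
        // Try the raw path first, then the URL-decoded path.
        if ( preg_match( "#^$regex#", $path, $matches )
            || preg_match( "#^$regex#", urldecode( $path ), $matches ) ) {
            return array(
                'rule'    => $regex,
                'query'   => $query,
                'matches' => $matches,
            );
        }
    }
    return null;
}

$result = demo_match_request( '42/my-post/', $rewrite_rules );
// $result['rule'] is '([0-9]+)/([^/]+)/?$'
```

Order matters here: the first rule that matches wins, so more specific rules need to come before catch-all ones.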

If verbose page rules are set and the “query” contains the pagename query var, the loop first checks to see if the URL actually matches a real post. (It also checks that the post is public to ensure drafts aren’t accidentally exposed via their URL.) This check allows multiple post types to have overlapping rewrite rules, and means that potentially multiple rules can match a single request.

If a match is found, WordPress then parses the URL using the “query” string from the rule. This transforms a URL like /42/my-post/ into an array of query vars like [ 'p' => 42, 'name' => 'my-post' ]. This transformation is done using regular expressions which understand how to turn $matches[1] into the first item of the rule’s regular expression result.
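A simplified version of that substitution step might look like this. The demo_apply_matches() helper is an invention for the example rather than the class core uses, and note that the resulting values are strings, not integers.

```php
<?php
/**
 * Substitute $matches[N] placeholders in a rule's "query" with the captured
 * groups, then parse the result into query vars.
 */
function demo_apply_matches( $query, array $matches ) {
    // Drop the "index.php?" prefix so only the query string remains.
    $query_string = preg_replace( '!^.+\?!', '', $query );

    $filled = preg_replace_callback(
        '/\$matches\[([0-9]+)\]/',
        function ( $m ) use ( $matches ) {
            $index = (int) $m[1];
            return isset( $matches[ $index ] ) ? urlencode( $matches[ $index ] ) : '';
        },
        $query_string
    );

    parse_str( $filled, $query_vars );
    return $query_vars;
}

$query_vars = demo_apply_matches(
    'index.php?p=$matches[1]&name=$matches[2]',
    array( '42/my-post/', '42', 'my-post' )
);
// $query_vars is array( 'p' => '42', 'name' => 'my-post' )
```

Treating the placeholders as text to be replaced, rather than as PHP to be executed, is what makes this safe where eval() was not.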

This parser is used to maintain backwards compatibility with the older “parser”, which simply used eval() to parse the “query” into query vars.

WordPress also checks if the current request is for wp-admin/ or for a direct PHP file, and resets the query vars if so.

At this point, we’ve converted the requested URL into query vars, so the main part of the routing is done. All that’s left is to check that the query vars are allowed to be used, combine in $_GET (query string) and $_POST (data from the request body) variables, and apply some light sanitisation. Further permission checks and cleanup are also done to ensure everything is fairly normal. If any errors occurred, the error query var is also set to enable it to be handled later.
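The allow-list and merging step can be pictured roughly as follows. Everything in this sketch is hypothetical: the list of allowed vars, the demo_collect_query_vars() helper, and in particular the precedence between the three sources, which is an assumption made for the example.

```php
<?php
// Hypothetical allow-list for the sketch; WordPress keeps its own list of
// public query vars, which plugins can extend.
$allowed_vars = array( 'p', 'name', 'paged', 's' );

function demo_collect_query_vars( array $allowed, array $from_rewrite, array $get, array $post ) {
    $query_vars = array();
    foreach ( $allowed as $var ) {
        // Precedence here (request body, then query string, then rewrite
        // match) is an assumption of this sketch.
        foreach ( array( $post, $get, $from_rewrite ) as $source ) {
            if ( isset( $source[ $var ] ) && is_scalar( $source[ $var ] ) ) {
                // Light sanitisation: force everything to a string.
                $query_vars[ $var ] = (string) $source[ $var ];
                break;
            }
        }
    }
    return $query_vars;
}

$query_vars = demo_collect_query_vars(
    $allowed_vars,
    array( 'p' => '42', 'name' => 'my-post' ), // from the matched rule
    array( 'paged' => '2', 'debug' => '1' ),   // stands in for $_GET
    array()                                    // stands in for $_POST
);
// 'debug' is dropped because it is not on the allow-list.
```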

Using the Query Vars

With the query vars all set and established, WordPress now starts using them. It does error handling based on the error query var as part of sending headers, and bails from the request if specific errors were hit (403, 500, 502, or 503 errors). It turns off browser and proxy caching for logged-in users, and sends various caching headers for feeds, and sends the HTML Content-Type for everything else.

All the other query vars are passed as query arguments to WP_Query, and this sets the “main” query. After this is done, a 404 response is sent if WP_Query didn’t manage to find anything (with some conditions on that). If a 404 occurred during routing, WordPress checks this when parsing the query vars, and sets the internal 404 flag.

The specifics of how querying and rendering the results are done are out of scope for this explanation, but have been explained to death elsewhere, as you’ll actually need to interact with this in plugins and themes.

Special Cases

Multisite

While rewrite rules handle matching requests inside a site, a different system is used for matching requests to sites first. This is for a few different reasons: rewrite rules can be changed by plugins, which are site-specific; site data needs to be loaded first for rewrite settings; and multisite routing uses both the domain and the path.

Multisite routing is kicked off when ms-settings.php is loaded in wp-settings.php. The routing first loads sunrise.php, which traditionally handled “domain mapping”; that is, routing external domains to sites. WordPress 3.9 enabled doing this internally in WordPress by simply setting the site’s URL to the external domain, but plugins are still required for multiple domains. (The sunrise file can also be used for many other purposes, but routing remains one of its main purposes.)

If the sunrise process did not handle the routing, WordPress normalises the host and path, then uses this information (along with the SUBDOMAIN_INSTALL flag) to try and find the current site. The mechanisms by which it does this are fairly readable, so I’ll leave it as an exercise to the reader to look into this: simply read and follow the source of ms_load_current_site_and_network().

Once the site has been routed, the site’s details are loaded into relevant global variables. This includes the site’s URL (home_url()), which is later stripped during normalisation in WP::parse_request() (see “Matching the Rules”). This ensures that any path for the multisite install is not used when matching rewrite rules.

REST API

The REST API uses its own routing and endpoints for a few reasons. Unlike regular WordPress requests, the REST API does not always generate a “main” query, so it does not need the query var mapping system. Additionally, REST API “endpoints” (no relation to “endpoint masks”) are matched using both the HTTP method (typically GET, POST, PUT, or DELETE) and the path, unlike regular WordPress rewrites, which are method-agnostic.

The routing inside the REST API is much more similar to traditional routing in non-WordPress contexts, and it matches the pair of HTTP method and path to a callback rather than a query.

To bootstrap the process, the REST API registers rewrite rules which match /wp-json/(.*) to a custom query var called rest_route. After the rewrite system has matched the request URL to this rewrite rule (on parse_request), the REST API checks this query var. If it’s set, it initialises WP_REST_Server, and handles the routing inside WP_REST_Server::serve_request().

The API first sends some setup and security headers, then does some further setup for JSONP responses. It then initialises a WP_REST_Request object. This object contains all the data about the request, and allows the API to be re-entrant: that is, you can run multiple REST requests in one WordPress request, because all the “global” information is contained in this object. The API then checks that no errors occurred during authentication, and if everything is good, it then “dispatches” the request.

WP_REST_Server::dispatch() runs a similar routing loop to WP::parse_request(), but without special cases for verbose rules. Unlike rewrite rules, each route can have multiple “endpoints” (i.e. callbacks). If the route matches, the API loops over each endpoint (called “handler” in the code) and checks whether the method for the endpoint also matches.

If it matches, the callback is then called, with some other stuff around it. Exactly how these requests work is a topic for a different post, as the API does a lot of special handling around this.
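The two-level match (path first, then method) can be sketched like this. The route table, the /demo/v1 namespace, and demo_dispatch() are all invented for the example; the real server wraps the callback in permission checks, argument validation, and response handling that are omitted here.

```php
<?php
// Hypothetical route table: path regex => list of handlers ("endpoints").
$routes = array(
    '/demo/v1/posts/(?P<id>[0-9]+)' => array(
        array(
            'methods'  => array( 'GET' ),
            'callback' => function ( array $params ) {
                return 'read post ' . $params['id'];
            },
        ),
        array(
            'methods'  => array( 'POST', 'PUT' ),
            'callback' => function ( array $params ) {
                return 'update post ' . $params['id'];
            },
        ),
    ),
);

/**
 * Find the first handler whose route and HTTP method both match, and call it.
 * Returns null when nothing matches.
 */
function demo_dispatch( $method, $path, array $routes ) {
    foreach ( $routes as $route => $handlers ) {
        if ( ! preg_match( '@^' . $route . '$@i', $path, $matches ) ) {
            continue;
        }
        foreach ( $handlers as $handler ) {
            if ( in_array( $method, $handler['methods'], true ) ) {
                return call_user_func( $handler['callback'], $matches );
            }
        }
    }
    return null;
}

$read   = demo_dispatch( 'GET', '/demo/v1/posts/42', $routes );
$update = demo_dispatch( 'PUT', '/demo/v1/posts/42', $routes );
```

The same path maps to different callbacks depending on the method, which is the part that rewrite rules have no way to express.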

Once the callback has been run, the end result is a WP_REST_Response object. This object contains the response data as well as any headers or status code to send back to the client. Headers are then sent back to the client before encoding the response data as JSON and finally echoing it to the client. Back in rest_api_loaded(), the WordPress request is now finished off, ensuring that further routing/handling in the WP class is skipped.

Limitations

The design of Rewrites is classic WordPress: it maintains wide compatibility, both forward and backward, through clever and careful design. There’s much to like about this system, and the core feature of mapping “pretty” permalinks back to “ugly” permalinks is particularly smart. This makes compatibility between the two inherent, and it ensures new code is automatically compatible.

The biggest problem is that Rewrites is inherently tied to post querying. To be clear, this is not a problem with Rewrites, but rather with the overall design of the frontend system in WordPress. This makes routes not tied to posts much more difficult to design and implement. While this worked well for the original, blog-focussed nature of WordPress (where essentially everything was simply a filtered post archive), it has been stretched to its limits as a modern CMS.

This is evident in the REST API, where posts are no longer the main content type, and anything (users, themes, the current theme) in WordPress is addressable via a URL. When I designed the REST API’s routing, it was with these limitations in mind, which is why it uses a completely custom router. This router also works by “skipping” the main query, which it actually does by exiting before queries and templates are loaded. This is workable for a separated system like the API, but isn’t a good idea if you want to instead design user-facing pages which actually use templates (say, for a checkout page).

Understanding Rewrites can also be tough if you don’t know where to start, which is why a lot of people miss key parts or don’t quite understand the flow. A significant part of this is the organic way in which the WP and WP_Rewrite classes have grown, which means that understanding the flow requires a lot of flicking back and forth. I’d wager that quite a lot of WordPress developers don’t even know the WP class exists and acts as the main engine of the request; I didn’t until I really dug into Rewrites while working on core.

So Much More

There’s a lot more that happens that I didn’t cover here, so let me know if you want to see any more detail on anything specific. Just knowing where to start can be challenging sometimes, particularly with these systems that have organically grown.

Also, if there’s anything else you’d like to see a breakdown of, let me know! I’d like to demystify more of WordPress if you found this useful.

State of the REST API 2017

2nd of December, 2017

As we approach the State of the Word 2017 at WordCamp US, I think this is a good time to look back on the state of the REST API project and core focus over the last year.

2017 has been an interesting year for the REST API with highs and lows, and periods of intense development and slowdowns.

4.7

Immediately after WordCamp US 2016, WordPress 4.7 was released, which merged the second stage (endpoints) of the REST API. This was a major milestone for the REST API, and marked the culmination of our efforts over the previous 4 years.

In the months after 4.7’s release, we followed up by fixing bugs in the REST API, including two security bugs. One of these security bugs was a very serious privilege escalation issue, which was unfortunately caused by an unrelated low-level change in WordPress that the API hadn’t guarded against. A huge thanks to Sucuri, the security team, and to the hosts for working hard on mitigating the issue, then fixing it and deploying the fix. As the first large security release involving the API team, there was definitely much for us to learn, and we definitely learnt much.

With 4.7 representing a major culmination of the API team effort, most of the “core” REST API team (myself, Daniel, Rachel, and Joe) who led the project pre-merge ramped down contribution. After years of intense effort on the project, we simply needed a break. As is the nature of open source projects, contribution is voluntary, and a massive thanks has to go to everyone involved for such a long time, especially to my co-lead Rachel. Thanks also to those who picked up the baton on the core work, including (but not limited to) Adam Silverstein, James Nylen, and K. Adam White.

The API has continued to improve in an iterative way throughout the year, with bug fixes and improvements from many members of the community. These have helped the API become more refined, stable, and most importantly, useful.

Core Focus

Organisationally, as we shifted from an independent feature project to part of core, the API also transitioned from a project to a “core focus”. This is a new concept and structure in WordPress, representing a large shift from the previous, release-driven product cycles.

With this change, our official goal was set as “getting first party wp-admin usage of the new endpoints, and hopefully [replacing] all of the core places where we still use admin-ajax”. Progress towards this goal throughout the year has been slow.

Part of the reason for this slowness has been a major drop in contribution. With our shift from GitHub to Trac, the number of drive-by contributions has fallen, with contributions coming more from regular core contributors. Additionally, with most of the core API team taking a break and moving on, the organisational and regular contribution has dropped massively. The combination of this drop in contribution to the API along with the ramp up in contribution on other focuses (Gutenberg in particular) means we are by far and away the slowest moving core focus, which has the flow-on effect of making us less attractive to contribution.

The scope of our official goal is also massive. As WordPress has grown organically from a static HTML admin to a more interactive interface, admin-ajax endpoints have grown likewise, with specialised endpoints added as and when needed. Our audit of the actions in WordPress showed 92 separate endpoints across 14 different categories, spanning every section of the admin. The organic growth of these has meant that a lot of the frontend code is tied specifically to the admin-ajax response, and vice versa, including passing generated HTML. In the process of investigating these endpoints, experimentation showed that switching to use the API would essentially require rewriting the feature in order to use it.

Fundamentally, I think the core focus’ goal is at odds with the strengths of the REST API team. Our focus should be providing support to other teams (like the Gutenberg, Customiser, and mobile teams) and empowering developers to build on the API, rather than rewriting parts of the admin. The goal of switching the admin to the REST API is valuable, but it should be part of efforts to improve the user experience, rather than simply refactoring (especially when refactoring would needlessly break existing plugin functionality).

However, progress towards the goal across the entirety of WordPress has been good. One of the largest areas of admin-ajax usage is in the edit screen, where many of the admin-ajax endpoints will be replaced entirely through Gutenberg’s use of the REST API. Likewise, the Media endpoints will likely become legacy as the media library changes for use with Gutenberg. A significant portion of the endpoints are around themes, which are covered by the Customiser team’s efforts. The accessibility team is in the process of reworking the settings pages, which as a side-effect, will allow us to remove the settings endpoints; the Live Settings experiment also shows that we have the ability to improve the user experience here as well. All of these efforts move away from admin-ajax while also improving the user experience.

Moving forward into 2018, the REST API team needs to shift focus to the issues with the API, and focus on helping other teams do what they do best.

Our Big Issues

There are three big issues which the REST API should be focussed on to move forward: authentication, functionality, and empowerment.

Authentication

I’ve talked about authentication endlessly, and will keep talking about it until we solve it, because it is one of the largest problems we still have. Without a viable external authentication solution, over a third of the code in the REST API isn’t usable outside of the site. Imagine if Stripe’s API didn’t let you charge new credit cards, or if Twitter’s mobile apps only let you read tweets but not reply. This is the situation facing API users right now, and is the biggest reason the WordPress mobile app is not powered by the REST API.

Discussions at the Contributor Summit this year with the mobile team were fruitful, and we have a practical plan forward for rolling out select support for the official mobile apps while we work on rolling out general solutions. (As well as potentially enhancing the user experience significantly with things like magic login links.)

Efforts at the WordCamp Europe contributor day provided a massive push forward on OAuth 2 support, with the plugin now beginning to stabilise. A further focus here will allow us to build up the crucial momentum for development, and work with client developers.

OAuth 2 also provides a much cleaner way forward to solving the distributed API problems than our previous solutions. In the coming weeks, we’ll publish the first alpha versions of this new solution, which stays true to the original design goals of the first version of the broker while improving upon it in every way.

Functionality

With the push for merging the REST API, the nature of deadlines meant we had to push many features from the API to double-down on our core functionality. However, this means that the REST API is incomplete. In particular, we’re missing support for key objects in WordPress, including menus, widgets, plugins, and themes. We’re also missing some crucial functionality around existing objects, like post previewing. This means that while it’s possible to build apps on top of the API, some of the core functionality users expect is missing from those apps.

This has flow-on effects to the other core focuses. Incomplete support for drafts in the API has caused problems in Gutenberg, and the lack of support for appearance functionality causes the burden to shift on to the Customiser team. These fall squarely within our responsibility, and we’re currently letting them down.

Empowerment

The core of what the REST API does is empower developers to build things better and faster. To empower developers, we need to take the time to improve our documentation and provide better tooling. We have two fantastic API client libraries in the Backbone and Node libraries, and we should continue to push these forward while also helping to develop the client library ecosystem.

Key to this is ensuring that we are where our users are. Our previous contribution process on GitHub allowed us to benefit from drive-by contributions from developers using the API, and the momentum of Gutenberg on GitHub likewise shows the power of being where developers are. We recently migrated the developer reference to GitHub, and we need to look at further ways to embrace the external developer community here.

Additionally, we need to engage in more outreach with our users. While the user experience side of WordPress engages in user testing and outreach, there’s no equivalent for the REST API. We should be getting more client developers involved in the process, including the mobile team and developers of significant apps like Calypso and MarsEdit.

Our Future

While 2017 has been a tough year for the API, as we move into 2018, I think the state of the REST API is strong. We have the skills and the vision to push the API forward with our focuses, and empower other developers, whether inside the core development community or external.

But in order to enact these changes, make progress, and rebuild a strong contributor base, we need to make the fundamental changes to our organisation and goals to push forward.

We need to embrace our strengths and focus on the areas of highest impact. We need to listen to and work with our users on improving their experience and fixing their problems. And most importantly, we need to get more users by making the REST API more useful to more people.

Requests for PHP: Version 1.7

13th of October, 2016

Requests 1.7 is now available with a tonne of changes. Here’s some of the highlights:

Add support for HHVM and PHP 7: Requests is now tested against both HHVM and PHP 7, and they are supported as first-party platforms.
Transfer & connect timeouts, in seconds & milliseconds
Rework cookie handling to be more thorough: Cookies are now restricted to the same-origin by default, and expiration is checked.
Improve testing: Tests are now run locally to speed them up, as well as further general improvements to the quality of the testing suite. There are now also comprehensive proxy tests to ensure coverage there.
Support custom HTTP methods: Previously, custom HTTP methods were only supported on sockets; they are now supported across all transports.

View all changes

There are also a tonne of tweaks and hardening in the release to improve compatibility with sites running older versions of cURL, or on more obscure setups. General improvements all-round mean version 1.7 is even more stable and compatible than ever before, while still retaining the same great developer-focussed API as always.

Quite a lot of the contributions for 1.7 come from WordPress developers (including Dominik Schilling and Dion Hulse) who have begun contributing to Requests recently, as Requests is now included in WordPress as of version 4.6! I’ve been one of the maintainers of the WordPress HTTP API for a while, so this is a pretty natural inclusion that should bring more contributors to Requests, and much better tested code to WordPress.

Thank you to every one of the 23 contributors to this release, in alphabetical order: Adrian Philipp, Brandon Hesse, Chris Lock, Christopher A. Stelma, Denis Sokolov, Dion Hulse, Dominik Schilling, Eric GELOEN, Jarne W. Beutnagel, Justin Stern, Korbinian Würl, Laurent Martelli, Markus Staab, Michael Orlitzky, Misha Nasledov, Ogün KARAKUŞ, Remigiusz Dymecki, Rodrigo Prado, Ryan McCue, Stephen Edgar, Stephen Harris, ozh, qibinghua.

Patch WordPress via GitHub

22nd of March, 2016

A few days ago, I started tweeting about the Stack Overflow Developer Survey, where 74% of developers surveyed said they dread working with WordPress. I received a tonne of replies that I’m still working through, and I’ll post about that soon.

One reply that did come up a few times was contributing via GitHub. Matt announced in the State of the Word that you’d soon be able to contribute to WP via pull requests, however that hasn’t happened so far. I had a few discussions with some of the core team about this, but alas it never got anywhere.

However, after this discussion, I realised I could do something about it right now as a proof-of-concept. Trac exposes an XML-RPC interface, and GitHub exposes a REST API, so hooking the two up only requires a minimal amount of code.

So, introducing GitHub-to-Patch, a tiny utility to allow submitting PRs to WordPress.

Here’s how you submit a pull request for WordPress using this:

Find the ticket on Trac you want to upload a patch to.
Submit a pull request to the WordPress/WordPress repo, then close it to keep GitHub clean. (You can still continue to update it.)
Head to the GitHub-to-Patch page.
Select your pull request.
Enter the ticket number.
Enter your Trac/WordPress.org username and password.
Preview the patch you’re about to submit and verify the details.
Done! You should also leave a comment about the patch you just added. 🙂

If you update your PR and want to upload your changes, simply repeat the same process; Trac will automatically name the patches correctly to avoid overwriting previous ones.

Internally, the utility uses GitHub’s API to get a patch format of the pull request, then uses Trac’s XML-RPC API to upload. This requires your WordPress.org credentials, and because of cross-origin policy, also requires an intermediary server. 🙁 I hope to fix this in the future, either by integrating the tool into Trac itself, or by using OAuth with WordPress.org. In the meantime, if you don’t trust my server, you can install and run the tool from GitHub with minimal effort.

In the future, I’ll likely create a PR bot to automatically close PRs and point users to the tool, and to note when people have uploaded their PR as a patch.

Thanks to Eric Andrew Lewis for his pull request to the grunt-patch-wordpress repo that made me realise I could do this. 🙂

The (Complex) State of Meta in the WordPress REST API

16th of February, 2016

One of the other discussion points in our recent API meeting was the state of meta in the REST API. We recently made the somewhat-controversial decision to remove generic meta handling from the API. As we didn’t have time to get into the specifics in the meeting, I wanted to expand on exactly what we’re doing here, and our future plans.¹

WordPress has four different types of meta: post meta, comment meta, term meta, and user meta. These broadly act the same, so for simplicity’s sake, I’ll be grouping them together as just “meta”.

Meta also falls into two broad groups: plugin data, and user input. The distinction here is that plugin meta is set by a plugin programmatically, whereas user input is set via the Custom Fields metabox. These are broad categorisations, but the general difference is that plugin meta tends to be “protected” (typically prefixed with an underscore), whereas user input meta is any sort of freeform name (and occasionally no name at all).

Solution for Plugin Data

Right now, there is a viable solution for plugins to handle meta through the REST API: register_rest_field(). This function allows registering extra fields on a resource (like a post) and handling them in your own code.

For example, let’s say we have a plugin that adds “featured emoji” to a post, which saves a string of emoji characters for a post. We already have a metabox for this in the admin, and now we want to expose it via the API. This is super easy:

register_rest_field( 'post', 'featured_emoji', array(
    'get_callback' => function ( $data ) {
        return get_post_meta( $data['id'], '_featured_emoji', true );
    },
    'update_callback' => function ( $value, $post ) {
        // TODO: sanitize and validate this field better
        $value = sanitize_text_field( $value );

        update_post_meta( $post->ID, '_featured_emoji', wp_slash( $value ) );
    },
    'schema' => array(
        'description' => __( 'Featured emoji for the post to add a little flavour.', 'femoji' ),
        'type' => 'string',
        'context' => array( 'view', 'edit' ),
    ),
));

Solution for Custom Fields

User input meta is also handled, using the generic meta API. This is the /wp/v2/posts/{id}/meta route in the API, which is the route that was recently pulled out of the API plugin itself.

This route is practically only useful for replicating the Custom Fields metabox in the post editor and is not generally useful for plugins and themes. In fact, the endpoints have feature parity with the Custom Fields metabox and the same rules around visibility: if it appears in the metabox, it appears in the API (and vice versa).

Why Separate Solutions?

You may be wondering why we can’t use the same solution for both groups of meta. There are a number of complex issues here, but the key issue is that we cannot reliably separate the two groups. Unlike custom types (post types, taxonomies), meta doesn’t have to be registered before use. This is super handy most of the time, but also means that meta is a bit of a minefield. This leads to surprising behaviour for API users: plugin meta is (mostly) not available via the /meta endpoint.

Protected Meta

The _ prefix is used throughout WordPress to indicate that a field is “protected”. Unfortunately, exactly what “protected” means is usually undefined, but the one thing it reliably indicates is that the key shouldn’t be exposed through the Custom Fields metabox. As the /meta endpoint is designed to mirror the metabox, we don’t expose protected meta via the endpoints. This means that this endpoint isn’t useful for many plugins.

You can, however, whitelist individual keys by filtering is_protected_meta. This allows exposing plugin data via this standard meta API; for example, to expose WooCommerce’s _price field:

add_filter( 'is_protected_meta', function ( $protected, $key, $type ) {
    if ( $type === 'post' && $key === '_price' ) {
        // Expose the `_price` meta value publicly by marking it as not protected
        return false;
    }
    return $protected;
}, 10, 3 );

This can be somewhat confusing though, because protected meta is still not exposed if it falls into one of a few other categories. In addition, it will now appear in the Custom Fields metabox as well.

Complex Values & Serialized Data

One of the categories of meta we can’t expose is serialized data. This applies regardless of whether the meta field is marked as protected or not. This is potentially surprising to plugin authors who might be explicitly whitelisting their meta field for the API, and yet it still isn’t exposed. The key reason for this is that accepting serialized data is either lossy or unsafe.

To understand why serialized data is unsafe, we need to look at what serialized data actually is. At its core, serialization is a way to pack complex data into simple data, in this case a string. We need to include enough data to reverse the process to ensure the process is lossless. The PHP serialization format encodes the two pieces of data that a variable contains: the type, and the value. For simple scalar values, the scalar type itself is encoded: integers become i:val;, such as i:42;; strings become s:size:"value";, such as s:3:"foo";, etc. Arrays are encoded in a more complex way, as they need to encode the type (array), size, keys, and values: this is encoded as a:size:{key;value} where key is a serialized scalar value and value is any serialized value. For example, array('foo' => 42) serializes to a:1:{s:3:"foo";i:42;}.
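These encodings are easy to check directly with PHP's own serialize() and unserialize(); the snippet below needs nothing beyond core PHP.

```php
<?php
$int    = serialize( 42 );                   // i:42;
$string = serialize( 'foo' );                // s:3:"foo";
$array  = serialize( array( 'foo' => 42 ) ); // a:1:{s:3:"foo";i:42;}

// Unserializing restores both the type and the value.
$restored = unserialize( $array );
// $restored is array( 'foo' => 42 ), with 42 still an integer
```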

Objects are slightly more complex, because the “type” itself is complex and includes the class. The format is very similar to arrays (as objects are essentially just the property array), but with the a type replaced with O:classnamelength:classname as the type. This gives a value like O:16:"WP_HTTP_Response":3:{s:4:"data";N;s:7:"headers";a:0:{}s:6:"status";i:200;}.²

The object type is where the problems with serialized meta arise. When a serialized value is unserialized, these classes are instantiated, and the __wakeup() method on the class is executed if it exists. Because of this, allowing serialized data to be saved allows remote code execution by the client saving the data. For example, if an attacker finds a class (and you only need one) with a __wakeup method, they can execute that code by submitting serialized data. Alternatively, if a class assumes that one of its properties is safe to run eval on, or to pass into the database directly, this can be exploited too.
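Here is a harmless demonstration of the mechanism. Demo_Logger is a hypothetical class standing in for any class with a __wakeup() method; the point is only that its code runs during unserialize(), on data the caller never constructed.

```php
<?php
// Hypothetical class standing in for any class that defines __wakeup().
class Demo_Logger {
    public $message = '';
    public static $log = array();

    public function __wakeup() {
        // Runs automatically during unserialize(), before the caller ever
        // sees the object.
        self::$log[] = $this->message;
    }
}

// Data as a client might submit it: nothing in this script calls
// "new Demo_Logger", yet the class's code still runs.
$payload = 'O:11:"Demo_Logger":1:{s:7:"message";s:5:"pwned";}';

$object = unserialize( $payload );
// Demo_Logger::$log now contains 'pwned'.
```

Swap the logging line for something that writes a file or runs a query using a property value, and the client controls what that code does.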

This may sound a bit daft, but this is not a theoretical bug. YAML supports deserializing data into Ruby objects with --- !ruby/hash:classname. This wasn’t generally seen as an issue until it was discovered that a specific ActionDispatch object in Ruby on Rails was running eval on one of its properties. As a result, every Rails site was vulnerable to arbitrary code execution, which is one of the worst classes of bugs.

Serialized objects are not inherently dangerous, but they massively increase the attack surface of the API. Exposing serialized objects as read-only is also a potential privacy issue, as it leaks internal implementation details (class names). For these reasons, we made a calculated decision not to allow serialized data.

One potential solution to allowing complex data is to convert it to JSON-native data. The issue with this is that JSON-encoding data is lossy. PHP objects will be converted down to a generic JSON object, and associative arrays and object data cannot be distinguished. Additionally, PHP doesn’t distinguish between numerically-indexed arrays (JSON lists) and associative arrays (JSON objects). These issues mean that simply sending back the object you received will cause data loss.
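The loss is easy to see with PHP's built-in json_encode() and json_decode(). This is only a small sketch with made-up values:

```php
<?php
$list   = array( 'a', 'b' ); // numerically-indexed array
$assoc  = array( 'x' => 1 ); // associative array
$object = new stdClass();    // object holding the same data
$object->x = 1;

$list_json   = json_encode( $list );   // ["a","b"]
$assoc_json  = json_encode( $assoc );  // {"x":1}
$object_json = json_encode( $object ); // {"x":1}

// The array and the object produce identical JSON, so once decoded
// there is no way to tell which one we started with.
$same = ( json_decode( $assoc_json, true ) === json_decode( $object_json, true ) );

// An empty array is ambiguous too: it always encodes as a list,
// even if the code that stored it treats it as an associative array.
$empty_json = json_encode( array() ); // []
```

A client that reads one of these values and sends it straight back has no way of preserving the original PHP type.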

For these reasons, we can’t support serialized data in the API via any endpoints, including meta and a future options endpoint.³

Permissions

As a result of most meta not being registered, the permissions area is a bit sketchy. While the add_post_meta, edit_post_meta and delete_post_meta meta-capabilities exist, there’s no similar meta-capability for reading post meta. This is the key reason meta is only available while authenticated, as we need to instead fall back to edit_post.

This again is a result of user input meta and plugin data not being clearly defined. In a very early version of the API, user input meta was exposed by default, until it was noted that users often use these fields for internal notes and workflow. (Despite this, the_meta() template tag exists to output these fields on the frontend.)

In addition, plugins adding meta have no fine-grained controls over meta field access. While write capabilities can be controlled precisely, whether someone can read the meta fields depends on how they’re used and can be inconsistent.

Making It All Better

So, how do we fix all of this? A while ago we talked about loosening the rules, but it turned out this wasn’t viable without core changes to WordPress. During the hackday for A Day of REST a few weeks ago, one of the groups took on this issue and came up with a plan. Key to this plan is changing core to support better meta registration.⁴

These changes to core should improve meta usage not just in the REST API, but also across the board for the rest of core and plugins. This also helps to lay some of the groundwork and low-level infrastructure for the fields API in a future version of WordPress.⁵ Expanding this out allows better tooling around meta as well; for example, we may be able to clean up metadata for deactivated plugins if meta is registered consistently.

As tooling and infrastructure develops around meta fields (including the fields API), this may allow us to solve the complex data issues as well. Being able to explicitly say that a field contains a list of strings (e.g.) would allow us to safely expose the values, and avoid potential data loss from JSON serialisation.

These changes will take time to finalise and execute, and it will be a while until the ecosystem fully adopts these changes. In the meantime though, we’d like to ship a REST API. Without these changes, we don’t have the ability to automatically expose plugin meta; however, plugins can already register their fields manually, and future changes would simply provide better tools for developers.

We believe it’s in the WordPress project’s best interest to ship what we have and continue to iterate as we make these changes. Holding back the rest of the API for completion’s sake benefits nobody.

Thank you to the Meta Team at A Day of REST for volunteering to tackle this complex issue, and for their comprehensive discussion and planning. Thanks also to Brian Krogsgard for proofreading, and to Daniel Bachhuber, Joe Hoyle, and Rachel Baker for being generally awesome.

We also had to gloss over exactly how progressive enhancement works, so I fleshed this out in a recent post if you missed it. [↩]
Objects implementing the Serializable interface use C: instead of O:, but these are not supported by WordPress for historical reasons. [↩]
“What about XML-RPC?” you may ask. The XML-RPC API only allows reading serialized meta, which is a minor privacy issue as it may expose internal implementation but is not a security issue. However, since serialized meta can’t be saved via the XML-RPC API, attempting to write the data you just read for a field will cause it to be saved double-encoded, which means it’s lossy. [↩]
Did you know there’s a system in core to register meta? I didn’t before we tackled this problem in the API, and that’s a key part of the problem. [↩]
This “groundwork” consists of expanding the scope of register_meta to take arbitrary parameters similar to register_post_type, plus promoting register_meta and making sure people know it actually exists. [↩]

Progressive Enhancement With the WordPress REST API

6th of February, 2016 (Updated 6th of February, 2016)

In a REST API meeting today, we discussed the future of the REST API. Something I touched upon briefly in that meeting is the concept of progressive enhancement with the REST API. Since this topic hasn’t been brought up much previously, I want to elaborate on how progressive enhancement works.

Progressive enhancement is our key solution to a couple of related problems: forward-compatibility with future features and versions of WordPress, and robust handling of data types in WordPress. Progressive enhancement also unblocks the REST API project and ensures there’s no need to wait until the REST API has parity with every feature of the WordPress admin.

For instance, custom post types can do basically whatever they want with their data, so we wanted a robust system for indicating feature support via the REST API. For example, post types which don’t have the editor support flag won’t have content registered, similar to how the admin doesn’t show the content editor for those post types. In addition, plugins can do even crazier stuff like conditionally changing post types. The system in the REST API can handle these cases with ease, providing clients the ability to adapt on-the-fly to the format of the data they’re editing or displaying.

We also recognise that the REST API needs the ability to adapt to future versions of WordPress, and we want to avoid as many breaking changes as possible. Building the abilities for feature detection enables forwards-compatibility via progressive enhancement, and gives clients a reliable paradigm to safely check whether a WordPress site supports a feature before trying to use it.

The progressive enhancement concept builds heavily on the model already used by browsers for this purpose. If you want to build a site that uses geolocation (e.g.), you can easily detect support for that and build while waiting for browser support, even including polyfills. Feature detection with the REST API can allow the same technique, and allow polyfilling while waiting for the long-tail of sites to update.

The interplay with the complexity of custom post types is almost a bonus here. If I’m building a replica of the post editor, using feature detection to select which “metaboxes” to show is basically a necessity. In the case of meta, clients need to be robust enough to do this already, as plugins can remove support for custom-fields from the built-in post types, and clients need to respond to this.

Progressive enhancement exists in the REST API already, and is easily usable and accessible by clients that want to ensure robustness.

Building With Progressive Enhancement Today

As an example, let’s say I’m building a simple editor today that uses the REST API. Imagine essentially a slimmed down version of Calypso or MarsEdit.

My editor allows me to write posts, save them as drafts, edit them again later, and publish them when I’m ready. After the post is published, I can update and save, and that affects the live post. I can’t do post previews, as there’s no autosave support built in.¹

For now, I build my client without the autosave support, and instead bake autosave features into the editor itself. The WordPress admin already does this with localStorage saving for offline connections, and this system doesn’t require server-side support.

Progressively Enhancing In A Future Release

In a future release, we have the autosaving process nailed down, so we mark our extra feature plugin as done and merge it into core. The autosave endpoint then gets rolled out in the next WordPress major release.

In my client, I want to add the extra server-side autosave support on top of my local autosaving. To do this, I look to see if the feature is supported on the site. In this case, the “feature” I want is the POST endpoint on the /wp/v2/posts/{id}/autosave route, so I check the index to see if the route is there and supports the method.

I see that the site supports the method, so my client transparently starts using server-side autosaves.

This feature detection already exists in the REST API today. For instance, compare the results of http://demo.wp-api.org/wp-json/ and https://wordpress.org/wp-json/. You can easily see which site supports creating posts by looking for the /wp/v2/posts route and checking that it supports POST.
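Here is a rough sketch of that check for a client written in PHP. The rm_route_supports() helper is a name I’ve made up for this example, and the two index responses are invented and heavily trimmed; a real client would fetch the index over HTTP and decode the body instead.

```php
<?php
/**
 * Check whether a decoded API index lists a route that supports a method.
 * Hypothetical helper for illustration, not part of the REST API.
 *
 * @param array  $index  Index response decoded with json_decode( $body, true ).
 * @param string $route  Route key as it appears in the index.
 * @param string $method HTTP method, e.g. 'POST'.
 * @return bool
 */
function rm_route_supports( array $index, $route, $method ) {
    if ( ! isset( $index['routes'][ $route ]['methods'] ) ) {
        return false;
    }
    return in_array( strtoupper( $method ), $index['routes'][ $route ]['methods'], true );
}

// Made-up index responses for two sites, trimmed to the relevant keys.
$full_site = json_decode( '{"routes":{"/wp/v2/posts":{"methods":["GET","POST"]}}}', true );
$bare_site = json_decode( '{"routes":{"/oembed/1.0/embed":{"methods":["GET"]}}}', true );

$can_create_full = rm_route_supports( $full_site, '/wp/v2/posts', 'POST' ); // true
$can_create_bare = rm_route_supports( $bare_site, '/wp/v2/posts', 'POST' ); // false
```

Routes with parameters may appear in the index in a different form from the {id} shorthand used in this post, so it’s worth checking the exact key against a real index before relying on it.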

Plugin Detection

REST API clients can also easily detect which plugins are available on a site. REST API endpoints are registered with two parts to their name: the namespace and the route. The namespace typically looks like name/v1 with a unique slug combined with the version; for the core plugin, this is wp/v2. This system works similarly to function namespacing in plugins currently, and we expect (and strongly recommend) that plugins and themes treat this as their unique slice of the API space.

Let’s say I want to check if WooCommerce is installed on a site. I simply fetch the index route and check the namespaces key to see if woocommerce/v3 is registered.

Again, plugin detection already exists in the REST API. Compare again http://demo.wp-api.org/wp-json/ and https://wordpress.org/wp-json/. The demo site supports the core endpoints, as it has wp/v2 registered, whereas wordpress.org only has the oEmbed endpoints.
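In code, the namespace check is about as small as it sounds. This sketch assumes the index has already been fetched and decoded; rm_has_namespace() is a hypothetical helper, and the index contents are invented for the example.

```php
<?php
/**
 * Check whether a decoded API index lists a given namespace.
 * Hypothetical helper for illustration, not part of the REST API.
 *
 * @param array  $index     Index response decoded with json_decode( $body, true ).
 * @param string $namespace Namespace to look for, e.g. 'wp/v2'.
 * @return bool
 */
function rm_has_namespace( array $index, $namespace ) {
    return isset( $index['namespaces'] )
        && in_array( $namespace, $index['namespaces'], true );
}

// Made-up index for a site with the core endpoints and a shop plugin.
$index = json_decode( '{"namespaces":["wp/v2","oembed/1.0","woocommerce/v3"]}', true );

$has_core  = rm_has_namespace( $index, 'wp/v2' );          // true
$has_shop  = rm_has_namespace( $index, 'woocommerce/v3' ); // true
$has_other = rm_has_namespace( $index, 'example/v1' );     // false
```

Because the version is part of the namespace, the same check tells a client which version of a plugin’s API it is talking to.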

More Granular Detection

We can get far more granular with detection too. Each route supplies information about the schema that the response follows (and request data when creating or updating resources). This is available either via an OPTIONS request to the route, or by fetching the index with ?context=help.

To detect fields, we simply need to check the schema for that field. If generic meta support is pushed back to a future release, enabling clients to interact with this would be easy. For argument’s sake, let’s say it’s added as the custom_fields property on the post resource.

To detect feature support, we simply need to do an OPTIONS request on /wp/v2/posts/42 (for post 42), then check that $.schema.properties.custom_fields exists, and matches the format we’re expecting. We can then display a “custom fields” metabox-style interface in the editor for this.

Again, this level of feature detection already exists in the REST API today, and even more than that, we already recommend using this process for existing endpoints. When interacting with custom post types, you can detect whether the post type is hierarchical by checking for $.schema.properties.parent. You can detect whether a post supports reordering by checking for $.schema.properties.menu_order.

This applies even when not working with custom post types: you can detect whether a post supports featured images and whether the site/theme supports them by checking that $.schema.properties.featured_media exists. This isn’t a theoretical concern: robust editors already need to do this, as themes have differing support for WordPress features, and these changes need to flow through clients. In addition, plugins have essentially unlimited flexibility, and clients need to recognise this and support it in order to maximise compatibility across the long-tail of WordPress installs and configurations.
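As a sketch of what these schema checks look like in a PHP client: rm_schema_supports() is a name invented for this example, and the OPTIONS response below is made up and trimmed to a handful of properties. The custom_fields property is the same hypothetical one used earlier for argument’s sake.

```php
<?php
/**
 * Check whether a decoded OPTIONS response declares a schema property.
 * Hypothetical helper for illustration, not part of the REST API.
 *
 * @param array  $options  OPTIONS response decoded with json_decode( $body, true ).
 * @param string $property Property name, e.g. 'parent'.
 * @return bool
 */
function rm_schema_supports( array $options, $property ) {
    return isset( $options['schema']['properties'][ $property ] );
}

// Made-up, trimmed OPTIONS response for a hierarchical post type.
$options = json_decode(
    '{"schema":{"title":"page","properties":{' .
    '"title":{"type":"object"},' .
    '"parent":{"type":"integer"},' .
    '"menu_order":{"type":"integer"}}}}',
    true
);

$is_hierarchical   = rm_schema_supports( $options, 'parent' );         // true
$supports_ordering = rm_schema_supports( $options, 'menu_order' );     // true
$has_thumbnails    = rm_schema_supports( $options, 'featured_media' ); // false
$has_custom_fields = rm_schema_supports( $options, 'custom_fields' );  // false
```

An editor built this way simply skips the interface for anything that comes back false, and picks it up automatically once the site starts reporting it.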

Meta with `register_rest_field`

One thing that was glossed over is that despite us pulling support for generic meta, we still have opt-in support for meta handling at a lower level.

If I’m a plugin author that wants to add my own data to a post response, I can simply use code like:

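// Register on rest_api_init so the REST API is loaded before the field is added.
add_action( 'rest_api_init', function () {
    register_rest_field( 'post', 'rm_data', array(
        // $data is the prepared response array for the post.
        'get_callback' => function ( $data ) {
            return get_post_meta( $data['id'], '_rm_custom_field', true );
        },
        // $post is the post object being updated.
        'update_callback' => function ( $value, $post ) {
            update_post_meta( $post->ID, '_rm_custom_field', sanitize_text_field( $value ) );
        },
        // Declaring the schema is what makes the field discoverable by clients.
        'schema' => array(
            'description' => 'My custom field!',
            'type' => 'string',
            'context' => array( 'view', 'edit' ),
        ),
    ) );
} );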

Since I’ve registered the schema data, this is automatically added to the schema for me. Clients can then detect my feature automatically by checking for $.schema.properties.rm_data. The API here gives me feature detection for free. The proposal to enhance register_meta in core (https://core.trac.wordpress.org/ticket/35658) will enable even easier integration with the API.

Moving Forward

Right now, the REST API team and the WordPress community need a clear path forward to get from the feature-plugin as it exists today to a sustainable long-term project. Being able to ship and move on is a key part of this, as well as providing the room to expand in the future.

We believe that the progressive enhancement approach is the best approach for continuing API development. Progressive enhancement is a paradigm the REST API project must adopt, if it’s an API we want to add to (without breaking backwards compatibility) over the next 10 years.

I’m choosing autosave support here, though it’s very possible this will be completed and merged in the very near future. It’s a convenient example to use regardless. [↩]