The (Complex) State of Meta in the WordPress REST API

One of the other discussion points in our recent API meeting was the state of meta in the REST API. We recently made the somewhat-controversial decision to remove generic meta handling from the API. As we didn’t have time to get into the specifics in the meeting, I wanted to expand on exactly what we’re doing here, and our future plans.1

WordPress has four different types of meta: post meta, comment meta, term meta, and user meta. These broadly act the same, so for simplicity’s sake, I’ll be grouping them together as just “meta”.

Meta also falls into two broad groups: plugin data, and user input. The distinction here is that plugin meta is set by a plugin programmatically, whereas user input is set via the Custom Fields metabox. These are broad categorisations, but the general difference is that plugin meta tends to be “protected” (typically prefixed with an underscore), whereas user input meta is any sort of freeform name (and occasionally no name at all).

Solution for Plugin Data

Right now, there is a viable solution for plugins to handle meta through the REST API: register_rest_field(). This function allows registering extra fields on a resource (like a post) and handling them in your own code.

For example, let’s say we have a plugin that adds “featured emoji” to a post, which saves a string of emoji characters for a post. We already have a metabox for this in the admin, and now we want to expose it via the API. This is super easy:

register_rest_field( 'post', 'featured_emoji', array(
    'get_callback' => function ( $data ) {
        return get_post_meta( $data['id'], '_featured_emoji', true );
    },
    'update_callback' => function ( $value, $post ) {
        // TODO: sanitize and validate this field better
        $value = sanitize_text_field( $value );

        update_post_meta( $post->ID, '_featured_emoji', wp_slash( $value ) );
    },
    'schema' => array(
        'description' => __( 'Featured emoji for the post to add a little flavour.', 'femoji' ),
        'type' => 'string',
        'context' => array( 'view', 'edit' ),
    ),
));

Solution for Custom Fields

User input meta is also handled, using the generic meta API. This is the /wp/v2/posts/{id}/meta route in the API, which is the route that was recently pulled out of the API plugin itself.

This route is practically only useful for replicating the Custom Fields metabox in the post editor and is not generally useful for plugins and themes. In fact, the endpoints have feature parity with the Custom Fields metabox and the same rules around visibility: if it appears in the metabox, it appears in the API (and vice versa).

Why Separate Solutions?

You may be wondering why we can’t use the same solution for both groups of meta. There are a number of complex issues here, but the key issue is that we cannot reliably separate the two groups. Unlike custom types (post types, taxonomies), meta doesn’t have to be registered before use. This is super handy most of the time, but also means that meta is a bit of a minefield. This leads to surprising behaviour for API users: plugin meta is (mostly) not available via the /meta endpoint.

Protected Meta

The _ prefix is used throughout WordPress to indicate that a field is “protected”. Unfortunately, exactly what “protected” means is usually undefined, but the one thing it reliably indicates is that the key shouldn’t be exposed through the Custom Fields metabox. As the /meta endpoint is designed to mirror the metabox, we don’t expose protected meta via the endpoints. This means that this endpoint isn’t useful for many plugins.

You can, however, whitelist individual keys by filtering is_protected_meta. This allows exposing plugin data via this standard meta API; for example, to expose WooCommerce’s _price field:

add_filter( 'is_protected_meta', function ( $protected, $key, $type ) {
    if ( $type === 'post' && $key === '_price' ) {
        // Expose the `_price` meta value publicly
        return true;
    }
    return $protected;
}, 10, 3 );

This can be somewhat confusing though, because protected meta is still not exposed if it falls into one of a few other categories. In addition, it will now appear in the Custom Fields metabox as well.

Complex Values & Serialized Data

One of the categories of meta we can’t expose is serialized data. This applies regardless of whether the meta field is marked as protected or not. This is potentially surprising to plugin authors who might be explicitly whitelisting their meta field for the API, and yet it still isn’t exposed. The key reason for this is that accepting serialized data is either lossy or unsafe.

To understand why serialized data is unsafe, we need to look at what serialized data actually is. At its core, serialization is a way to pack complex data into simple data, in this case a string. We need to include enough data to reverse the process to ensure the process is lossless. The PHP serialization format encodes the two pieces of data that a variable contains: the type, and the value. For simple scalar values, the scalar type itself is encoded: integers become i:val;, such as i:42;; strings become s:size:value; such as s:3:foo;, etc. Arrays are encoded in a more complex way, as they need to encode the type (array), size, keys, and values: this is encoded as a:size:{key;value} where key is a serialized scalar value and value is any serialized value. For example, array('foo' => 42) serializes to a:1:{s:3:"foo";i:42;}.

Objects are slightly more complex, because the “type” itself is complex and includes the class. The format is very similar to arrays (as objects are essentially just the property array), but with the a type replaced with O:classnamelength:classname as the type. This gives a value like O:16:"WP_HTTP_Response":3:{s:4:"data";N;s:7:"headers";a:0:{}s:6:"status";i:200;}.2

The object type is where the problems with serialized meta arise. When a serialized value is unserialized, these classes are instantiated, and the __wakeup() method on the class is executed if it exists. Because of this, allowing serialized data to be saved allows remote code execution by the client saving the data. For example, if an attacker finds a class (and you only need one) with a __wakeup method, they can execute that code by submitting serialized data. Alternatively, if a class assumes that one of its properties is safe to run eval on, or to pass into the database directly, this can be exploited too.

This may sound a bit daft, but this is not a theoretical bug. YAML supports deserializing data into Ruby objects with --- !ruby/hash:classname. This wasn’t generally seen as an issue until it was discovered that a specific ActionDispatch object in Ruby on Rails was running eval on one of its properties. As a result, every Rails site was vulnerable to arbitrary code execution, which is one of the worst classes of bugs.

Serialized objects are not inherently dangerous, but they massively increase the attack surface of the API. Exposing serialized objects as read-only is almost a potential privacy issue, as it leaks internal implementation details (class names). For these reasons, we made a calculated decision not to allow serialized data.

One potential solution to allowing complex data is to convert it to JSON-native data. The issue with this is that JSON-encoding data is lossy. PHP objects will be converted down to a generic JSON object, and associative arrays and object data cannot be distinguished. Additionally, PHP doesn’t distinguish between numerically-indexed arrays (JSON lists) and associative arrays (JSON objects). These issues mean that simply sending back the object you received will cause data loss.

For these reasons, we can’t support serialized data in the API via any endpoints, including meta and a future options endpoint.3

Permissions

As a result of most meta not being registered, the permissions area is a bit sketchy. While the add_post_meta, edit_post_meta and delete_post_meta meta-capbilities exist, there’s no similar meta-capability for reading post meta. This is the key reason meta is only available while authenticated, as we need to instead fall back to edit_post.

This again is a result of user input meta and plugin data not being clearly defined. In a very early version of the API, user input meta was exposed by default, until it was noted that users often use these fields for internal notes and workflow. (Despite this, the_meta() template tag exists to output these fields on the frontend.)

In addition, plugins adding meta have no fine-grained controls over meta field access. While write capabilities can be controlled precisely, whether someone can read the meta fields depends on how they’re used and can be inconsistent.

Making It All Better

So, how do we fix all of this? A while ago we talked about loosening the rules, but it turned out this wasn’t viable without core changes to WordPress. During the hackday for A Day of REST a few weeks ago, one of the groups took on this issue and came up with a plan. Key to this plan is changing core to support better meta registration.4

These changes to core should improve meta usage not just in the REST API, but also across the board for the rest of core and plugins. This also helps to lay some of the groundwork and low-level infrastructure for the fields API in a future version of WordPress.5 Expanding this out allows better tooling around meta as well; for example, we may be able to clean up metadata for deactivated plugins if meta is registered consistently.

As tooling and infrastructure develops around meta fields (including the fields API), this may allow us to solve the complex data issues as well. Being able to explicitly say that a field contains a list of strings (e.g.) would allow us to safely expose the values, and avoid potential data loss from JSON serialisation.

These changes will take time to finalise and execute, and it will be a while until the ecosystem fully adopts these changes. In the meantime though, we’d like to ship a REST API. Without these changes, we don’t have the ability to automatically expose plugin meta, however plugins can already register their fields manually, and future changes would simply provide better tools for developers.

We believe it’s in the WordPress project’s best interest to ship what we have and continue to iterate as we make these changes. Holding back the rest of the API for completion’s sake benefits nobody.


Thank you to the Meta Team at A Day of REST for volunteering to tackle this complex issue, and for their comprehensive discussion and planning. Thanks also to Brian Krogsgard for proofreading, and to Daniel Bachhuber, Joe Hoyle, and Rachel Baker for being generally awesome.

  1. We also had to gloss over exactly how progressive enhancement works, so I fleshed this out in a recent post if you missed it. []
  2. Objects implementing the Serializable interface instead use C: instead of O:, but these are not supported by WordPress for historical reasons. []
  3. “What about XML-RPC?” you may ask. The XML-RPC API only allows reading serialized meta, which is a minor privacy issue as it may expose internal implementation but is not a security issue. However, since serialized meta can’t be saved via the XML-RPC API, attempting to write the data you just read for a field will cause it to be saved double-encoded, which means it’s lossy. []
  4. Did you know there’s a system in core to register meta? I didn’t before we tackled this problem in the API, and that’s a key part of the problem. []
  5. This “groundwork” consists of expanding the scope of register_meta to take arbitrary parameters similar to register_post_type, plus promoting register_meta and making sure people know it actually exists. []

Progressive Enhancement With the WordPress REST API

In a REST API discussion today, we discussed the future of the REST API. Something I touched upon briefly in that meeting is the concept of progressive enhancement with the REST API. Since this topic hasn’t been brought up much previously, I want to elaborate on how progressive enhancement works.

Progressive enhancement is our key solution to a couple of related problems: forward-compatibility with future features and versions of WordPress, and robust handling of data types in WordPress. Progressive enhancement also unblocks the REST API project and ensures there’s no need to wait until the REST API has parity with every feature of the WordPress admin.

For instance, custom post types can do basically whatever they want with their data, so we wanted a robust system for indicating feature support via the REST API. For example, post types which don’t have the editor support flag won’t have content registered, similar to how the admin doesn’t show the content editor for those post types. In addition, plugins can do even crazier stuff like conditionally changing post types. The system in the REST API can handle these cases with ease, providing clients the ability to adapt on-the-fly to the format of the data they’re editing or displaying.

We also recognise that the REST API needs the ability to adapt to future versions of WordPress, and we want to avoid as many breaking changes as possible. Building the abilities for feature detection enables forwards-compatibility via progressive enhancement, and gives clients a reliable paradigm to safely check whether a WordPress supports a feature before trying to use it.

The progressive enhancement concept builds heavily on the model already used by browsers for this purpose. If you want to build a site that uses geolocation (e.g.), you can easily detect support for that and build while waiting for browser support, even including polyfills. Feature detection with the REST API can allow the same technique, and allow polyfilling while waiting for the long-tail of sites to update.

The interplay with the complexity of custom post types is almost a bonus here. If I’m building a replica of the post editor, using feature detection to select which “metaboxes” to show is basically a necessity. In the case of meta, clients need to be robust enough to do this already, as plugins can remove plugin support for custom-fields from the built-in post types, and clients need to respond to this.

Progressive enhancement exists in the REST API already, and is easily usable and accessible by clients that want to ensure robustness.

Building With Progressive Enhancement Today

As an example, let’s say I’m building a simple editor today that uses the REST API. Imagine essentially a slimmed down version of Calypso or MarsEdit.

My editor allows me to write posts, save them as drafts, edit them again later, and publish them when I’m ready. After the post is published, I can update and save, and that affects the live post. I can’t do post previews, as there’s no autosave support built in.1

For now, I build my client without the autosave support, and instead bake autosave features into the editor itself. The WordPress admin already does this with localStorage saving for offline connections, and this system doesn’t require server-side support.

Progressively Enhancing In A Future Release

In a future release, we have the autosaving process nailed down, so we mark our extra feature plugin as done and merge it into core. The autosave endpoint then gets rolled out in the next WordPress major release.

In my client, I want to add the extra server-side autosave support on top of my local autosaving. To do this, I look to see if the feature is supported on the site. In this case, the “feature” I want is the POST endpoint on the /wp/v2/posts/{id}/autosave route, so I check the index to see if the route is there and supports the method.

I see that the site supports the method, so my client transparently starts using server-side autosaves.

This feature detection already exists in the REST API today. For instance, compare the results of http://demo.wp-api.org/wp-json/ and https://wordpress.org/wp-json/ You can easily see which supports creating posts by inspecting for /wp/v2/posts and see that it supports POST.

Plugin Detection

REST API clients can also easily detect which plugins are available on a site. REST API endpoints are registered with two parts to their name: the namespace and the route. The namespace typically looks like name/v1 with a unique slug combined with the version; for the core plugin, this is wp/v2. This system works similarly to function namespacing in plugins currently, and we expect (and strongly recommend) that plugins and themes treat this as their unique slice of the API space.

Let’s say I want to check if WooCommerce is installed on a site. I simply fetch the index route and check the namespaces key to see if woocommerce/v3 is registered.

Again, plugin detection already exists in the REST API. Compare again http://demo.wp-api.org/wp-json/ and https://wordpress.org/wp-json/. The demo site supports the core endpoints, as it has wp/v2 registered, whereas wordpress.org only has the oEmbed endpoints.

More Granular Detection

We can far more granular with detection too. Each route supplies information about the schema that the response follows (and request data when creating or updating resources). This is available either via an OPTIONS request to the route, or by fetching the index with ?context=help.

To detect fields, we simply need to check the schema for that field. If generic meta support is pushed back to a future release, enabling clients to interact with this would be easy. For argument’s sake, let’s say it’s added as the custom_fields property on the post resource.

To detect feature support, we simply need to do an OPTIONS request on /wp/v2/posts/42 (for post 42), then check that $.schema.properties.custom_fields exists, and matches the format we’re expecting. We can then display a “custom fields” metabox-style interface in the editor for this.

Again, this level of feature detection already exists in the REST API today, and even more than that, we already recommend using this process for existing endpoints. When interacting with custom post types, you can detect whether the post type is hierarchical by checking for $.schema.properties.parent. You can detect whether a post supports reordering by checking for $.schema.properties.menu_order.

This applies even when not working with custom post types: you can detect whether a post supports featured images and whether the site/theme supports them by checking that $.schema.properties.featured_media exists. This isn’t a theoretical concern, robust editors already need to do this, as themes have differing support for WordPress features, and these changes need to flow through clients. In addition, plugins have essentially unlimited flexibility, and clients need to recognise this and support it in order to maximise compatibility across the long-tail of WordPress installs and configurations.

Meta with register_rest_field

One thing that was glossed over is that despite us pulling support for generic meta, we still have opt-in support for meta handling at a lower level.

If I’m a plugin author that wants to add my own data to a post response, I can simply use code like:

register_rest_field( 'post', 'rm_data', array(
    'get_callback' => function ( $data ) {
        return get_post_meta( $data['id'], '_rm_custom_field', true );
    },
    'update_callback' => function ( $value, $post ) {
        update_post_meta( $post->ID, '_rm_custom_field', sanitize_text_field( $value ) );
    },
    'schema' => array(
        'description' => 'My custom field!',
        'type' => 'string',
        'context' => array( 'view', 'edit' ),
    ),
));

Since I’ve registered the schema data, this is automatically added to the schema for me. Clients can then detect my feature automatically by checking for $.schema.properties.rm_data. The API here gives me feature detection for free. The proposal to enhance register_meta in core (https://core.trac.wordpress.org/ticket/35658) will enable even easier integration with the API.

Moving Forward

Right now, the REST API team, and the WordPress community, needs a clear path forward to get from the feature-plugin as it exists today to a sustainable long-term project. Being able to ship and move on is a key part of this, as well as providing the room to expand in the future.

We believe that the progressive enhancement approach is the best approach for continuing API development. Progressive enhancement is a paradigm the REST API project ​must​ adopt, if it’s an API we want to add to (without breaking backwards compatibility) over the next 10 years.

  1. I’m choosing autosave support here, however it’s very possible this will be completed and merged in the very near future. It’s a convenient example to use though. []

Introducing WP API

As many of you are aware, I was accepted into Google’s Summer of Code program this year to work on a JSON REST API for WordPress. WordPress already has internal APIs for manipulating data via the admin-ajax.php handler in addition to the XML-RPC API. However, XML can be a huge pain to both safely create and parse, and the existing admin API is locked down to authenticated users and is also tailored to the admin interface. The goal of this project is to create a general data API that speaks the common language of the web and uses easily parsable data.

I’d now like to introduce the official repository and issue tracker. There’s also the SVN repository which is kept in sync.

For the next few months, my schedule will be busy implementing the API. Each week from now through the final submission has an individual plan, presented below.

May 27: Acceptance of Project, ensure up-to-speed on existing code
June 3: Work on design documents (response types/collections) and ensure agreement with mentors and interested parties (#264)
June 10: Complete core post type serialisation/deserialisation (basic reading/writing of raw data complete) (#265)
June 17: Work on collection pagination and metadata infrastructure (the full collection of posts can now be accessed and is correctly paginated, allowing for browsing via the API) (#266)
June 24: Creation of main collection views (main post archive, per date, search) (#267)
July 1: Further work on indexes and browsability (#268)
July 8: Create (independent) REST API unit tests for all endpoints covered so far (#269)
July 15: Creation of a Backbone.js example client for testing (#270)
July 22: Spare week to act as a buffer, since some tasks may take longer than expected
July 29: Midterm evaluation!
August 5: Creation/porting of existing generic post type API with page-specific data (#271)
August 12: Creation of attachment-related API (uploading and management) (#272)
August 19: Creation of revision API, and extending the post API to expose revisions (#273, #274)
August 26: Creation of term and taxonomy API (#275)
September 2: Finalisation of term and taxonomy API, and updating of test clients (#276)
September 9: Final testing with example clients (especially with various proxies and in live environments) and security review (#277)
September 15: Spare week for buffer
September 22: Final checking for bugs and preparation for final submission

At the end of each week throughout development, I’ll post a weekly update and tag a new release version, in a manner similar to the release process of MP6. The first release of the API will be posted shortly.

For those looking to keep track of development, I’ll be posting about the API here, which you can follow via the feed. A GSoC P2 is on its way and will be the official place to post comments and feedback (I’ll be crossposting back to here once that’s up). In the mean time, I’ll be posting on this blog and accepting comments here, which is a great way to ask questions and post feedback.