How WordPress Knows What Page You’re On

In the spirit of Dan Abramov’s Overreacted blog, where he deep-dives into React on his personal blog, I thought I’d do the same for WordPress. If there’s something you’d like to see, let me know!

Since WordPress 1.0, WordPress has supported “pretty permalinks”; that is, human-readable permalinks. This system is built for a lot of flexibility, and allows users to customise the format to their liking, using “rewrite tags”.

Screenshot of the WordPress permalinks screen, showing the presets as well as custom input with "available tags" buttons

Pretty permalinks is implemented through the Rewrite system, but how that works can be a bit obscure, even if you’re familiar with it.

“Rewrites”, for those who aren’t familiar, are how WordPress maps a pretty permalink to something it can use internally. This internal representation is called a “query” (which is a bit of an overloaded term), and is eventually used to build a MySQL query which fetches the requested posts from the database.

This “query” is not exactly the same as what you might think of as a query in WordPress. It’s a mix of parameters used in WP_Query to get posts (called “query arguments” or “query args”) as well as information about the request itself (called “query variables” or “query vars”). Query vars are typically only useful for the main query, and include routing information like whether a 404 error occurred. This will hopefully be clearer later.

Let’s step through, chronologically, how WordPress handles turns your request into this query.

Aside: WP_Rewrite?

If you’re a seasoned WordPress developer, you might know Rewrites through the WP_Rewrite class. But perhaps surprisingly (or not, if you know how WordPress has evolved), rewrites are actually handled in the little-known WP class instead. Additionally, some (in fact, many) URLs and patterns are routed outside of regular rewrites.

We’re going to take a look at the whole process from where it starts, not just WP_Rewrite. The rewrite process really begins as soon as WordPress starts handling the request.

Bootstrapping

Before WordPress can get started with anything, it needs to first bootstrap everything. How this general process works is a topic for a different day, so I’ll just talk about the relevant bits here.

The key steps in the bootstrap process are:

Already during the bootstrap process, there are a few places where redirects or full requests can be served back. The most common case with full-page caching enabled is that the cache will serve back a request using its own routing. The other cases are mostly error cases, with the exception of multisite, which I’ll cover later.

Note that all of these cases happen before the Rewrite system is started, so it’s not possible to use rewrites to handle favicons, multisite routing, or caching. This is all by design, as these checks have to run early either for performance or to check for basic bootstrapping errors.

You can however use the various hooks provided in the bootstrapping process to handle these requests, if you register your callbacks before wp-settings.php is loaded. You can also handle it in your wp-config.php; don’t forget that’s just PHP, so you can run whatever code you want there.

Initialising the Routing

After the basic bootstrapping in WordPress is done, we get into the actual routing instantiation. Firstly, WordPress instantiates the critical routing classes (WP_Rewrite and WP).

Instantiating WP_Rewrite fires off rewrite initialisation. This loads in all the various settings and sets properties that can later be used for rewrite generation. This also includes setting the “verbose page rules” flag, which is used when your permalink structure contains one of a few specific tags: those which start with slugs, and would potentially cause pages and posts to have conflicting permalinks. Verbose rules change how routing happens later, causing WordPress to “double-check” the URL during routing.

Before WordPress 3.3 (specifically, #16687), verbose page rules caused one-rule-per-page to be generated, which (needless to say) wasn’t great for performance on large sites. This was changed to instead check only when necessary.

Once this done, our oft-forgotten friend wp-blog-header.php kicks off the actual routing. This runs WP::parse_request which is where the actual routing in WordPress is (generally) done. Basically the first thing this does is to load in the “rewrite rules”.

Generating the Rules

Before we can start doing any routing, we need to convert the user settings to something we can actually work with. Specifically, we need to generate the rewrite rules.

Rewrite rules are essentially a gigantic map of regular expression to “query”. For example, the basic rule for categories looks like:

'category/(.+?)/?$' => 'index.php?category_name=$matches[1]'

If you’ve ever used any other routing in pretty much any web framework, you might wonder what the hell the thing on the right is. This is a WordPress “query string” (which is not the same thing as WP_Query). Essentially, all “pretty” permalinks in WordPress map to this intermediate “ugly” form first, before then being mapped into a WordPress query. This ensures compatibility with sites that don’t support pretty permalinks, but means that WordPress doesn’t directly support “rich” routing (such as callbacks, complex queries, etc).

To generate these rules, we go back to the WP_Rewrite class, which attempts to load cached rewrites from the rewrite_rules option, and generates it if it is not available.

Building a Set of Rules

There are many sets of rewrite rules that are generated, and each is generated from a “permastruct” (for “permalink structure”) and an “endpoint mask”. The permastruct specifies the general format of the set of rules to generate, and the “endpoint mask” controls which suffixes (“endpoints”) are added to the permastruct.

A permastruct is a string with static parts and “rewrite tags”. Rewrite tags look like %post_id% and represent a dynamic part of the rewrite rule. WordPress contains a few built-in permastructs: “date”, “year”, and “month” for date archives; “category”, and “tag” for the built-in terms, “author” for author archives; “search” for search results pages; “page” for static pages, “feed” and “comment feed” for RSS/Atom feeds. It also has the main permastruct for single post pages, and “extra” permastructs as registered by plugins or themes.

The permastruct is combined with an endpoint mask, which is a bitmask specifying which additional rules to add to the main endpoint. WordPress includes 13 endpoint masks, plus 3 helper masks (EP_NONE, EP_ALL, and EP_ALL_ARCHIVES). These can be combined with bitwise operators (|, &, ~) to activate multiple endpoint masks at once.

Endpoint masks are very confusing for those unfamiliar with bitwise operations, so you typically don’t see them used much outside of WordPress core’s routes. Also, they’re not very extensible, as custom endpoint masks will conflict with each other. Avoid doing anything special with these, and generally follow existing guides on how to use them. Jon Cave’s post on Make/Plugins is the best way to understand them if you really want to get into it.

The permastruct and endpoint mask are passed to WP_Rewrite::generate_rewrite_rules(), which replaces the rewrite tags with their regular expression equivalents. It does additional parsing to then generate additional rules based on which rewrite tags were used, and using the endpoint mask. I won’t go into the specifics of this, as this is optimised code with lots of weirdness, but suffice to say it converts the parameters into an array of rules.

For example, the main post rewrite rules are generated using the user-set permastruct with the EP_PERMALINK endpoint mask. This takes the rewrite_rules setting as the permastruct (which looks like /%post_id%/%postname%/). generate_rewrite_rules() turns this into rewrite rules to match things like post attachments, feeds, pages (as in, paged posts), comment pages, embeds, and the combination of all of these.

Collecting all the Sets

WordPress repeats the rewrite generation for each set of permastructs it knows about (plus the “extra” permastructs added by plugins or themes), and it then combines them into a single set of rules. It also adds in some additional static rules (for things like deprecated feeds and robots.txt). It runs a couple of filters to allow plugins and themes to add other static rules as well.

Extra permastructs are typically generated by core helpers like register_post_type() or register_taxonomy(). Plugins don’t typically add permastructs manually, as the generation makes a lot of assumptions about things you want.

Once all of this is done, WordPress saves the rules into the rewrite_rule option to avoid having to regenerate them on the next request. However, if a plugin has flushed the write rules before wp_loaded, this saving is deferred to wp_loaded to ensure plugins don’t break the whole system.

Now that we know we have rewrite rules (whether loaded from the option or generated fresh), we can finally get around to routing our requests.

Matching the Rules

Back in WP::parse_request(), we now have the full rewrite rule array ready to use. First, we set up and normalise the incoming request on top of the stuff already done during bootstrapping. This includes removing any path prefixes if WordPress is installed in a subdirectory (or if we’re on a subdirectory site in multisite).

Root requests (i.e. for /) are normalised to the empty string (''), and matched directly to the '$' rule, which improves performance for one of the most commonly-requested pages on the site. (As '$' is also (typically) the last rule in the rewrite array, this also saves us running potentially hundreds of regular expression checks that will never match.)

All other requests go into the main matching loop. This loop takes every rewrite rule and attempts to match the regular expression against the requested path (twice, in case the URL needs decoding). If the rewrite rule matches, the “query” for the rule is stored, and the loop breaks (as only one rule can match). If no matches are found, $wp->matched_rule remains unset.

If verbose page rules are set and the “query” contains the pagename query var, the loop first checks to see if the URL actually matches a real post. (It also checks that the post is public to ensure drafts aren’t accidentally exposed via their URL.) This check allows multiple post types to have overlapping rewrite rules, and means that potentially multiple rules can match a single request.

If a match is found, WordPress then parses the URL using the “query” string from the rule. This transforms a URL like /42/my-post/ into an array of query vars like [ 'p' => 42, 'name' => 'my-post' ]. This transformation is done using regular expressions which understand how to turn $matches[1] into the first item of the rule’s regular expression result.

This parser is used to maintain backwards compatibility with the older “parser”, which simply used eval() to parse the “query” into query vars.

WordPress also checks if the current request is for wp-admin/ or for a direct PHP file, and resets the query vars if so.

At this point, we’ve converted the requested URL into query vars, so the main part of the routing is done. All that’s left is to check that the query vars are allowed to be used, combine in $_GET (query string) and $_POST (data from the request body) variables, and apply some light sanitisation. Further permission checks and cleanup is also done to ensure everything is fairly normal. If any errors occurred, the error query var is also set to enable it to be handled later.

Using the Query Vars

With the query vars all set and established, WordPress now starts using them. It does error handling based on the error query var as part of sending headers, and bails from the request if specific errors were hit (403, 500, 502, or 503 errors). It turns off browser and proxy caching for logged-in users, and sends various caching headers for feeds, and sends the HTML Content-Type for everything else.

All the other query vars are passed as query arguments to WP_Query, and this sets the “main” query. After this is done, 404 requests are sent if WP_Query didn’t manage to find anything (with some conditions on that). If a 404 occurred during routing, WordPress checks this when parsing the query vars, and sets the internal 404 flag.

The specifics of how querying and rendering the results are done is out of scope for this explanation, but has been explained to death elsewhere, as you’ll actually need to interact with this in plugins and themes.

Special Cases

Multisite

While rewrite rules handle matching requests inside a site, a different system is using for matching requests to sites first. This is for a few different reasons: rewrite rules can be changed by plugins, which are site-specific; site data needs to be loaded first for rewrite settings; and multisite routing uses both the domain and the path.

Multisite routing is kicked off when ms-settings.php is loaded in wp-settings.php. The routing first loads sunrise.php, which traditionally handled “domain mapping”; that is, routing external domains to sites. WordPress 3.9 enabled doing this internally in WordPress by simply setting the site’s URL to the external domain, but plugins are still required for multiple domains. (The sunrise file can also be used for many other purposes, but routing remains one of its main purposes.)

If the sunrise process did not handle the routing, WordPress normalises the host and path, then uses this information (along with the SUBDOMAIN_INSTALL flag) to try and find the current site. The mechanisms by which it does this are fairly readable, so I’ll leave it as an exercise to the reader to look into this: simply read and follow the source of ms_load_current_site_and_network().

Once the site has been routed, the site’s details are loaded into relevant global variables. This includes the site’s URL (home_url()), which is later stripped during normalisation in WP::parse_request() (see “Matching the Rules”). This ensures that any path for the multisite install is not used when matching rewrite rules.

REST API

The REST API uses its own routing and endpoints for a few reasons. Unlike regular WordPress requests, the REST API does not always generate a “main” query, so it does not need the query var mapping system. Additionally, REST API “endpoints” (no relation to “endpoint masks”) are matched using both the HTTP method (typically GET, POST, PUT, or DELETE) and the path, unlike regular WordPress rewrites, which are method-agnostic.

The routing inside the REST API is much more similar to traditional routing in non-WordPress contexts, and it matches the pair of HTTP method and path to a callback rather than a query.

To bootstrap the process, the REST API registers rewrite rules which match /wp-json/(.*) to a custom query var called rest_route. After the rewrite system has matched the request URL to this rewrite rule (on parse_request), the REST API checks this query var. If it’s set, it initialises WP_REST_Server, and handles the routing inside WP_REST_Server::serve_request().

The API first sends some setup and security headers, then does some further setup for JSONP responses. It then initialises a WP_REST_Request object. This object contains all the data about the request, and allows the API to be re-entrant: that is, you can run multiple REST requests in one WordPress request, because all the “global” information is contained in this object. The API then checks that no errors occurred during authentication, and if everything is good, it then “dispatches” the request.

WP_REST_Request::dispatch() runs a similar routing loop to WP::parse_request(), but without special cases for verbose rules. Unlike rewrite rules, each route can have multiple “endpoints” (i.e. callbacks). If the route matches, the API loops over each endpoint (called “handler” in the code) and checks whether the method for the endpoint also matches.

If it matches, the callback is then called, with some other stuff around it. Exactly how these requests work is a topic for a different post, as the API does a lot of special handling around this.

Once the callback has been run, the end result is a WP_REST_Response object. This object contains the response data as well as any headers or status code to send back to the client. Headers are then sent back to the client before encoding the response data as JSON and finally echoing it to the client. Back in rest_api_loaded(), the WordPress request is now finished off, ensuring that further routing/handling in the WP class is skipped.

Limitations

The design of Rewrites is classic WordPress: it maintains wide compatibility, both forward and backward, through clever and careful design. There’s much to like about this system, but the core feature of mapping “pretty” permalinks back to “ugly” permalinks is very smart. This makes compatibility between the two inherent, and it ensures new code is automatically compatible.

The biggest problem is that Rewrites is inherently tied to post querying. To be clear, this is not a problem with Rewrites, but rather with the overall design of the frontend system in WordPress. This makes routes not tied to posts much more difficult to design and implement. While this worked well for the original, blog-focussed nature of WordPress (where essentially everything was simply a filtered post archive), it has been stretched to its limits as a modern CMS.

This is evident in the REST API, where posts are no longer the main content type, and anything (users, themes, the current theme) in WordPress is addressable via a URL. When I designed the REST API’s routing, it was with these limitations in mind, which is why it uses a completely custom router. This router also works by “skipping” the main query, which it actually does by exiting before queries and templates are loaded. This is workable for a separated system like the API, but isn’t a good idea if you want to instead design user-facing pages which actually use templates (say, for a checkout page).

Understanding Rewrites can also be tough if you don’t know where to start, which is why a lot of people miss key parts or don’t quite understand the flow. A significant part of this is the organic way in which the WP and WP_Rewrite classes have grown, which means that understanding the flow requires a lot of flicking back and forth. I’d wager that quite a lot of WordPress developers don’t even know the WP class exists and acts as the main engine of the request; I didn’t until I really dug into Rewrites while working on core.

So Much More

There’s a lot more that happens that I didn’t cover here, so let me know if you want to see any more detail on anything specific. Just knowing where to start can be challenging some times, particularly with these systems that have organically grown.

Also, if there’s anything else you’d like to see a breakdown of, let me know! I’d like to demystify more of WordPress if you found this useful.

One thought on “How WordPress Knows What Page You’re On

Leave a Reply