How to Detect WordPress Websites

Have you ever wondered if there are ways to detect WordPress websites? Or in other words: are there any reliable indications of an arbitrary website being powered by WordPress? And if so, what would these look like?

This post is about exactly that.

The reason for this is explained in my post about scanning the Internet for CMS websites. There is also a corresponding post for Joomla.

Markup

The first thing that comes to mind is analyzing the actual markup of a particular website. And yes, there are quite a few things that might give away whether or not the website is a WordPress website.

Generator meta Element

The easiest catch is the generator meta element. Just like Drupal and Joomla, WordPress by default renders an HTML element with meta data about the software used to generate the document (i.e., WordPress 😉 ). The corresponding markup could look like one of the following.

Self-hosted WordPress website:

<meta name="generator" content="WordPress 4.7.2" />

WordPress.com website:

<meta name="generator" content="WordPress.com" />

If you spot any of these somewhere, you can assume the website is powered by WordPress.

However, the generator is not an absolute truth as it is easily possible to either filter (i.e., manipulate) the output, or not render the meta element at all.
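As a rough sketch of how such a check could be automated, the following Python snippet pulls the generator value out of a piece of markup using only the standard library. The class and function names are made up for illustration, and the snippet expects the HTML as a string, so fetching the page is left to you.

```python
from html.parser import HTMLParser


class GeneratorMetaParser(HTMLParser):
    """Collects the content of every <meta name="generator"> element."""

    def __init__(self):
        super().__init__()
        self.generators = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta ... /> elements end up here as well.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "generator":
            self.generators.append(attributes.get("content") or "")


def wordpress_generator(html):
    """Return the first generator value starting with "WordPress", or None."""
    parser = GeneratorMetaParser()
    parser.feed(html)
    for value in parser.generators:
        if value.startswith("WordPress"):
            return value
    return None
```

A return value of None only means that no matching generator element was rendered, not that the website is not powered by WordPress.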

Assets

Analyzing the asset files referenced in the markup might also hint at WordPress. Common path fragments are, for example, the following:

  • /wp-content/
  • /wp-includes/
  • //(api|s).w.org
  • //(pixel|s0-s9|stats).wp.com
  • //wp.me/

In the above list, the expression (this|that) means either “this” or “that” (i.e., any of the included words separated by a pipe, |), and s0-s9 stands for any of s0 through s9. I did not use regular expressions because people might not be used to these anyway. Valid paths matching the given patterns are, for example, //api.w.org or //stats.wp.com.

Of course, finding one or more of these path fragments in the markup does not necessarily mean the respective website is powered by WordPress, although you can assume so. Also, there might be WordPress websites that use a different name for the user content (e.g., /content/ instead of /wp-content/), and/or have WordPress in a subfolder, for example, /wordpress/wp-content/ or even /wordpress/content/.
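For those who do prefer regular expressions, the default path fragments listed above could be translated roughly as follows. This is only a sketch: the names are hypothetical, reading s0-s9 as the range s0 through s9 is my interpretation, and renamed content folders are not covered at all.

```python
import re

# One regular expression per path fragment from the list above.
ASSET_PATTERNS = [
    r"/wp-content/",
    r"/wp-includes/",
    r"//(?:api|s)\.w\.org",
    r"//(?:pixel|s[0-9]|stats)\.wp\.com",  # assumes s0-s9 means s0 through s9
    r"//wp\.me/",
]
ASSET_RE = re.compile("|".join(ASSET_PATTERNS))


def find_asset_hints(html):
    """Return the distinct path fragments found in the given markup."""
    return {match.group(0) for match in ASSET_RE.finditer(html)}
```

An empty set is no proof of anything, and a non-empty one is a hint rather than hard evidence.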

body Classes

Unless filtered, most WordPress websites have several of the following HTML classes assigned to the body element, by using the body_class() function.

Latest posts on front page:

  • home
  • blog

Static page as front page:

  • home
  • page
  • page-id-*
  • page-template-*

Just like with the assets, these class names are no hard evidence. There are WordPress websites that do not have any of the classes, and there are also lots of non-WordPress websites that have one or more of them. However, if you happen to find a combination of the above classes, this might mean the website is powered by WordPress.
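Checking for such a combination could be sketched like this. Requiring every class of one of the two sets is a deliberately strict choice of mine and not a rule WordPress guarantees, and all names in the snippet are hypothetical.

```python
import fnmatch
from html.parser import HTMLParser

# Class name patterns from the two lists above (* is a wildcard).
BLOG_FRONT_PAGE = ["home", "blog"]
STATIC_FRONT_PAGE = ["home", "page", "page-id-*", "page-template-*"]


class BodyClassParser(HTMLParser):
    """Remembers the class names of the first body element."""

    def __init__(self):
        super().__init__()
        self.classes = []
        self.seen_body = False

    def handle_starttag(self, tag, attrs):
        if tag == "body" and not self.seen_body:
            self.seen_body = True
            self.classes = (dict(attrs).get("class") or "").split()


def matches_all(classes, patterns):
    """True if every pattern is matched by at least one class name."""
    return all(
        any(fnmatch.fnmatchcase(name, pattern) for name in classes)
        for pattern in patterns
    )


def looks_like_wordpress_body(html):
    parser = BodyClassParser()
    parser.feed(html)
    return matches_all(parser.classes, BLOG_FRONT_PAGE) or matches_all(
        parser.classes, STATIC_FRONT_PAGE
    )
```

Loosening the check (e.g., accepting three out of four patterns) trades fewer misses for more false positives.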

Deep Link URLs

The previous section was not aimed at specific pages or files, but rather at the front end in general. In addition, there are several files—or folders, but in the end it is some index file anyway—that might help in detecting WordPress websites.

Admin

The WordPress administration screens can usually be reached under /wp-admin/. If the request to this URL is successful—this means you are on the login page—you can do some further investigation. One thing is to check the path of the included assets. In addition to files referenced on the front end, the login page also might use admin-specific asset files. Path fragments you might want to test are described by the following pattern:

  • /wp-admin/(css|images|js)/

Classes on the body element are, by default, these:

  • login
  • login-action-login
  • wp-core-ui
  • locale-*

In the markup, the default login page has the following HTML code (fragments) that you can test for:

  • /wp-login.php
  • <input type="text" name="log" ... />
  • <input type="password" name="pwd" ... />
  • <input name="rememberme" type="checkbox" id="rememberme" value="forever" />
  • <input type="submit" name="wp-submit" ... />
  • <input type="hidden" name="redirect_to" ... />
  • <input type="hidden" name="testcookie" value="1" />
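To test a fetched login page for these form fields, something along the following lines might do. It is a minimal sketch that only looks at the name attributes from the list above; the function and class names are invented for this example.

```python
from html.parser import HTMLParser

# Field names taken from the default login form fragments listed above.
EXPECTED_INPUT_NAMES = {
    "log",
    "pwd",
    "rememberme",
    "wp-submit",
    "redirect_to",
    "testcookie",
}


class InputNameParser(HTMLParser):
    """Collects the name attribute of every input element."""

    def __init__(self):
        super().__init__()
        self.names = set()

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            name = dict(attrs).get("name")
            if name:
                self.names.add(name)


def missing_login_fields(html):
    """Return the expected field names that are absent from the markup."""
    parser = InputNameParser()
    parser.feed(html)
    return EXPECTED_INPUT_NAMES - parser.names
```

An empty result means all default fields were found. A customized login form may legitimately lack some of them.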

Includes

In the wp-includes/ directory, there are several files that you can access (e.g., CSS, JavaScript and image files). Almost the only interesting one, however, is wp-includes/wlwmanifest.xml, the Windows Live Writer manifest. In it, you may find references to “WordPress” and /wp-admin/.

License File

In the license file—which lives at /license.txt, if available—there are several references to “WordPress” as well as a link to https://wordpress.org/download/source/.

Links File

Unless the file has been manually removed, requesting /wp-links-opml.php will provide an OPML file of the blogroll (i.e., all links). By default, the output also includes a generator comment, although it can easily be removed.

Self-hosted WordPress website:

<!-- generator="WordPress/4.7.2" -->

WordPress.com website:

<!-- generator="WordPress.com" -->

Readme File

In the readme file—which lives at /readme.html, if available—there are looots of references to “WordPress”, and several links to WordPress (admin) files (e.g., wp-admin/install.php).

REST API

As of WordPress 4.4, you can make use of the built-in REST API, which you can find at /wp-json/. If this is a WordPress website, you should get a JSON response in which you can find a link to wp-api.org.

RSS Feed

By default, WordPress responds with an RSS feed when requesting /?feed=rss (e.g., https://make.wordpress.org/core?feed=rss). Of course, there might as well be other content management systems that provide an RSS feed for /?feed=rss, so a successful request is no proof that the website is powered by WordPress.

However, most of the feeds come with a generator tag, like so:

<generator>https://wordpress.org/*</generator>

Or in case of a WordPress.com website:

<generator>http://wordpress.com/</generator>

If you happen to find this in a feed response, you can safely assume the website to be powered by WordPress.
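Here is a small sketch of how the generator could be read from an RSS response with Python's standard library. The function names are hypothetical, and the snippet only covers RSS, where the generator element carries no XML namespace.

```python
import re
import xml.etree.ElementTree as ET

# Matches the two generator values shown above.
WORDPRESS_GENERATOR = re.compile(r"^https?://wordpress\.(?:org|com)/")


def feed_generators(feed_xml):
    """Return the text of all generator elements in an RSS document."""
    root = ET.fromstring(feed_xml)
    return [(element.text or "").strip() for element in root.iter("generator")]


def is_wordpress_feed(feed_xml):
    """True if any generator value points to wordpress.org or wordpress.com."""
    return any(WORDPRESS_GENERATOR.match(value) for value in feed_generators(feed_xml))
```

Note that ET.fromstring raises a ParseError for responses that are not well-formed XML, so you may want to catch that when scanning arbitrary websites.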

You can also try to request, for example, /wp-atom.php, which will result in the corresponding Atom feed, if this is a WordPress website. All of the different feeds can be accessed via query string or /wp-<TYPE>.php, with <TYPE> being the feed type. 🙂 Using the latter serves as an extra check for WordPress, just because other CMS websites might allow defining the feed type via query string.

XML-RPC

Unless the file has been manually removed, requesting /xmlrpc.php?rsd will respond with a Really Simple Discovery (RSD) file, an XML file that contains information on how to interact with the website API-wise. The interesting parts are as follows:

<engineName>WordPress</engineName>
<engineLink>https://wordpress.org/</engineLink>
<apis>
    <api name="WordPress" blogID="1" preferred="true" apiLink="*/xmlrpc.php" />
    <api name="WP-API" blogID="1" preferred="false" apiLink="*/wp-json/" />
</apis>

So, there are several references to “WordPress” and wordpress.org, and maybe “WP-API” (as of WordPress 4.4).
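Extracting these parts from an RSD response could look roughly like the following. This is a sketch with hypothetical names; it ignores XML namespaces by comparing local element names only, so it should work whether or not the document declares one.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a leading {namespace} from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def parse_rsd(rsd_xml):
    """Return engine name, engine link and API names found in an RSD file."""
    info = {"engine_name": None, "engine_link": None, "apis": []}
    for element in ET.fromstring(rsd_xml).iter():
        name = local_name(element.tag)
        if name == "engineName":
            info["engine_name"] = (element.text or "").strip()
        elif name == "engineLink":
            info["engine_link"] = (element.text or "").strip()
        elif name == "api":
            info["apis"].append(element.get("name"))
    return info
```

An engine_name of "WordPress" is the obvious hint, and "WP-API" in the list of APIs additionally suggests WordPress 4.4 or later.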

HTTP Headers

As with Joomla, HTTP headers might give away that a website is powered by WordPress. Until WordPress 4.4, there was a shortlink header, that is, a Link header with the value <*?p=*>; rel=shortlink. It is only sent if the post has a shortlink defined, and this feature has been disabled and hidden by default since WordPress 4.4.

However, with the release of WordPress 4.4, the REST API started to send a Link header containing the URL of the REST API endpoint, which is <*/wp-json/>; rel="https://api.w.org/".

When requesting /wp-login.php, however, there are further interesting headers. First of all, a test cookie is set on the login page, so you will find a Set-Cookie header with the value *=WP+Cookie+check; path=*. Secondly, the login page is not supposed to be cached, so the no-cache headers are sent. One of these is an Expires header with the (default, but filterable) value Wed, 11 Jan 1984 05:00:00 GMT, which happens to be the birthdate of Matt Mullenweg. 🙂

Of course, neither is of any use when you do not know where the login page is or you cannot access it (e.g., due to WordPress being in a subfolder and/or the login page being secured with a token). But this is no problem: you just have to create another situation where the no-cache headers are sent. One of these is a 404 error page. For WordPress, you can trigger one by requesting /?pagename=/. There is no page with the given name (or slug), simply because a slug cannot include a slash, so nothing is found matching the request. Hence the 404, and the Matt Mullenweg birthday header.
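The header checks from this section can be combined into one small helper, sketched below in Python. It takes the response headers as a list of (name, value) pairs, so how you fetch them is up to you. All names are hypothetical, and also accepting %20 instead of + in the test cookie is my own assumption about differently encoded cookie values.

```python
import re

# Patterns for the header values described in this section.
REST_LINK = re.compile(r'<[^>]*/wp-json/>;\s*rel="https://api\.w\.org/"')
SHORTLINK = re.compile(r"<[^>]*\?p=[^>]*>;\s*rel=shortlink")
# Assumption: the cookie value may be encoded with + or with %20.
TEST_COOKIE = re.compile(r"=WP(?:\+|%20)Cookie(?:\+|%20)check")
NO_CACHE_EXPIRES = "Wed, 11 Jan 1984 05:00:00 GMT"


def wordpress_header_hints(headers):
    """Return the set of WordPress hints found in (name, value) header pairs."""
    hints = set()
    for name, value in headers:
        key = name.lower()  # header names are case-insensitive
        if key == "link":
            if REST_LINK.search(value):
                hints.add("rest-api-link")
            if SHORTLINK.search(value):
                hints.add("shortlink")
        elif key == "set-cookie" and TEST_COOKIE.search(value):
            hints.add("test-cookie")
        elif key == "expires" and value.strip() == NO_CACHE_EXPIRES:
            hints.add("no-cache-expires")
    return hints
```

Since the Expires value is filterable and the REST API Link header can be removed, an empty set again proves nothing.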

In addition, there might be (popular) WordPress plugins that add one or more relevant headers indicating the respective website is a WordPress website. If you happen to know good examples, please tell me. 🙂

In case you want to try this out yourself, you can easily display a website’s HTTP headers in your browser.

Something Else?

Did I miss something here?

Are there any items that fail to mention the necessary circumstances?

Please, let me know.

4 Comments

  1. While this list is already pretty comprehensive, I may also mention the template header in the stylesheet (which you can easily look into with the browser’s developer tools, given the stylesheets weren’t concatenated and comments removed, as plugin Autoptimize and some cache plugins do).
    Your list also shows, that (despite the often published hint to “improve” security by obfuscation) it doesn’t add any protection to hide the generator tag alone. There’s nothing wrong in (proudly) showing, you are using WordPress, if you keep updating your stuff.

    1. Hi Bego,

      while you are right that the main CSS file of most themes will contain typical WordPressy structures, this doesn’t help so much. Thing is, if you were unable to identify WordPress from the markup and all that, this means you don’t know where the user content lives (i.e., what is by default /wp-content/). This again means you would have to check every single CSS file, right?

      There’s nothing wrong in (proudly) showing, you are using WordPress, if you keep updating your stuff.

      Word! 😀

      Thanks,
      Thorsten

  2. Great Stuffs.. Can we use some plugin to hide WordPress footprint from detecting?

    1. In general, no. It’s just not possible to completely hide WordPress. But, as mentioned in the previous comments, it’s nothing you really have to strive for. I mean, you aren’t manipulating your car (e.g., replacing the Jaguar statue on the front with some … kitten 😉 ) just so potential car thieves will have no idea what car this is…? 😀

      Keep your stuff up to date, rely on trusted (or at least well-known) sources for plugins and themes, and use common sense.
