by Atolyricz Lyrics
From a technical SEO perspective, Wikipedia is perhaps the greatest of all time (GOAT). Each month, it receives billions (yes, billions!) of organic visitors across its 46M+ articles in 300 languages.
It’s trusted by millions, and the Wikimedia Foundation deserves credit for implementing quality-assurance processes, such as Featured Articles and the Good Articles nomination process, to continuously improve (kaizen) the academic accuracy of its content.
Perfecting an open-source encyclopedia takes time, as does perfecting technical SEO for large enterprise sites. Without robust technical SEO, search engine spiders can’t properly crawl, index, and rank your thousands or millions of pages.
While filthy-rich content and high site authority play significant roles in Wikipedia’s SEO dominance, I argue that its technical SEO plays the most important role, allowing it to rank at the top, or at least on page 1, for almost every informational keyword.
This is a tribute post that extracts lessons from, and praises, Wikipedia’s platform for its SEO mastery. The same platform also powers Wikipedia’s sister Project sites, such as Wikidata, Wikinews, and Wiktionary.
We’ll study various technical SEO techniques that have helped Wikipedia rank on page one at scale, on both desktop and mobile:
- Domain Setup / Internationalization
- Sitewide Links / HTML Layout
- Meta Descriptions
- Page Templates / URL Formats
- Site Architecture
- Mobile-First Indexing
- Page Speed
- The Single Most Important Enterprise SEO Success Factor
Domain Setup / Internationalization
Wikipedia’s mission is to empower and engage people around the world to collect and develop educational content under a free license or in the public domain, and to disseminate it effectively and globally (source).
Or in other words, to make information freely available to everyone.
The role of SEO in this mission: to help every single person in the world find and comprehend information on any subject in any spoken language. This is where a robust, scalable international SEO strategy comes into play.
You can localize content in multiple languages using sub-domains or sub-directories on a single domain, or you can set up regional websites on multiple domains using country-code top-level domains (ccTLDs), such as Wikipedia.in (India), Wikipedia.fr (France), and Wikipedia.gr (Greece).
Whether you choose sub-domains, sub-directories, or ccTLDs, any of the three approaches can help you dominate international SEO if done correctly.
While Wikipedia sporadically uses ccTLDs, it primarily goes the sub-domain route to excel in ranking worldwide.
The domain wikipedia.org is set up to support 300 desktop sites on sub-domains:
- www.wikipedia.org (the global homepage)
- en.wikipedia.org (English)
- fr.wikipedia.org (French)
- And another 298 language sites
It’s also set up to support 300 mobile (m.) sites on sub-sub-domains, such as en.m.wikipedia.org.
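To make the pattern concrete, here’s a minimal Python sketch that builds the desktop and mobile URLs for one article on a given language sub-domain. It assumes the {lang}.wikipedia.org and {lang}.m.wikipedia.org host pattern described above; the helper name article_urls is hypothetical, and the title escaping is simplified compared with what MediaWiki actually does.

```python
from urllib.parse import quote


def article_urls(lang: str, title: str) -> dict[str, str]:
    """Return desktop and mobile URLs for one article in one language."""
    # Wiki-style URLs use underscores in place of spaces; other characters
    # are percent-encoded (a simplification of MediaWiki's own rules).
    path = "/wiki/" + quote(title.replace(" ", "_"))
    return {
        "desktop": f"https://{lang}.wikipedia.org{path}",
        "mobile": f"https://{lang}.m.wikipedia.org{path}",
    }


if __name__ == "__main__":
    for lang in ("en", "fr", "es"):
        print(article_urls(lang, "The Office"))
```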
The global home page www.wikipedia.org is the starting point. It invites bots and humans to browse its most popular Wikipedia sites by language.
Or, you can browse its family of Project sites:
The endless river of link juice flows into Wikipedia and is evenly distributed to ~320 Main pages of Wiki sites. Here are these link metrics, via Ahrefs:
When you land on the Main page of each sub-domain (for example, the English site’s home page), you’ll notice that Wikipedia keeps linking to the equivalent Main page in other languages.
Every subsequent page, including the millions of Wiki articles, contains a link section where dynamically generated links point to the corresponding article in all other available languages.
Here’s the English page for The Office.
Google suggests annotating links to articles in other languages with rel="alternate" hreflang="x" annotations, placed in HTML <link> elements in the page’s <head>, in HTTP headers, or in Sitemaps.
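For comparison, here’s a rough sketch of the <head> approach Google describes: one <link rel="alternate" hreflang="..."> element per language version, with the same set repeated on every version, including a reference to the page itself. The function name, the x-default fallback URL, and the sample URLs are illustrative assumptions, not Wikipedia’s actual markup.

```python
from html import escape
from typing import Optional


def hreflang_links(alternates: dict[str, str], x_default: Optional[str] = None) -> str:
    """Return one <link> element per language version, including the current page."""
    lines = [
        f'<link rel="alternate" hreflang="{escape(lang)}" href="{escape(url)}">'
        for lang, url in sorted(alternates.items())
    ]
    if x_default:
        # Optional fallback for users whose language matches none of the versions.
        lines.append(f'<link rel="alternate" hreflang="x-default" href="{escape(x_default)}">')
    return "\n".join(lines)


if __name__ == "__main__":
    versions = {
        "en": "https://en.wikipedia.org/wiki/The_Office",
        "es": "https://es.wikipedia.org/wiki/The_Office",
    }
    # The same block would go into the <head> of every version listed.
    print(hreflang_links(versions, x_default="https://www.wikipedia.org/"))
```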
Wikipedia may use its Sitemaps for this, but it certainly doesn’t use the <head> or HTTP headers to point to multilingual content.
The most obvious way Wikipedia annotates links is by marking up the actual anchor tag of each link. From the English page for “The Office,” here’s an example of a link to the Spanish version:
<a href="https://es.wikipedia.org/wiki/The_Office" title="The Office – Spanish" lang="es" hreflang="es" class="interlanguage-link-target">Español</a>
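If you want to audit this markup on your own pages (or Wikipedia’s), a small parser can collect every interlanguage anchor. This sketch uses Python’s standard html.parser and keys off the hreflang attribute and the interlanguage-link-target class from the example above; treat that class name as specific to the example rather than a general convention.

```python
from html.parser import HTMLParser


class InterlanguageLinkParser(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.links: dict[str, str] = {}  # hreflang -> href

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        a = dict(attrs)
        classes = (a.get("class") or "").split()
        # Keep only anchors marked as interlanguage links that carry both attributes.
        if "interlanguage-link-target" in classes and a.get("hreflang") and a.get("href"):
            self.links[a["hreflang"]] = a["href"]


def interlanguage_links(html: str) -> dict[str, str]:
    parser = InterlanguageLinkParser()
    parser.feed(html)
    return parser.links
```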
You can get a deeper understanding of multilingual SEO from this article: Multilingual SEO: Translation and Marketing Guide.
Sitewide Links & HTML Layout
Wikipedia codes its page templates beautifully for spiders, prioritizing content and internal links in two ways:
- It doesn’t place any “important” links (i.e., to pages with a lot of SEO traffic potential) in the footer.
- It places sidebar links toward the bottom of the source code in the <body>.
But why are these two things beneficial from a technical perspective?
Well, Google has stated that both sitewide and footer links aren’t given much weight; that is, they pass less link equity (or PageRank). This is likely because such links are deemed less important for users. After all, when did you last click a link in a website’s footer? I can’t remember the last time I did.
But Wikipedia has another trick up its sleeve: it adds a kind of “sub-footer” section to each page, containing dynamically generated links related to the overall topic of the page.
Because this sub-footer is dynamically generated for each page, none of the links are sitewide. Therefore, they don’t get devalued by Google.
And it’s a similar story with the links in the left sidebar.
Most links in the sidebar are for editors and users (i.e., for navigational purposes). And the sidebar is sitewide, so it makes sense not to include any important links (in terms of SEO) in this section.
But again, Wikipedia goes one step further…
Its pages are coded so that the sidebar HTML sits toward the bottom of the source code (still inside the <body> section, but right at the end). This allows contextual links at the top of the page to receive more link equity, while the sidebar links are further demoted.
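One way to verify source order on your own templates is to list every link in the order it appears in the HTML, tagged with the container it sits in. The sketch below assumes the main content and sidebar are wrapped in elements with the ids content and sidebar (hypothetical; use whatever your templates emit) and that the markup is well-formed.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class LinkOrderParser(HTMLParser):
    def __init__(self, regions):
        super().__init__()
        self.regions = set(regions)
        self.stack = []  # id attribute (or None) of each open element
        self.links = []  # (source position, region, href)

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "a" and a.get("href"):
            # The innermost open element whose id is a watched region wins.
            region = next((i for i in reversed(self.stack) if i in self.regions), None)
            self.links.append((len(self.links), region, a["href"]))
        if tag not in VOID_TAGS:
            self.stack.append(a.get("id"))

    def handle_endtag(self, tag):
        # Assumes well-formed markup: each end tag closes the latest open element.
        if tag not in VOID_TAGS and self.stack:
            self.stack.pop()


def link_order(html, regions=("content", "sidebar")):
    parser = LinkOrderParser(regions)
    parser.feed(html)
    return parser.links


def contextual_links_first(links, content="content", sidebar="sidebar"):
    """True if every content link appears before every sidebar link in the source."""
    content_pos = [pos for pos, region, _ in links if region == content]
    sidebar_pos = [pos for pos, region, _ in links if region == sidebar]
    return not content_pos or not sidebar_pos or max(content_pos) < min(sidebar_pos)
```

The sidebar can still be displayed on the left with CSS; what this check cares about is only where its links fall in the source.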
Lesson: Place links to rank-worthy target pages contextually, near the top of each page, using rich anchor text. Place second-tier links in sidebars that come after your contextual links in the page source.
Meta Descriptions
What meta descriptions? Wikipedia leaves them blank.
Wikipedia populates title tags in the head but leaves meta descriptions empty, which goes against all standard SEO advice. Not gonna lie, if I were consulting for them, I’d give that same advice and plead my case, too:
“You should have a keyword-rich description, with at least one call to action, to entice high click-through rates, which could indirectly lead to higher rankings. Meta descriptions are easy to implement across the entire site with a template like this:
Learn more about [Wiki topic], or join over 100,000 contributors and add your own knowledge and expertise about [Wiki topic].”
To which I’d get the Michael Scott death glare:
I’d be wrong to give this advice. A generic, one-size-fits-all description template, or even encouraging contributors to write meta descriptions by hand, doesn’t make sense for Wikipedia, because every Wiki article matches thousands of search queries.
Long-form Wiki articles, like this one about the Sun, rank for over 4,900 keywords, per the Ahrefs Organic Keywords report. The best thing to do is leave the description blank and let Google figure out which snippet to display for each query.
Every Wiki article starts with the topic in bold and a simple sentence structure, clearly answering the who, what, where, or when, just as a meta description would do anyway:
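To see how that opening sentence can stand in for a missing description, here’s a naive sketch that pulls the bolded subject and the first sentence of the first non-empty paragraph. Splitting on “. ” is a simplifying assumption (it will trip over abbreviations), and this only approximates what a search engine might display; it is not how Google actually picks snippets.

```python
from html.parser import HTMLParser


class LeadParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_p = False
        self.in_b = False
        self.done = False
        self.text = []
        self.bold = []

    def handle_starttag(self, tag, attrs):
        if tag == "p" and not self.done:
            self.in_p = True
        elif tag == "b" and self.in_p:
            self.in_b = True

    def handle_endtag(self, tag):
        if tag == "b":
            self.in_b = False
        elif tag == "p" and self.in_p:
            self.in_p = False
            if "".join(self.text).strip():
                self.done = True  # keep only the first non-empty paragraph
            else:
                self.text, self.bold = [], []

    def handle_data(self, data):
        if self.in_p:
            self.text.append(data)
            if self.in_b:
                self.bold.append(data)


def lead_summary(html: str) -> tuple[str, str]:
    """Return (bolded subject, first sentence) of the lead paragraph."""
    parser = LeadParser()
    parser.feed(html)
    paragraph = "".join(parser.text).strip()
    first_sentence = paragraph.split(". ")[0].rstrip(".") + "."
    return "".join(parser.bold).strip(), first_sentence
```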
This formatting may also improve Wikipedia’s chances of earning featured snippets.
Here’s the search result for “What is the Sun?”
The Wiki page:
Google does a decent job of extracting relevant descriptions for other related searches. Here’s the search result for “what type of star is the sun?”
Search result for “age of the sun.”
Page Templates & URL Formats
Wikipedia predominantly uses just one page template and URL format to rank pages — the Wiki article.
Per Ahrefs Top Landing Pages, 99.95% of the top 100,000 organic landing pages are Wiki articles residing in the /wiki/ sub-directory.
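If you export top landing pages from a tool like Ahrefs, a few lines of Python can reproduce this kind of breakdown. This sketch treats any /wiki/ URL as an article unless its title starts with one of a handful of known non-article namespaces; the namespace list is partial and the sample URLs are illustrative.

```python
from urllib.parse import unquote, urlparse

# A partial list of MediaWiki/Wikipedia namespaces that are not articles.
NON_ARTICLE_NAMESPACES = {"Category", "File", "Help", "Portal", "Special",
                          "Talk", "Template", "User", "Wikipedia"}


def is_article(url: str) -> bool:
    path = urlparse(url).path
    if not path.startswith("/wiki/"):
        return False  # e.g. /w/index.php?title=...&action=history
    title = unquote(path[len("/wiki/"):])
    namespace, sep, _ = title.partition(":")
    # "Mission:_Impossible" has a colon but is still an article.
    return not (sep and namespace in NON_ARTICLE_NAMESPACES)


def article_share(urls: list[str]) -> float:
    """Fraction of URLs that look like Wiki articles."""
    return sum(map(is_article, urls)) / len(urls) if urls else 0.0
```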
Even its sister sites use the same URL format and single page template to structure their core content. Examples:
- Wikimedia Commons:
If you dig into Wikipedia’s website architecture, here’s what you will find.
While Wikipedia uses other page templates, such as Portals, Categories, and Lists, as well as the editor-friendly pages linked in the left sidebar, these pages exist for secondary navigation and general information. They rarely rank for any search terms.
For example, Portals, such as Geography, are topic pages that serve as additional entry points from the home page. A Portal seemingly exists so editors can click into the topics they’re interested in contributing to. For bots, it’s like an index sitemap welcoming search engines into the world of all the Wikis.
Geography is both a Portal and a Wiki. Guess which one ranks better?
As an SEO, you might wonder: shouldn’t the Geography Portal page outrank the Wiki, since it’s one click away from the home page (the most authoritative page) and contains good, unique content? The reasons it doesn’t are likely as follows.
While the Portals link to the Wikis, Wikis don’t typically link back to Portals. Meanwhile, geography-related Wikis link back to the Geography Wiki, so overall, in terms of URL Rating, the Geography Wiki page is stronger than the Geography Portal page.
In fact, the Geography Portal page has a URL Rating of 40 and ranks for zero organic keywords.
But the Geography Wiki page has a URL Rating of 73 and ranks for 2.5K+ organic keywords.
Lesson: Both the prominence and the quantity of internal links determine which of two pages with similar content and the same target keyword outranks the other. Placing a page high in the site hierarchy, even just one click from the home page, doesn’t guarantee good rankings.
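A toy PageRank calculation shows why internal link counts can outweigh click depth. The graph below is hypothetical and tiny, and classic PageRank is only a stand-in here: Ahrefs’ URL Rating and Google’s ranking systems are their own, non-public computations.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Classic power-iteration PageRank over {page: [linked pages]}."""
    pages = list(graph)
    n = len(pages)
    rank = {p: 1 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1 - damping) / n for p in pages}
        for page, outlinks in graph.items():
            if outlinks:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new[target] += share
            else:
                # A page with no outlinks spreads its rank over every page.
                for target in pages:
                    new[target] += damping * rank[page] / n
        rank = new
    return rank


# Hypothetical toy site: the Portal is one click from Home, but only Home
# links to it, while every geography Wiki links back to the Geography Wiki.
site = {
    "Home": ["Portal:Geography", "Cartography"],
    "Portal:Geography": ["Geography", "Cartography", "Climate", "Demography"],
    "Geography": ["Cartography", "Climate", "Demography"],
    "Cartography": ["Geography", "Climate"],
    "Climate": ["Geography", "Demography"],
    "Demography": ["Geography", "Cartography"],
}

for page, score in sorted(pagerank(site).items(), key=lambda kv: -kv[1]):
    print(f"{page:<18} {score:.3f}")
```

Even though the Portal sits one click from Home, the Geography Wiki ends up with the higher score, because every related Wiki links back to it.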
If Wikipedia pointed more links to the Geography Portal page from all relevant Wiki articles, perhaps via breadcrumbs, that page would likely beat the Geography Wiki page.
The primary ways to navigate Wikipedia are its on-site search and clicking from Wiki to Wiki. Wikipedia’s contextual linking makes it easy for bots and users to browse the site. While the secondary navigation shown above uses a rather deep structure, the primary navigation consists of a beautifully designed flat site architecture.
The five front-and-center content sections feature, and constantly rotate, timely or random Wiki articles. These articles contextually link to other related Wiki articles, and so on. Wikipedia doesn’t use mega menus or faceted navigation, because it doesn’t use a top-down categorization structure. It’s only 2 levels deep.
As an SEO, it’s perplexing to see an encyclopedia with millions of articles, which could easily follow a categorization structure like this, refuse to:
You wonder how Google categorizes all this content and indexes it in neatly ordered taxonomies. I mean, Wikipedia doesn’t even use breadcrumbs, so how is Google supposed to infer parent-child relationships between categories and articles?
Well, what Wikipedia teaches us is that child and parent relationships don’t matter if your contextual internal linking is super-relevant, abundant, and free of wasteful links (404s, duplicate/thin content pages, etc.).
Basically, Wikipedia treats every topic (categories, subcategories, and sub-sub-categories) as a Wiki article, and interlinks them all contextually.
In comparison, Encyclopedia.com’s top-down structure requires 4 clicks from the home page to reach an article, so it has to rely on faceted navigation and breadcrumbs to reinforce the parent-child categorization.
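Click depth is easy to measure with a breadth-first search over your internal link graph. The two graphs below are hypothetical miniatures: one flat, contextually linked site, and one four-level top-down hierarchy like the one described above.

```python
from collections import deque


def click_depths(graph, home):
    """Breadth-first search: minimum clicks from the home page to each page."""
    depth = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for linked in graph.get(page, []):
            if linked not in depth:
                depth[linked] = depth[page] + 1
                queue.append(linked)
    return depth


# Hypothetical flat site: the Main Page features articles that link contextually.
flat = {
    "Main Page": ["Sun", "Geography"],
    "Sun": ["Star", "Solar System"],
    "Geography": ["Cartography", "Sun"],
}
# Hypothetical top-down site: home > category > subcategory > sub-subcategory > article.
top_down = {
    "Home": ["Science"],
    "Science": ["Astronomy"],
    "Astronomy": ["Stars"],
    "Stars": ["Sun"],
}

print(max(click_depths(flat, "Main Page").values()))  # 2
print(max(click_depths(top_down, "Home").values()))   # 4
```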
Internal Link Proximity
Six degrees of separation is the idea that any person on the planet is connected to any other person through a chain of no more than 5 intermediary acquaintances.
Likewise, on Wikipedia, it takes on average only 4.5 clicks to get from a Wiki article to any other Wiki article.
One of Wikipedia’s greatest software features is how easily editors can cross-link Wikis. Within the body of each article, you’ll notice that editors tend to hyperlink almost every concept or subject to its matching Wiki article. If you’re not using menus, breadcrumbs, or automatic linking software, this is the only practical way to establish strong link relationships across a site with millions of pages.
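Part of what makes this cheap is MediaWiki’s link syntax: editors type [[Target]] or [[Target|anchor text]] and the software renders the link. Here’s a simplified Python sketch of that conversion; it ignores namespaces, section-only links, link trails, and templates, and the base URL is just the English Wikipedia default.

```python
import re
from urllib.parse import quote

# [[Target]] or [[Target|anchor text]]; nested brackets are not handled.
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")


def wikilinks(wikitext, base="https://en.wikipedia.org/wiki/"):
    """Return (anchor text, URL) pairs for simple wikilinks in wikitext."""
    links = []
    for match in WIKILINK.finditer(wikitext):
        target = match.group(1).strip()
        anchor = (match.group(2) or target).strip()
        # Spaces become underscores and the first letter is upper-cased,
        # mirroring English Wikipedia's default title handling.
        title = target.replace(" ", "_")
        title = title[:1].upper() + title[1:]
        links.append((anchor, base + quote(title, safe=":/#()")))
    return links
```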
Internal & External Link Counts per Page
Moz, for example, suggests keeping links to roughly 150 per page, and Matt Cutts has suggested keeping them to about 100 so that you don’t overwhelm users with a poor experience. It’s widely believed, for good reason, that excess links on a page dilute PageRank distribution and don’t do users any good. Most sites should stick to 150 links or fewer.
Wikipedia wishes it could, but can’t.
The English page for the US version of The Office contains over 2,300 links, including:
- ~100 sidebar links for editors and users to pages containing little to no value in ranking for non-brand terms.
- ~225 (10%) external links and citations — the infamous ‘nofollow’ links SEOs love debating over.
- ~150 links to foreign language versions of the Wiki article.
Together, these make up roughly 20% of all the links on the page, and they live at the bottom of the HTML, in the lower-priority section.
The remaining ~1,800 links (80% of the total) are jump links and contextual links with rich anchors, prioritized at the top of the HTML.
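The percentages above are easy to double-check. This short script uses the approximate counts quoted for The Office page, treating the “over 2,300” total as exactly 2,300, so read its output as a rough cross-check rather than a fresh measurement.

```python
# Approximate counts quoted above for The Office page.
counts = {
    "sidebar": 100,
    "external": 225,
    "interlanguage": 150,
}
TOTAL = 2300

low_priority = sum(counts.values())  # links pushed to the bottom of the HTML
contextual = TOTAL - low_priority    # jump links and contextual links

for name, n in counts.items():
    print(f"{name:>13}: {n:>5} ({n / TOTAL:.1%})")
print(f"{'low priority':>13}: {low_priority:>5} ({low_priority / TOTAL:.1%})")  # 20.7%
print(f"{'contextual':>13}: {contextual:>5} ({contextual / TOTAL:.1%})")        # 79.3%
```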
Rand Fishkin has suggested that Google gives links higher in the HTML more weight than those lower down. I still believe this works today as an evergreen internal linking tactic.
Search Engine Crawls
Question: When Googlebot starts crawling Wikipedia, does it ever finish?
Googlebot web crawls are determined by some combination of so-called domain authority and individual page authority (PageRank), frequency and prominence of internal links, URL prioritization (via Sitemaps for example), and content updates. Considering Wikipedia nails all of these factors, what does a typical Wikipedia site crawl look like?
While it might delight any SEO professional to take a sneak peek at Wikipedia’s server logs or its Webmaster Tools Crawl Stats, that’s not publicly available information. What is publicly available is this little-known traffic statistics tool on WMFLabs, where you can see all kinds of interesting pageview stats at the page level or the Project level. You can even see search engine spider crawl activity by Project, dating back to July 2015:
While not specific to Googlebot, all search engine crawlers average over 40MM pageviews per day. In comparison, humans average over 250MM pageviews per day.
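If you want to pull these numbers yourself, the same data is exposed through the Wikimedia REST API’s pageviews aggregate endpoint, which splits traffic by agent type (user vs. spider). This is a minimal sketch; the date range is arbitrary, the contact address in the User-Agent is a placeholder you should replace, and the network call is left commented out.

```python
import json
from urllib.request import Request, urlopen

API = "https://wikimedia.org/api/rest_v1/metrics/pageviews/aggregate"


def aggregate_url(project, agent, start, end, access="all-access", granularity="daily"):
    """Build an aggregate pageviews URL; start/end are YYYYMMDDHH timestamps."""
    return f"{API}/{project}/{access}/{agent}/{granularity}/{start}/{end}"


def average_daily_views(payload):
    """Mean of the 'views' field across the items in an API response."""
    items = payload.get("items", [])
    return sum(item["views"] for item in items) / len(items) if items else 0.0


def fetch_json(url, user_agent="crawl-stats-sketch/0.1 (contact: you@example.com)"):
    # Wikimedia asks API clients to identify themselves with a User-Agent.
    with urlopen(Request(url, headers={"User-Agent": user_agent}), timeout=30) as resp:
        return json.load(resp)


# Example (makes network requests):
# for agent in ("spider", "user"):
#     url = aggregate_url("all-projects", agent, "2017100100", "2017103100")
#     print(agent, round(average_daily_views(fetch_json(url))))
```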
How does this compare to the number of pages search engines crawl on your site? If you’re not already doing so, check your Webmaster Tools Crawl Stats and, for deeper analysis, regularly review your server logs. For most sites, a tool such as the Screaming Frog Log File Analyser will do.
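If you’d rather start with a script than a tool, here’s a minimal sketch that counts Googlebot requests per day from an access log in the common Combined Log Format. It assumes that format and matches on the user-agent string only, which can be spoofed; Google recommends confirming real Googlebot traffic with a reverse DNS lookup.

```python
import re
from collections import Counter

# Combined Log Format: host ident user [time] "request" status bytes "referer" "user-agent"
LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<day>[^:]+):[^\]]+\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_hits_per_day(lines):
    """Count log lines per day whose user agent mentions Googlebot."""
    hits = Counter()
    for line in lines:
        m = LINE.match(line)
        if m and "Googlebot" in m.group("agent"):
            hits[m.group("day")] += 1
    return hits
```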
Mobile-First Indexing
It’s only a matter of time until Google rolls out mobile-first indexing in early-to-mid 2018 and SEO blogs and forums hit the panic button.
Wikipedia will ignore the chatter. They’ve been ready for mobile-first.
Look at the mobile version of The Office page and notice how closely it matches the desktop content. The main article content is the only content that appears on the mobile site. The only sidebar links that appear on mobile point to the equivalent article in other languages and, again, are pushed to the bottom of the HTML. All the other bloat-causing sidebar links are removed altogether.
Needless to say, when Google rolls out algorithms based on mobile-first indexation, Wikipedia’s ready to keep on ranking. Worth noting, it looks like Wikipedia has also shied away from hopping on the AMP bandwagon.
Page Speed
The mobile page for The Office in the screenshot above scores 89 on mobile and 95 on desktop. Wikipedia serves mobile content on separate m. URLs rather than through a responsive or adaptive design, and it doesn’t redirect desktop users who request the mobile page.
When a mobile device requests a desktop page, however, it is redirected to the corresponding m. page.
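The redirect rule described above boils down to a few lines. This sketch assumes a crude, hypothetical user-agent check and Wikipedia’s {lang}.m. host pattern; it redirects mobile visitors away from desktop hosts and leaves everyone on an m. host alone.

```python
import re
from typing import Optional

# Deliberately crude, hypothetical mobile check; production sites use
# maintained device-detection rules instead.
MOBILE_UA = re.compile(r"Mobi|Android|iPhone", re.IGNORECASE)


def mobile_redirect(host: str, path: str, user_agent: str) -> Optional[str]:
    """Return the m. URL a request should be redirected to, or None."""
    labels = host.split(".")
    if len(labels) > 1 and labels[1] == "m":
        return None  # already on the mobile host; desktop users are not bounced back
    if not MOBILE_UA.search(user_agent):
        return None  # desktop visitor on the desktop host
    return f"https://{labels[0]}.m.{'.'.join(labels[1:])}{path}"
```

If you run separate m. URLs like this, Google’s guidance is to annotate the pair: rel="alternate" (with a media query) on the desktop page pointing to the mobile URL, and rel="canonical" on the mobile page pointing back to the desktop URL.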
For good measure, let’s score the same URL on Pingdom and GTmetrix.
All three tools find that Wikipedia could improve page load by leveraging browser caching, optimizing images, and combining JS and CSS files so that above-the-fold content loads quickly.
Here’s the thing about page speed testing regarding SEO: very few sites get it perfect because most web teams don’t try to nail every tiny little recommendation. Page speed tools want you to optimize every single thing that loads visibly or loads invisibly in the background of a page.
Even then, there’s no guarantee of excellence, because under HTTP/1.x each connection handles only one request at a time, so a browser can fetch only a few files in parallel. The solution lies in HTTP/2, which supports multiplexing: a browser or spider can request many files in parallel over a single connection.
Wikipedia has adopted HTTP/2 to speed up pages for users, and while Googlebot still doesn’t crawl over HTTP/2 at the time of writing, Google still recommends implementing it.
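To check whether a server offers HTTP/2 to browsers, you can look at which protocol it picks during the TLS handshake (ALPN). This sketch uses Python’s standard ssl module; it says nothing about how Googlebot crawls, and the network call is left commented out.

```python
import socket
import ssl


def alpn_context() -> ssl.SSLContext:
    ctx = ssl.create_default_context()
    # Offer HTTP/2 ("h2") first and HTTP/1.1 as the fallback.
    ctx.set_alpn_protocols(["h2", "http/1.1"])
    return ctx


def negotiated_protocol(host: str, port: int = 443, timeout: float = 10.0):
    """Return the ALPN protocol the server picked ("h2", "http/1.1") or None."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with alpn_context().wrap_socket(sock, server_hostname=host) as tls:
            return tls.selected_alpn_protocol()


# Example (makes a network request):
# print(negotiated_protocol("en.wikipedia.org"))
```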
The Most Important Technical SEO Success Factor
The platform on which a small, medium, or enterprise website is built ultimately determines its SEO ability and scalability. The platform (AKA the framework) comprises everything that powers a website or family of websites: the servers, the software/code (MediaWiki software and PHP), the databases, the content, and the design.
If you understand these building blocks, and how they specifically create your overall site experience, you realize the true limits and true possibilities of technical SEO. You can experiment and infuse different SEO techniques as your platform allows to design a harmonious search engine and user experience.
The Wikimedia Foundation wins at enterprise SEO with its platform, whereas most enterprise organizations struggle due to archaic infrastructure, internal politics, and inefficiency.
If Wikipedia executed like most large enterprises, its SEO technology wouldn’t be as powerful as it is today, and without SEO dominance, who knows whether Wikipedia would be the household brand it is today?
The platform also enables Wikipedia’s dozen sister Project sites, such as Wikidata, Wikinews, and Wiktionary so they can piggyback and position themselves to dominate web search too.
Wikipedia is far from perfect, both as an SEO platform and as the world’s most accurate encyclopedia. Like YouTube, Reddit, and Twitter, it has systemic biases that keep it from truly becoming the deepest, truest, and richest source of knowledge.
Hopefully, Wikipedia’s founders keep working on it.
Whether you agree or disagree with my assessments of their technical SEO foundation, I believe SEO professionals of all levels can greatly benefit from observing how large websites like Wikipedia structure their code and content at a page level and at a site level.
What Wikipedia tactics have you tested, and what results have you seen? What tactics are you hoping to apply? Which of Wikimedia’s sister projects do you predict to be most successful?