
August 20, 2020

Probabilistic Thinking in SEO

SEO is not black and white.

Yet we’ve all been guilty of proposing solutions as if there’s no uncertainty. We need probabilistic thinking in SEO: an acceptance that our landscape is unpredictable and that, although we might know the chances of a successful outcome, we cannot guarantee it.

Most decisions in SEO involve hidden information and outside influence. The most obvious is Google’s algorithm, which updates regularly and is, for the most part, a black box.

Sure, Google has told you that page speed, HTTPS, and mobile-friendliness are all positive ranking signals, but what about the many other signals you just don’t know about? Add the complexity of a whole series of search algorithms that weight each signal differently depending on the nature of a search query, and your certainty should go out of the window.

The SEO Pitch

You don’t hear much uncertainty in an SEO pitch.

“In your brief you said that you want to increase organic traffic by +40% in 1 year and we will achieve that.”

It’s this unsubstantiated confidence that wins pitches, and loses the client a year later when you underdeliver. Weather presenters make this mistake too. How many times have you trusted a weather presenter’s confident dry forecast, gone outside without an umbrella and got wet?

Certainty assures us. You can perceive someone as more credible just because of it. Combine this with an expert source, and it can be very persuasive. They present the weather, they must know what they’re talking about, right?

While persuasive, absolute certainty in an SEO pitch leads to poor decisions. Even when outcomes are favourable, i.e. you win the pitch and achieve your target, the cumulative effect of poor decision making will hurt you in the long term.

It’s okay to be unsure. You should acknowledge uncertainty. Using our weather example, what if, instead of tuning in to your friendly but very certain weather presenter, you checked the forecast online? And what if, instead of certain sunshine, the probability of it not raining was 60%? It’s more likely to be dry, but this time you’d take your umbrella.

When you think probabilistically, you think of the consequences. Even though it probably won’t rain, the risk of getting wet exceeds the convenience of not having to carry an umbrella.

Apply the same logic to SEO. Recognising uncertainty sets clearer expectations. If you’re more cautious in pitches, set reasonable confidence intervals on projections and highlight the limitations, the client can make better decisions.

How to think probabilistically

Humans are deterministic by nature. It’s instinctive for us to believe something is good or bad and true or false. Someone is either your friend or not, the food is either good or bad and you either like something or you don’t. We don’t tend to say that there is a 70% chance that someone is your friend – that would be weird.

In SEO you might say particular activities are ‘best practice’, or that they’ve been successful before so they’ll work again. In a perfectly predictable world this would be okay. But it’s not. The influence of luck and hidden information means you can never be completely sure of the outcome.

You need to think like a professional gambler and treat your decisions in SEO as bets. What is your best guess that your recommendation will result in a successful outcome?

Closing the information gap

Like a professional gambler, the accuracy of your guesses will be determined by the information at your disposal and your experience.

Thomas Bayes, an English statistician, discovered that you can improve your knowledge of an uncertain outcome by using new information as it becomes available.

He used a billiard table as a thought experiment. Imagine you were blindfolded and someone rolled a ball and you were asked to point out where it stopped. It would just be a random guess, right?

Now imagine that someone dropped more balls onto the table and let you know their location relative to the first ball, left or right. As more and more balls are rolled, you can probably narrow down the area where the first ball stopped.
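You can simulate the billiard thought experiment to watch this narrowing happen. The Python sketch below is a minimal illustration, assuming the table is a line from 0 (left edge) to 1 (right edge), that every later ball stops at a uniformly random spot, and that all you learn is whether it stopped left or right of the first ball. It keeps a grid of candidate positions for the first ball and reweights them with Bayes’ rule after each report; the function `bayes_billiard` and its parameters are hypothetical names chosen for illustration.

```python
import random


def bayes_billiard(true_position, n_balls, seed=0, grid_size=1001):
    """Estimate where the first ball stopped (0 = left edge, 1 = right edge)
    using only left/right reports about later balls.
    Returns (posterior mean, lower, upper) for a 90% credible interval."""
    rng = random.Random(seed)
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    # Uniform prior: before any reports, every position is equally likely.
    posterior = [1 / grid_size] * grid_size

    for _ in range(n_balls):
        # A later ball lands left of the first with probability equal to its position.
        landed_left = rng.random() < true_position
        # Bayes' rule: multiply each candidate by the likelihood of this report...
        posterior = [w * (p if landed_left else 1 - p) for p, w in zip(grid, posterior)]
        # ...then renormalise so the probabilities sum to 1 again.
        total = sum(posterior)
        posterior = [w / total for w in posterior]

    mean = sum(p * w for p, w in zip(grid, posterior))
    # Walk the cumulative distribution to find the 5th and 95th percentiles.
    cumulative, lower, upper = 0.0, None, None
    for p, w in zip(grid, posterior):
        cumulative += w
        if lower is None and cumulative >= 0.05:
            lower = p
        if upper is None and cumulative >= 0.95:
            upper = p
    return mean, lower, upper


for n in (0, 5, 50, 500):
    mean, lower, upper = bayes_billiard(true_position=0.3, n_balls=n)
    print(f"{n:>3} balls: best guess {mean:.2f}, 90% interval {lower:.2f} to {upper:.2f}")
```

With no extra balls the 90% interval covers most of the table; as reports accumulate, it tightens around the true position. A simple grid is used here for readability rather than speed.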

As an SEO you can’t see inside of Google’s black box but you can use other information to get closer to objective truth.

Reasonably accurate estimates

Do you know the likely impact a given SEO tactic will have on organic traffic?

This is a reasonable question for a client to ask, but on the face of it, it seems difficult to answer. Our desire for certainty makes it so. Yet there are principles from a well-known Italian-American physicist that provide a rational way of approaching this problem.

Enrico Fermi, a Nobel Prize-winning physicist and ‘architect of the atomic bomb’, was known to make surprisingly accurate estimations based on little or no data. He created brainteasers, later named Fermi problems, to teach his students the art of approximation.

One problem he asked his students to answer (pre-internet) was: ‘How many piano tuners are there in Chicago?’

Much like our SEO problem, this seems impossible to answer. I don’t know, 200? You’d probably at least take a stab at it.

Random guesses don’t create reasonably accurate estimates though. Fermi knew that to provide an approximate answer, he had to break down the question by asking, ‘what information would allow me to answer the question?’ This would then help him to separate the knowable and the unknowable.

In their book Superforecasting: The Art and Science of Prediction, Philip Tetlock and Dan Gardner reasoned that they could nail the question by knowing the following four facts:

1. The number of pianos in Chicago

They had no idea but calculated it based on:

a) an approximation of how many people live in Chicago: the midpoint of a range they were 90% certain of, 2.5 million people

b) the percentage of people who own a piano: a black box guess of one in one hundred, giving 25,000 pianos

c) the number of pianos owned by institutions: another black box guess, so double the per-person number

Total pianos in Chicago = 25,000 x 2 = 50,000

2. How often pianos are tuned each year

Once per year sounds reasonable

3. How long it takes to tune a piano

Two hours sounds reasonable

4. How many hours a year the average piano tuner works

They broke this subquestion into two:

a) a standard US workweek is 40 hours, and two weeks of holiday leaves 50 working weeks: 40 hours x 50 weeks = 2,000 hours per year

b) piano tuners will likely need to travel between jobs. Let’s estimate that 20% of their time is spent travelling, leaving 2,000 x 80% = 1,600 hours per year

Final calculation

50,000 pianos x 1 tuning per year x 2 hours per tuning = 100,000 piano-tuning hours per year

divided by

the annual number of hours worked by one piano tuner = 1,600 hours

= 62.5 piano tuners

At the time, they found 83 piano tuners in the Chicago yellow pages, so their approximation was surprisingly accurate.
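The four-fact breakdown maps directly onto a few lines of arithmetic. Below is a minimal Python sketch that uses the inputs quoted above as default arguments; the function and parameter names are hypothetical. Writing it down this way turns each assumption into a knob you can challenge or vary.

```python
def fermi_piano_tuners(
    population=2_500_000,        # midpoint of a 90%-confidence range for Chicago
    people_per_piano=100,        # black box guess: one in one hundred people own a piano
    institution_multiplier=2,    # institutions assumed to double the per-person count
    tunings_per_year=1,
    hours_per_tuning=2,
    hours_per_week=40,
    weeks_worked=50,             # 52 weeks minus two weeks of holiday
    travel_share=0.20,           # share of working time spent travelling between jobs
):
    """Estimate the number of piano tuners from the decomposed sub-questions."""
    pianos = population / people_per_piano * institution_multiplier       # 50,000
    tuning_hours_needed = pianos * tunings_per_year * hours_per_tuning    # 100,000
    hours_per_tuner = hours_per_week * weeks_worked * (1 - travel_share)  # 1,600
    return tuning_hours_needed / hours_per_tuner


print(round(fermi_piano_tuners(), 1))  # 62.5
```

Swapping in the ends of each input’s range instead of its midpoint (for example `people_per_piano=50` or `travel_share=0.3`) shows how sensitive the answer is to each guess.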

These are very back-of-the-envelope calculations, but breaking a question down into its component parts should become part of your natural thought process if you want to create reasonably accurate estimates.

Fermi-izing an SEO problem

First, let’s make our SEO problem a little more specific to help break it down.

What’s the likely impact on organic traffic of adding FAQ schema to our landing pages?

You’ve identified an opportunity to add structured data to the FAQs on your client’s key landing pages to improve CTR, but you want to better understand what success looks like and how likely it is.

What information would help answer this question?

In the same way that issue trees were used to break SEO problems into constituent parts in my article on developing SEO strategies, you can treat the question like a mathematical equation.

Current impressions (where the rich result will appear) * Estimated CTR increase (in percentage points) = Likely impact (additional clicks)

1. # of keywords that the landing pages rank on the first page for.

This is knowable. You can get the number of top 10 keywords for key landing pages from Google Search Console.

Let’s say your landing pages rank on the first page for 300 keywords and you estimate that FAQ rich results will feature across 80% of them.

2. Impressions of keywords that the landing pages rank on the first page for.

This is knowable. You can get impressions data for the same keywords identified above, averaged across a reasonable time period.

Let’s say monthly impressions across the keywords have averaged 120,000 over the last two months.

3. Current CTR of keywords that the landing pages rank on the first page for.

This is knowable.

Let’s say the current CTR across the keywords has averaged 3.75% over the last two months.

4. CTR impact of adding FAQ Schema.

This information is critical but more difficult to obtain. You can get it from your own first-party data or from third-party data. Third-party data may include industry studies or case studies, and first-party data could include your own studies or the outcome of your own experiments.

You should use something called the outside view to get a base estimate. This looks at the question from a broader perspective. In this case, what is the CTR of rich results vs ordinary search results generally?

Outside view

A slide from SMX 2020 by Abby Hamilton found that rich results across e-commerce, health, entertainment, food and travel had a CTR between 8% and 83% greater than ordinary results.

For the purposes of this example, let’s say you also explored your own data in Google Search Console and saw that CTR for rich results on your website had been 5% to 60% higher than for ordinary results.

This is imperfect data but it’s a base to start from. You can be fairly confident that an increase in CTR will lie within a range of 5% to 83%. We’re using subjective probability; the idea that we can reasonably accept a value to lie within this range.

There are counterarguments to this approach. For example, some may argue that you can’t assign a probability to a human walking on Mars within the next n years, simply because it hasn’t been done before.

Given that SpaceX is preparing for a human mission to Mars within the next 10 years, I’d say it’s 80-90% likely that a human will walk on Mars within the next 100 years. Wouldn’t you?

There is almost always relevant information that can be used to create probability estimates. In a decision-making context, assigning a probability has to be more useful than assigning nothing at all.

Inside View

The inside view looks more at the specifics. Are there any studies on the impact of FAQ schema on CTR? Have you run any experiments on the website in question, or on any of your other websites?

From a quick Google search, I found two experiments. Luca Tagliaferro wrote on Search Engine Land that they saw a 51% uplift in CTR when implementing FAQ schema across 1,000 keywords. In another series of experiments, Distilled saw uplifts in traffic to commercial pages ranging from 3% to 8%.

You could use this new information to update your confidence and narrow your interval. Distilled’s experiment doesn’t tell you a great deal about the impact on CTR directly, and the case study on Search Engine Land is on one website with a small sample of keywords. Nevertheless, you may intuitively narrow your interval to between 5% and 60%, given that traffic uplifts were small in Distilled’s experiments and your experience says 83% is unlikely.

Point estimation using the Geometric Mean

You can convert your bounds into a point estimate. Taking ‘the average’ will likely be well off, but you can use something called the geometric mean instead. The geometric mean is better suited to data with outliers or extreme variance.

If you use an online calculator, you’ll find the geometric mean of 5% and 60% (the square root of 5 x 60) is about 17%, rounded down. This is your point estimate for the impact on CTR: a factor of around 3.5 from both the lower and upper bounds.
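To see why the geometric mean suits a wide range like 5% to 60%, compare it with the arithmetic mean. For two bounds, the geometric mean is the square root of their product, i.e. the midpoint on a multiplicative scale, so it sits the same factor away from each bound. A minimal Python sketch using only the standard library (`statistics.geometric_mean` needs Python 3.8 or later):

```python
import math
import statistics

low, high = 5, 60  # relative CTR uplift in %, the narrowed interval from the inside view

arithmetic = statistics.mean([low, high])
geometric = statistics.geometric_mean([low, high])  # equivalent to math.sqrt(low * high)

print(f"arithmetic mean: {arithmetic:.1f}%")  # 32.5%
print(f"geometric mean:  {geometric:.1f}%")   # 17.3%
# The geometric mean is the same multiplicative distance from both bounds.
print(f"{geometric / low:.2f}x above low, {high / geometric:.2f}x below high")  # 3.46x above low, 3.46x below high
```

The arithmetic mean is pulled towards the upper bound, whereas the geometric mean treats "3.5 times too low" and "3.5 times too high" as equally likely errors.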

Putting everything together

Now it’s time to calculate your estimate.

You have 120,000 monthly impressions across 300 keywords. You estimate that FAQ rich results will only feature across 80% of keywords, so, assuming impressions are spread evenly, 80% of 120,000 is 96,000.

The current average CTR across those keywords is 3.75%, so those 96,000 impressions currently earn 96,000 x 3.75% = 3,600 clicks per month. A 17% relative increase lifts CTR from 3.75% to about 4.39% (3.75% x 1.17), an absolute increase of roughly 0.64 percentage points.

3,600 clicks multiplied by 17% = estimated impact of 612 additional clicks per month.

You can frame it like this:

“Our best estimate is that implementing FAQ schema will result in an additional 612 clicks per month, but the impact is likely to be between 180 and 2,160 clicks per month.” (Your lower and upper bounds: 3,600 x 5% and 3,600 x 60%.)
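The whole estimate can be wrapped in one small function so you can rerun it when any input changes. This is a minimal Python sketch; the function name and its arguments are hypothetical, and it assumes the uplift bounds are relative CTR increases and that impressions are spread evenly across keywords.

```python
import math


def faq_schema_impact(monthly_impressions, rich_result_share, current_ctr,
                      uplift_low, uplift_high):
    """Return (low, point, high) additional clicks per month.

    Uplifts are *relative* CTR increases: 0.05 means CTR rises by 5% of its
    current value, not by 5 percentage points.
    """
    eligible_impressions = monthly_impressions * rich_result_share
    current_clicks = eligible_impressions * current_ctr
    uplift_point = math.sqrt(uplift_low * uplift_high)  # geometric mean of the bounds
    return (current_clicks * uplift_low,
            current_clicks * uplift_point,
            current_clicks * uplift_high)


low, point, high = faq_schema_impact(120_000, 0.80, 0.0375, 0.05, 0.60)
print(f"{low:.0f} / {point:.0f} / {high:.0f} extra clicks per month")  # 180 / 624 / 2160 extra clicks per month
```

Because the sketch keeps the geometric mean unrounded (about 17.3%), its point estimate lands slightly above the figure you get with a rounded 17%.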

Achieving more certainty

Eyes were rolling at the data above. That was you, wasn’t it? There are always limitations to estimating, particularly when you can’t rely on representative data. But that’s the industry we’re in. You need to work from wide, uncertain intervals a lot of the time. But as long as you break problems down, your estimates will be closer than you think.

That doesn’t mean you shouldn’t strive for more certainty; you should. Narrowing your intervals, or increasing your certainty of a successful outcome, will mean you’re more right than wrong over time, and that’s a very good thing.

SEO Testing

As Thomas Bayes figured out, you can see a little more clearly when you use new information (even when blindfolded). You can narrow your confidence intervals if you’re more confident in the data.

The most effective way to do that in SEO is to test. There are some fantastic SEO testing tools on the market, such as SearchPilot and RankScience, and many more.

If you test a variable across a sample of homogeneous pages before rolling the change out across the whole population of pages, you can narrow your range with more confidence and provide a more accurate point estimate.
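To illustrate why testing on more pages lets you narrow your range, here is a deliberately simplified Python sketch. It assumes you have a relative CTR uplift for each tested page and uses a normal approximation (z = 1.645, roughly 90%) for the interval around the mean uplift. SEO split tests typically compare variant pages against control pages with more rigorous models, so treat this only as an illustration of the sample-size effect; the page data is simulated and the function name is hypothetical.

```python
import math
import random
import statistics


def uplift_interval(page_uplifts, z=1.645):
    """Approximate 90% interval for the mean relative CTR uplift across tested
    pages, using a normal approximation."""
    mean = statistics.mean(page_uplifts)
    standard_error = statistics.stdev(page_uplifts) / math.sqrt(len(page_uplifts))
    return mean - z * standard_error, mean + z * standard_error


# Simulated, hypothetical test data: each page's uplift is drawn around a true
# mean of 17% with a page-to-page spread of 20 percentage points.
rng = random.Random(42)
for n_pages in (10, 100, 1000):
    sample = [rng.gauss(0.17, 0.20) for _ in range(n_pages)]
    low, high = uplift_interval(sample)
    print(f"{n_pages:>4} pages: mean uplift between {low:.1%} and {high:.1%}")
```

The interval width shrinks roughly with the square root of the number of pages, which is why a larger sample of similar pages lets you state a tighter range.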

Peer Review

Your opinion is the only one that counts, right? Wrong!

A scientific theory is generally not accepted by the academic community unless it has been published in a peer-reviewed journal. Scrutiny from experts helps to maintain integrity and quality standards to support the advancement of science.

In SEO, you don’t need to publish your findings in an academic journal before presenting them to your client. But it does help to get a little help from your friends.

Actually, help from your friends is exactly what you don’t want. We’re accustomed to getting opinions from people similar to ourselves. Our social media feeds are small bubbles full of thoughts that conform with our view of the world. This inspires confirmation bias, and you don’t want that. What you do want is feedback from a diverse group of people: those who will provide opposing views and move your analysis forward. Seek them out and advance your thinking!

Decisions as Expected Value Calculations

Decisions in SEO are bets with wagers, rewards and odds of success. Wagers are yours or your clients’ resources and rewards are your estimated successful outcomes.

The ideal is to take on bets with likely outcomes and high rewards; that, of course, is the perfect decision. But the vast majority of decisions sit somewhere in between, or near the opposite extreme, i.e. an unlikely outcome with a high reward.

The decision to launch a startup is one of them. An empirical study by Harvard Business School found that first-time entrepreneurs have only an 18% chance of succeeding. If all first-time entrepreneurs based their decision to launch a business solely on the probability of success, there’d be no startups.

But they don’t. Despite first-time entrepreneurs only succeeding 18 times out of 100, the rewards of success are so staggeringly large that, as long as the costs of being wrong are negligible, it can be worth a bet.

Expected value calculations are simple.

(Probability of succeeding x reward) – (probability of failing x penalty).

For example: (60% x £6,000) – (40% x £200) = £3,600 – £80 = expected value of £3,520
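The formula above translates into a one-line function. This is a minimal Python sketch; `expected_value` is a hypothetical name, and rewards and penalties are assumed to be in the same currency.

```python
def expected_value(p_success, reward, penalty):
    """(probability of succeeding x reward) - (probability of failing x penalty)."""
    return p_success * reward - (1 - p_success) * penalty


print(expected_value(0.60, 6_000, 200))  # 3520.0
```

The same function captures the startup logic: an 18% chance of success still gives a positive expected value when the reward is large and the penalty small, e.g. `expected_value(0.18, 1_000_000, 10_000)` with those purely hypothetical figures.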

In SEO, often the only penalty for making the wrong decision is lost time (which can be translated into money). That can be your own, or your clients’ investment into the implementation of your decision, or both. For example, implementing a content strategy that augments the existing content on a website will rarely have a negative impact on organic traffic, but the investment in time may outweigh the financial rewards of making the decision.

There are edge cases where big bets are lost or extremely poor decisions are made, e.g. large website migrations or disallowing your website in robots.txt. But good decision making relies on good information, competence, experience and a little bit of luck.

A good SEO strategy has incremental wins over time. You should use expected value calculations to help prioritise your recommendations and, like a professional gambler, make sure your expected rewards exceed your expected losses over the long term.
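Prioritising a backlog by expected value can be sketched in a few lines of Python. The recommendations and every figure below are hypothetical, invented purely to illustrate the ranking; in practice the probabilities, rewards and penalties would come from estimates like the FAQ schema one above.

```python
from dataclasses import dataclass


@dataclass
class Recommendation:
    name: str
    p_success: float  # your best guess that it works
    reward: float     # estimated value if it works (e.g. £ of extra organic revenue)
    penalty: float    # cost if it doesn't (e.g. £ of time invested)

    @property
    def expected_value(self) -> float:
        return self.p_success * self.reward - (1 - self.p_success) * self.penalty


# Hypothetical backlog: all names and numbers are made up for illustration.
backlog = [
    Recommendation("Add FAQ schema to landing pages", 0.60, 6_000, 200),
    Recommendation("Rewrite category page copy", 0.40, 8_000, 1_500),
    Recommendation("Launch a new content hub", 0.20, 40_000, 4_000),
]

# Highest expected value first.
for rec in sorted(backlog, key=lambda r: r.expected_value, reverse=True):
    print(f"£{rec.expected_value:>7,.0f}  {rec.name}")
```

With these made-up numbers the least likely bet tops the list because its reward is large enough, which mirrors the startup example; in practice you would also weigh how reliable each estimate is.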

With a little pinch of salt

If you work through complex mathematical equations for every decision, you’ll make no decisions at all. You do a lot of these calculations intuitively, or at least you can train yourself to.

Some decisions are easy to make. For example, taking an hour to create and implement page titles for the 80% of your key landing pages that are missing them doesn’t need a lengthy decision process.

Expected value calculations should be useful, not detrimental to your time.

Final Say

The SEO landscape is growing in complexity and is unpredictable by nature. Most of the information you need is hidden in the depths of Google’s black box, entangled between layers of algorithms. Because of that, you cannot express absolute certainty in your decisions.

Instead, you should adopt probabilistic decision making: thinking in bets. That means a cautious, investigative approach to your SEO decisions that weighs up the likelihood of success, the rewards and the wager for making your decision.

Disclaimer: I’m just an SEO with a keen interest in probability and forecasting and most definitely not a statistician.

Please let me know your thoughts in the comments below. And if you want to hear more from me, you can subscribe to my newsletter at where I curate my favourite SEO articles, and a short piece of insight every week. 

Posted in SEO Strategy
  • Jordan Fowler

    Love the perspective. The only hat that matters is “work hat” based on probability and science. We share similar sentiments on using hypothesis theory in SEO here.

    The Work Hat SEO approach is akin to what you propose here, using 4 key questions to determine the approach based on data, patents, and case study testing.

    1. Does this SEO strategy work now to increase rankings and organic traffic (and why)?
    2. Based upon what we know about Google (and other search engines), will this SEO strategy work into the foreseeable future (and why)?
    3. Is there any risk associated with this SEO strategy now (or one we see as probable in the future)?
    4. Can we mitigate those SEO risks to the website to the degree that the probability of return warrants activation of the strategy?

    Our perspective is test, test, and test more. Good science requires repeatability. We always try to be able to articulate THE WHY behind the tactic. If you know WHY something works or doesn’t, it helps with making the adjustments as the algo changes.

    5:00 pm August 21, 2020
  • Manikandan N

    That’s an outstanding probability & forecast in SEO ✌️ I will experiment with it and share my results in the next couple of months

    4:48 am August 23, 2020