Exactly how search engines work today is almost certainly impossible to explain in full. I’m sure that even the fine folks at Google, including the highest-ranking engineers overseeing the most critical parts of the search algorithms, would have a hard time explaining exactly how everything works.
Luckily, if all you want to do is earn some great money from blogging, you don’t need to know every minute detail about how search engines work.
You only need to read this guide.
What problem are search engines solving?
To understand how search engines work, we need to understand their purpose first.
We could easily make an argument that search engines are merely services provided by businesses, whose end goal is making money. And that is not wrong, technically.
We know that searching itself is free because search engines make money from displaying ads alongside some search results, for some queries. Advertising is, by and large, how search engines earn their money.
Now, not all search queries have ads, and that’s because advertisers only want to show ads to people doing specific searches, ones that advertisers believe are in line with what their potential customers might be searching. This means that for a lot of search queries, search engines don’t gain anything in return (or not directly, at least, but more on that later).
For search engines to be able to display a lot of ads and earn a lot of money, they need to be so good at providing relevant search results to all search queries (even ones that don’t have any ads), that people simply trust them whenever they search anything, whether the results might include ads or not.
So, we can say that one of the most critical problems that search engines are trying to solve is relevance.
Search engines aim to provide relevant results to the searchers.
Otherwise, what’s the point? Searchers will stop using the search engines entirely if they don’t find results for their queries that are at least halfway relevant to their search.
For every search query (also called “keyword”), there may be several web pages that are relevant to it.
And the Internet is vast. There are billions of websites out there, with possibly trillions of web pages.
The primary job of a modern search engine is to decide which one of those trillions of pages would be the most relevant to the search query, and then which one would be the second most relevant, etc.
Sounds kinda scary, when you think about it. How would that even work?
How do search engines work?
In presenting this example, I will assume I’m talking to a person who knows nothing about, well, anything. I will try to explain how a modern search engine like Google might work (although, as I said, explaining it fully is probably both impossible and not very useful).
Probably the first thing a search engine does is collect as many web pages as possible. This is known as crawling (you’ll sometimes hear it called scraping). While collecting trillions of web pages may sound daunting, this may be one of the easiest parts of a search engine’s job: merely collecting the pages is just a matter of computing resources, which search engines have in abundance.
Exactly how search engines collect the entire Internet is not really relevant for this discussion, but the gist is simple. You start with a number of initial web pages as seeds. You gather all of the hyperlinks on those pages, then fetch each linked page, then gather all of their hyperlinks into an even bigger pile. Rinse and repeat, making sure you don’t fetch the same page again within too short a time frame, and eventually you will have covered essentially the whole Internet, with all of its written text content gathered.
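The seed-and-follow-links process above is just a breadth-first traversal of a graph. Here is a minimal sketch in Python; the tiny in-memory “web” stands in for real HTTP fetching and HTML parsing, and all the page names are invented for illustration.

```python
from collections import deque

# A toy "web": each page maps to the pages it links to.
# In a real crawler this would be HTTP fetching plus link extraction.
TOY_WEB = {
    "seed.com": ["a.com", "b.com"],
    "a.com": ["b.com", "c.com"],
    "b.com": ["seed.com"],
    "c.com": [],
}

def crawl(seeds, web):
    """Breadth-first crawl: start from the seed pages, follow every link,
    and skip pages already visited so no page is fetched twice."""
    visited = set()
    queue = deque(seeds)
    order = []
    while queue:
        url = queue.popleft()
        if url in visited:
            continue
        visited.add(url)
        order.append(url)
        for link in web.get(url, []):
            if link not in visited:
                queue.append(link)
    return order

print(crawl(["seed.com"], TOY_WEB))  # every reachable page, each exactly once
```

A real crawler adds politeness delays, robots.txt checks, and revisit scheduling on top of this loop, but the visited-set-plus-queue core is the same.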
This is part of what the search engine does behind the curtains.
Another part that also happens on the backend of a search engine is sorting all of that content and preparing it to be ready for quick serving to the searchers.
Now, the following part is purely hypothetical: I don’t know exactly how any particular search engine works, and the people working there usually sign NDAs and contracts that prevent them from revealing the details. However, I am a Computer Science undergraduate, and I’ve worked as a programmer for many years, and I’m pretty confident that the internals of modern search engines simply must work something like what I’m about to describe next.
So, after the search engine gathers all of the content of the Internet, it then proceeds to prepare for the search queries. The search engine must figure out which web pages to present to a person typing a certain search query, and it must do so a little bit beforehand, so that the search process itself can be fast.
This is also called indexing. The simplest explanation: for every search query the search engine is aware of (because, for example, a searcher has typed it in previously), the search engine looks at a smaller number of web pages that include the search query, or some of its synonyms, at least somewhere in the text. It then tries to rank-order this smaller subset of web pages based on a combination of factors, the most important of which are relevance and the trustworthiness of the publisher (more on these two in the next section).
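The classic data structure behind this is an inverted index: a map from each word to the pages containing it, built ahead of time so lookups are fast. A minimal sketch, with invented page contents (real indexes also handle synonyms, stemming, and word positions, which this toy skips):

```python
from collections import defaultdict

# Invented example pages: URL -> page text.
PAGES = {
    "eggs-guide.com": "how to make perfect scrambled eggs at home",
    "toast.com": "how to make toast",
    "travel.com": "best places to travel in spring",
}

def build_index(pages):
    """Map every word to the set of pages containing it, ahead of time,
    so that answering a query later is a cheap lookup, not a full scan."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.split():
            index[word].add(url)
    return index

def search(index, query):
    """Return the pages that contain every word of the query."""
    words = query.split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

index = build_index(PAGES)
print(search(index, "scrambled eggs"))  # {'eggs-guide.com'}
```

This is exactly the “prepare beforehand so the search itself is fast” trade: indexing is slow and done once; each query is then answered from the precomputed map.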
The final side of how a search engine works involves the searcher, the person typing the search query into a search engine like Google.
Whenever a searcher searches for a keyword, the search engine goes back to its index for that keyword, and simply gives the list of web pages back to the searcher, already ordered by relevance and trustworthiness.
Search engines might also look at additional factors when ranking web pages, such as the searcher’s previous search history, to determine the context of the search query, and they may use other data they have gathered about the searcher, like their language and location.
They also look at signals like the speed of the website and whether it has an SSL certificate, to further gauge the quality of a web page.
But still, the primary factors we need to consider for now are relevance and trustworthiness. These are by far the most difficult ones to get right, and, naturally, the ones that are responsible for the success of a blog.
Now, the billion-dollar question becomes: how do search engines decide how to rank web pages for specific keywords?
How do search engines decide on ranking content?
The two most important factors that search engines use to decide how to rank a web page for a search query are relevance and trustworthiness.
Let’s look at relevance first.
How do search engines determine the relevance of content for a search query?
It’s actually quite intriguing when you think about it.
If I were to tell you to look at two blog posts about how to make scrambled eggs and decide which one is better, you would have to read them, understand them, and judge them based on the completeness and correctness of the information, the clarity of expression, and so on.
The search engine is not much more than some code running on some servers and crunching some data. In order to rank web pages for a search query based on relevance, that piece of code, or algorithm if you will, needs to somehow “understand” the text and the data it is processing so that it can perform the task that you just tried to do.
How do you go about making an algorithm, a piece of code processing some data, “understand” human language? How do you make it know which information is “correct”?
Well, as it turns out, you can’t. Search engines are still far from perfect.
But they do get close, close enough so that they are still very useful.
Again, the following is purely hypothetical, but I don’t think it’s too unreasonable – search engines also try to evaluate content based on completeness, correctness, clarity, depth, etc.
They probably start evaluating relevance by looking at the title of the web page. Of course, they’re not looking at just the title, but the title itself probably carries more weight in determining what the web page is all about. They also may look at the first few paragraphs and the subheaders and see if the search query is somewhere in there as well.
Maybe they evaluate the completeness by then looking at all of the web pages that might be relevant for that search query, and then seeing which ones are longer in content, have more details, more words, etc.
Maybe they evaluate correctness by trying to somehow extract claims from the web pages, and then determining what is to be considered correct based on the majority of the claims.
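To make the guesses above concrete, here is a toy scoring function in Python. The signals (title match, early mention, subheading matches, a capped length bonus as a completeness proxy) come straight from the paragraphs above, but every weight is invented for illustration; real engines are far more subtle.

```python
def relevance_score(page, query):
    """Toy relevance heuristic. All weights are made up for illustration:
    a title match counts most, then an early mention, then subheading
    matches, plus a small capped bonus for longer (more complete) content."""
    q = query.lower()
    score = 0.0
    if q in page["title"].lower():
        score += 5.0                     # title carries the most weight
    if q in page["intro"].lower():
        score += 2.0                     # topic mentioned early on
    score += sum(1.0 for h in page["headings"] if q in h.lower())
    score += min(len(page["body"].split()) / 500, 2.0)  # completeness proxy
    return score

# An invented example page.
page = {
    "title": "How to Make Scrambled Eggs",
    "intro": "Scrambled eggs are a breakfast classic...",
    "headings": ["Choosing your eggs", "Scrambled eggs, step by step"],
    "body": "word " * 1000,
}
print(relevance_score(page, "scrambled eggs"))  # 10.0
```

Rank-ordering a keyword’s candidate pages is then just sorting them by this score, descending.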
To be fair, search engines probably use natural language processing techniques and other advanced forms of data science that may not work exactly like this, but I still believe the end results probably end up following a similar pattern to what I described.
What we do know from practical experience is that original, long-form, in-depth content, that is also factually correct and is presented clearly, tends to perform really well, and it will outperform shorter, untrue content presented in a messy way, all other things being equal.
That’s probably the most important lesson in blogging: write original, detailed, correct content, present it clearly, and it will be considered relevant by Google and other search engines.
Speaking of “all other things being equal,” the other important factor we mentioned is trustworthiness.
How do search engines determine the trustworthiness of content for a search query?
As I said, neither I nor any digital marketing expert can be 100% sure about how search engines determine the relevance of content. However, when it comes to determining trustworthiness, we are all pretty sure the most important factor is links.
Simply put, if a trustworthy website links out to another website, the other website becomes a bit more trustworthy as well.
How do search engines determine which websites are trustworthy, to begin with?
Well, there are two possible approaches, and the real way this is done may be some combination of the two.
I believe it wouldn’t be too far-fetched for the teams behind the search engines to simply hand-pick some websites they know are trustworthy, and go from there. There might be a bit of that going on, although I’m quite sure that by now all of the established, trustworthy websites are already known; if this ever happened, it was probably only in the past. There would simply be no need for it today.
The actual way search engines determine trustworthiness probably involves, in one form or another, looking at the huge pile of billions of websites and counting each link from one website to another as a vote of trust. Then, maybe, they do a second round of that process: now that they have a better idea of which websites are more trustworthy, links from those sites count as more than one vote. Something like that.
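This repeated “votes of trust” idea is essentially how Google’s original PageRank algorithm worked. Here is a minimal sketch in Python; the link graph, the number of rounds, and the 0.85 damping factor are all invented or conventional defaults, not anything Google has confirmed about its current system.

```python
# A toy link graph (invented): site -> sites it links to.
LINKS = {
    "news.com": ["blog.com", "shop.com"],
    "blog.com": ["news.com"],
    "shop.com": ["news.com"],
    "spam.com": ["shop.com"],
}

def trust_scores(links, rounds=20, damping=0.85):
    """PageRank-style iteration: every site starts with equal trust, then
    repeatedly passes its trust along its outgoing links. After a few
    rounds, a link from a trusted site is worth more than one 'vote'."""
    sites = list(links)
    score = {s: 1.0 / len(sites) for s in sites}
    for _ in range(rounds):
        new = {s: (1 - damping) / len(sites) for s in sites}
        for site, outgoing in links.items():
            if not outgoing:
                continue
            share = damping * score[site] / len(outgoing)  # split trust evenly
            for target in outgoing:
                new[target] += share
        score = new
    return score

scores = trust_scores(LINKS)
print(max(scores, key=scores.get))  # news.com: two sites vote for it
print(min(scores, key=scores.get))  # spam.com: nobody links to it
```

Note how the “second round” intuition above falls out naturally: each iteration redistributes trust using the previous iteration’s scores, so votes from high-trust sites automatically weigh more.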
What we as bloggers know for sure is that whenever a website manages to get a backlink from another website that is trustworthy, it tends to rank better. This doesn’t include any blackhat or spammy techniques, as those can be quite risky and it’s not something I recommend ever doing. Simply doing your best to earn links from trusted websites will be a very smart long-term strategy for your blog.
We are only scratching the surface here, but even now this is more than enough for what you need to know about blogging. What’s much more important is what we can do to improve our rankings, knowing all of this.