How Search Engines Work: A Beginner's Guide

A basic understanding of how search engines work and what they are looking for is necessary to successfully optimize your own website.

In this article I will try to explain, in simple terms:

  • How search engines find your website
  • What a search engine spider does
  • Why search engines return different results, i.e. why Bing results differ from Google results

Before we talk about search engines we need to understand how they evolved, and what problems they were designed to solve.

Although Google has become a large and popular company, you need to realise that it is also a very young one, founded only in 1998.

Before that, finding something on the Internet meant going from one site to another, following links to other pages and sites that website owners had provided.

The first attempts to make the web easier to navigate involved creating lists of sites organised by topic or theme, the most famous being the Yahoo directory, which apparently started out as “Jerry’s Guide to the World Wide Web”, named after co-founder Jerry Yang.

This list/directory approach couldn’t really keep pace with the growth of the web, so another approach evolved that used computers instead of people to create these lists automatically.

They did this by following the links on a site to discover new sites, and by categorising the sites they found based on their content.

The computers that did this became known as search engines, and the process of following links from one site to another became known as spidering.

These search engines built huge directories automatically and provided an easy way to search these directories.

However, because they are created by a search engine rather than compiled by hand, they aren’t called directories; they are known as search engine indexes.
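To make the idea of an index concrete, here is a minimal sketch in Python of an inverted index, the basic data structure behind a search engine index. The page names and text are made up for illustration, and a real engine stores vastly more information per page:

# A toy inverted index: map each word to the set of pages containing it,
# so looking a word up is a single dictionary access.
pages = {
    "recipes.htm": "200 easy to cook recipes",
    "winelist.htm": "chez-phillipe wine list",
}

index = {}
for url, text in pages.items():
    for word in text.split():
        index.setdefault(word, set()).add(url)

print(index.get("wine", set()))     # {'winelist.htm'}
print(index.get("recipes", set()))  # {'recipes.htm'}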

Although search engines have evolved and become more complex, the fundamental principles are the same. That is:

They follow links from one site to another and attempt to categorise each site they find by analysing its content.

The program that finds and follows links is the search engine spider.
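To give a feel for what a spider does, here is a minimal sketch in Python using only the standard library. It is nothing like a production crawler (no robots.txt handling, no politeness delays, no parallelism), and the starting URL is just a placeholder:

from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class LinkExtractor(HTMLParser):
    # Collect the href value of every <a> tag on a page.
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def spider(start_url, max_pages=10):
    # Follow links breadth-first from start_url; return the URLs visited.
    to_visit = [start_url]
    visited = set()
    while to_visit and len(visited) < max_pages:
        url = to_visit.pop(0)
        if url in visited:
            continue
        try:
            html = urlopen(url).read().decode("utf-8", errors="ignore")
        except Exception:
            continue  # skip pages that fail to load
        visited.add(url)
        parser = LinkExtractor()
        parser.feed(html)
        # Turn relative links into absolute ones and queue them for a visit.
        to_visit.extend(urljoin(url, link) for link in parser.links)
    return visited

print(spider("https://example.com"))

A real spider would also store the content of each page it visits, which is what feeds the index described above.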

All the major search engines work in the same way. The reason they return different results is the ranking algorithm each one applies to the web pages it finds.

Search Engine Algorithms

A search engine algorithm is a set of rules that the search engine uses to assess web pages against a specific search query.

To illustrate this we will create our own simple search engine.

As an example, let’s imagine that our search engine has three web pages indexed.

Because of the limits of our technology, we are only able to read and store the page file names and titles. These are shown below:

Page     Page Name              Page Title
Page 1   Recipes.htm            200 easy to cook Recipes
Page 2   Winelist.htm           Chez-Phillipe wine list
Page 3   Favourite-Recipes.htm  My 10 favourite Italian recipes

Now imagine a searcher types the following query:

best french red wine of 2014

We need to present the searcher with the best result, so we compare the search query with the pages we have indexed and apply some rules to determine which pages fit the query and, of those, which fits best.

So our rules are:

  1. A keyword must match in the page title or file name
  2. The page with the most keyword matches wins

Applying these rules to our pages and the search query, page 2 wins, as the keyword “wine” appears in both the file name and the title.
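Here is what those two rules might look like in code, as a minimal Python sketch of our imaginary search engine (not anything a real engine does). A keyword counts as a match whenever it appears in the file name or the title:

# Our tiny index: the file name and title are all we store per page.
index = [
    ("Recipes.htm", "200 easy to cook Recipes"),
    ("Winelist.htm", "Chez-Phillipe wine list"),
    ("Favourite-Recipes.htm", "My 10 favourite Italian recipes"),
]

def score(query, name, title):
    # Rule 1: a keyword matches if it appears in the file name or title.
    # Rule 2: the page's score is its total number of keyword matches.
    text = (name + " " + title).lower()
    return sum(text.count(word) for word in query.lower().split())

query = "best french red wine of 2014"
for name, title in sorted(index, key=lambda p: score(query, *p), reverse=True):
    print(score(query, name, title), name, title)

Running this prints page 2 first with a score of 2: the keyword “wine” is found once in the file name and once in the title, while no keyword matches the other two pages.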

Over time, as the number of pages in our index increased and the search results became less relevant, we could modify our rules or add new ones.

Although the above example may seem trivial compared with the algorithms Google and Bing employ today, earlier search engines were not so dissimilar to it.

Here is a quote from Wikipedia on Archie:

The very first tool used for searching on the Internet was Archie.[3] The name stands for “archive” without the “v”. It was created in 1990 by Alan Emtage, Bill Heelan and J. Peter Deutsch, computer science students at McGill University in Montreal. The program downloaded the directory listings of all the files located on public anonymous FTP (File Transfer Protocol) sites, creating a searchable database of file names; however, Archie did not index the contents of these sites since the amount of data was so limited it could be readily searched manually.

The same basic process of building an index and refining how it is searched continues in the world of search today.

Both Google and Bing are constantly modifying their rules (algorithms) in an effort to keep the search results relevant.

Sometimes they add new rules, remove rules altogether, or modify existing ones.

This is why a page can rank well in search and then quite suddenly stop ranking so well.

The rules (algorithms) they use are proprietary to each search engine, and SEO experts can only guess at them.

Here is a video by Matt Cutts (a Google engineer) that explains the index and how sites and pages are ranked.

Summary

Search Engines discover web pages and sites by following links from sites that they already have indexed.

Search engines rank web pages using a set of rules (an algorithm).

These rules are constantly being changed, and are proprietary to the search engines themselves.
