One of the hardest parts of a search engine's job is managing the extraordinary diversity of content it encounters, both in the pages it sees and in the questions it gets asked. Instead of doing all of the reasoning and analysis when a search term arrives, it organizes the information ahead of time, learning a little bit about each page, and where to find it, before it's even asked a question. If a search engine had to go out to every page and look at its content for the first time when you performed a search, it would take forever to get your results. Imagine going to a big concert and being asked to find the oldest person from Indiana at the show. This would be incredibly difficult if you didn't have any information about the people -- but if you were allowed to arrange the people by home state, and by age within each state, before the question was asked, then it would be much easier to arrive at a quick and accurate answer. Phone books are alphabetical for a reason!
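
To make the concert example concrete, here is a minimal sketch in Python (not BYOB) of arranging people before the question is asked. The attendee names and ages, and the helpers `build_index` and `oldest_from`, are made up for illustration.

```python
from collections import defaultdict

# Hypothetical attendee data: (name, home state, age)
attendees = [
    ("Ana", "Indiana", 34),
    ("Ben", "Ohio", 71),
    ("Cleo", "Indiana", 67),
    ("Dev", "Texas", 25),
    ("Eve", "Indiana", 52),
]

def build_index(people):
    """Group people by home state, oldest first within each state."""
    by_state = defaultdict(list)
    for name, state, age in people:
        by_state[state].append((age, name))
    for group in by_state.values():
        group.sort(reverse=True)  # largest age comes first
    return dict(by_state)

def oldest_from(index, state):
    """Answer the question with one lookup instead of asking everyone."""
    group = index.get(state)
    if not group:
        return None  # nobody from that state is at the show
    age, name = group[0]
    return name

# The arranging happens once, before any question arrives.
index = build_index(attendees)
print(oldest_from(index, "Indiana"))  # Cleo
```

The slow work (grouping and sorting) is done a single time up front; after that, each question only needs to look at the front of one group.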

The Anatomy of a Search Engine

Search engines, including Google, are generally composed of three major parts: the crawler, the indexer, and the searcher. This video, produced by a team at Google, describes how search engines work; listen to their description of the process (they are pretty good at it, after all). Don't worry if you don't follow all of it -- we'll spend more time on it in a second.

Sounds easy enough, right? Today we're going to use BYOB to create a very simple search engine. Search is a reasonably challenging thing to do at all, and an incredibly difficult thing to master (Google, Microsoft, and some others are spending billions trying!). Breaking up the search engine into multiple parts will make the overall problem much more manageable. Understanding the concepts behind search can be challenging, so we'll spend a little bit more time on those in the following section.
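
As a rough preview of how the three parts fit together, here is a small Python sketch (not the BYOB project itself). It assumes a tiny made-up "web" stored in a dictionary called `WEB`, and the function names `crawl`, `build_index`, and `search` are invented for this example. It also assumes the usual division of labor: the crawler gathers pages by following links, the indexer records which words appear on which pages, and the searcher answers queries using only that record. Real search engines are far more sophisticated at every step.

```python
from collections import defaultdict

# Hypothetical in-memory "web": page address -> (text, links to other pages)
WEB = {
    "home": ("welcome to the concert guide", ["bands", "tickets"]),
    "bands": ("bands playing at the concert", ["home"]),
    "tickets": ("buy tickets for the show", ["home", "bands"]),
}

def crawl(start):
    """Crawler: follow links from the start page and collect each page's text."""
    seen, to_visit, pages = set(), [start], {}
    while to_visit:
        address = to_visit.pop()
        if address in seen or address not in WEB:
            continue  # skip pages already visited or missing
        seen.add(address)
        text, links = WEB[address]
        pages[address] = text
        to_visit.extend(links)
    return pages

def build_index(pages):
    """Indexer: map each word to the set of pages that contain it."""
    index = defaultdict(set)
    for address, text in pages.items():
        for word in text.lower().split():
            index[word].add(address)
    return index

def search(index, query):
    """Searcher: return the pages that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return []
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return sorted(results)

index = build_index(crawl("home"))
print(search(index, "concert"))       # ['bands', 'home']
print(search(index, "tickets show"))  # ['tickets']
```

Notice that `search` never looks at the pages themselves; it only consults the index that was built ahead of time, which is the same trick as arranging the concert crowd before the question is asked.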