November 11, 2006
The story of Google is in many ways the archetypal engineer's dream. They invented a better way to search the web, set up in a garage-like space and rose to the top. But engineers also value results that can be reproduced, and part of what makes Google so scary is that it cannot be reproduced. As hard as Yahoo and Microsoft are trying, with obscene amounts of financial, engineering and computing resources at their disposal, they can't generate search results as good as Google's. The search world is already oligarchical, but as Google rapidly turns into a verb, it is well on its way to becoming a monopolized space.
Page Rank, you see, is an irreversible and irreproducible process. Page Rank is the name for the key aspect of Google's search algorithm, the engineering breakthrough that makes Google so much better than all those now dead or battered search engines of the 1990s. And it's also the thing that makes it so damn hard, if not impossible, to make a search engine as good as Google's. You can reverse engineer Page Rank, of course, and you can be damn sure both Yahoo and Microsoft have invested plenty of time in that effort. The problem, though, is that Page Rank just would not work if you ran it from scratch today, and that's why Yahoo and Microsoft just can't provide the same quality of results as Google.
At its core it's a problem of the data set. Page Rank's big breakthrough was the realization that links between web pages could be used as a way to judge the quality of a piece of content. If a page was linked to by multiple sites, odds were it was a better page than one with no incoming links. Furthermore, if the links came from other high quality pages, the odds would be even higher. I wrote all that in the past tense, though, because Page Rank is a victim of its own success. The internet is now filled with massive numbers of pages generated with the explicit goal of hacking Google, of pushing sites up higher in its search results. The internet as a dataset is now dirty, if not filthy.
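To make the link-voting idea concrete, here is a minimal sketch of the textbook version of the calculation, run on a tiny made-up web. It is not Google's actual implementation: the function name `page_rank`, the damping value of 0.85, the iteration count and the page names are all assumptions chosen for illustration. The second graph adds a handful of hypothetical "farm" pages that exist only to link to a spam page, which is the kind of dirt described above.

```python
def page_rank(links, damping=0.85, iterations=50):
    """Textbook-style power iteration (an illustrative sketch only).

    links maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start with every page equally ranked.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets a small baseline share regardless of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its own rank evenly among the pages it links to,
                # so a link from a highly ranked page is worth more.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over everyone.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical "clean" web: nobody links to the spam page.
clean = {
    "hub": ["good", "blog"],
    "blog": ["good"],
    "good": ["hub"],
    "spam": ["hub"],
}

# Hypothetical "dirty" web: five farm pages exist only to trade links with spam.
farm = [f"farm{i}" for i in range(5)]
dirty = dict(clean)
dirty["spam"] = farm
for page in farm:
    dirty[page] = ["spam"]

clean_rank = page_rank(clean)
dirty_rank = page_rank(dirty)

print("clean:", clean_rank["good"] > clean_rank["spam"])  # good page wins
print("dirty:", dirty_rank["spam"] > dirty_rank["good"])  # spam page wins
```

In the clean graph the page with real incoming links comes out ahead, and in the dirty one the link farm is enough to push the spam page past it. Anyone spidering the web today is starting from something much closer to the second graph.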
This is a problem for Google, of course, but it's not nearly the problem for Google that it is for its competitors. Google needs to deal with the many sites trying to hack its results, but it has a major tool to fight them: the data generated by Page Rank before search engine optimization became a profitable and fulfilling career. It means Google weighs slightly towards older sites, ones established in the era of clean Page Rank, but it also means that anyone trying to reproduce Page Rank by spidering the internet today just cannot get results nearly as good as Google's. So until someone devises a brand new algorithm, it's going to be Google's internet and the rest of us are just searching for our own little piece of it...
Posted by Abe at November 11, 2006 11:59 AM