November 11, 2006

Irreversibly Google

The story of Google is in many ways the archetypal engineer's dream. They invented a better way to search the web, set up in a garage-like space and rose to the top. But engineers also value results that can be reproduced, and part of what makes Google so scary is that it cannot be reproduced. As hard as Yahoo and Microsoft are trying, with obscene amounts of financial, engineering and computing resources at their disposal, they can't generate search results as good as Google's. The search world is already oligarchical, but as "google" rapidly turns into a verb, it is well on its way to becoming a monopolized space.

PageRank, you see, is an irreversible and irreproducible process. PageRank is the name for the key aspect of Google's search algorithm, the engineering breakthrough that made Google so much better than all those now dead or battered search engines of the 1990s. It's also the thing that makes it so damn hard, if not impossible, to build a search engine as good as Google's. You can reverse-engineer PageRank, of course, and you can be damn sure both Yahoo and Microsoft have invested plenty of time in that effort. The problem, though, is that PageRank just would not work as well if you ran it on today's web, and that's why Yahoo and Microsoft just can't provide the same quality of results as Google.

At its core, it's a problem of the data set. PageRank's big breakthrough was the realization that links between webpages could be used to judge the quality of a piece of content. If a page was linked to by multiple sites, odds are it was a better page than one with no incoming links. Furthermore, if the links came from other high-quality pages, the odds would be even higher. I wrote all that in the past tense, though, because PageRank is a victim of its own success. The internet is now filled with massive numbers of pages generated with the explicit goal of hacking Google, of pushing sites higher in its search results. The internet as a dataset is now dirty, if not filthy.
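
To make the link-counting idea concrete, here is a minimal Python sketch of the textbook PageRank recurrence. It is a simplified illustration, not Google's actual implementation. Each page splits its score evenly among the pages it links to, so a link from a high-scoring page is worth more, and a damping factor of 0.85 (the value suggested in the original PageRank paper) models a surfer who sometimes jumps to a random page instead. The page names, including the ten "spam" pages, are hypothetical and exist only to show how generated links can inflate a target page's score.

```python
# A simplified, textbook-style PageRank (not Google's production system):
# each page splits its score evenly across its outgoing links, and a damping
# factor models a surfer who sometimes jumps to a random page instead.

def pagerank(links, damping=0.85, iterations=100):
    """Return a dict mapping each page to its score; scores sum to 1.

    links maps a page name to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Every page gets the same baseline from random jumps.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A link passes along an equal share of the linking page's
                # score, so links from high-scoring pages are worth more.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dead-end page: spread its score over every page.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# A tiny hypothetical "clean" web: c is linked to by three pages, shop by none.
clean_web = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
    "shop": ["a"],
}

# The same web plus ten generated pages whose only purpose is to link to shop.
spammed_web = dict(clean_web)
for i in range(10):
    spammed_web[f"spam{i}"] = ["shop"]

clean = pagerank(clean_web)
spammed = pagerank(spammed_web)

for name in ("c", "shop"):
    print(f"{name}: clean={clean[name]:.3f} spammed={spammed[name]:.3f}")
```

On the clean graph, the heavily linked page c comes out on top, while shop, which nobody links to, gets only the random-jump baseline. Adding ten pages that exist solely to link to shop roughly triples its score, which is the kind of manipulation that dirties the dataset.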

This is a problem for Google, of course, but nowhere near as big a problem as it is for its competitors. Google needs to deal with the many sites trying to hack its results, but it has a major tool to fight them: the data PageRank generated before search engine optimization became a profitable and fulfilling career. It means Google weights its results slightly toward older sites, ones established in the era of clean PageRank, but it also means that anyone trying to reproduce PageRank by spidering the internet today just can not get results nearly as good as Google's. So until someone devises a brand new algorithm, it's going to be Google's internet, and the rest of us are just searching for our own small little piece of it...

Posted by Abe at November 11, 2006 11:59 AM

Comments

This is completely wrong. Google's advantage is at *most* 1% engineering/secret sauce ... the other 99% is brand (with perhaps little bits attributable to speed of page rendering and the design of results page).

Yahoo!'s results are not noticeably better or worse, and MSN's are only a tiny bit worse now. I switched to Yahoo! search exclusively shortly after starting to work at Yahoo!. For a few months, I ended up doing the same search on Google immediately after a Yahoo! search (just to make sure I wasn't missing something) but eventually found that I didn't feel that need anymore.

Very occasionally Google produces better results for a given query, but Yahoo!'s spam filtering is marginally better, so the quality is more or less even. That the overall quality is even is the received wisdom in the industry, backed up by blind tests, and freely admitted by Googlers (even Peter Norvig, who headed search quality for Google until recently).

But the really interesting thing is how little quality matters to anyone. It is *all* about brand, just like Crest/Colgate or Coke/Pepsi, except that even people savvy enough to understand marketing's general influence on their behavior and preferences can't see it in the case of Google.

When you write that Yahoo! and MS "just can not get results nearly as good as Google's", you are looking through brand-colored glasses. What Google's competitors need is not a "brand new algorithm" but a way to shift people's preference away from what is perhaps the most successful brand in history.

Abe,

Stewart is right. Google had a great story about why their technology was better: PageRank. That story convinced tech-heads and sounded good to novices. But the real value was in making the Internet simple.

They distilled the crazy, complex, untamed jungle of the World Wide Web into a single query box that seemed to work like magic. A simple home page with simple, effective results.

Yahoo!, in contrast, turned their home page into a cacophony of services, a visual tumult that shouted at users about the complexity and wonder of the Internet. In today's information-overload world, Google's brand, implicitly built around the promise of simplicity, was more powerful, more credible, and more effective than anything else in the market.

Fortunately for Google's challengers, Google has never formalized this brand focus and is systematically destroying it with its nearly unchecked forays into wildly diverse and unsimple services like Google Finance, Google SideBar, and YouTube. Even though they now have a freeze on new product releases (someone realized there was a problem with market overload), they lack a unifying vision of where to go from here, leaving their actions hopelessly out of sync with their core brand. Google today is anything but simple.

Abe might be onto something, though, if Wikipedia is to be believed:

Ask Jeeves had dropped below Google, MSN, and Yahoo! in the size of their userbase. However, because Ask.com was slow to index some new webpages, Ask.com did not suffer the onslaught of computer-generated linkspam results that initially flooded Google Search, MSN Search, and Yahoo! Search and buried significant webpages that Ask Jeeves (or Ask.com) could still find.

I'm not inclined to think that this same set of circumstances exists today, but I think the compelling aspect of search that Abe touches on is the notion of history.

I'd be curious to know if Google tracks statistics such as how frequently links are added to particular sites or the rate at which new sites appear and uses them in its calculations when returning results.

Then again, I just watched Seth Godin's presentation to Google, in which he argues that marketing is more important than technology. Technology is of course important, but it's possible to have the best technology and fail as a business.

Dave,

Thanks for the link to Seth's presentation. It's nice to have a reference for people's inability to distinguish between Google and Yahoo! results when they are formatted identically.

Seth does, however, miss the opportunity to highlight the need for Google to focus.

You can literally see one of the continuing reasons for Google's success. Compare the front pages of Google and other search engines:
http://www.google.com/
http://www.yahoo.com/
http://www.msn.com/
http://about.com/
etc.
Where would you rather search?
I have similar feelings about many of their "beta" products, hence my email address.
