how does google do it?

Shalmanese · Jan 28, 2001

how the hell does google get away with those ridiculous search times?

I figure that the average webpage is 20k so they must have 1.32 billion * 20k which gives us 26.4 terabytes of data that needs to be stored on a server.

my hard drive has 4 GB of data or approximately 65000 times less.

using the windows find command and searching for the word "cat" took 12 mins or 7200 seconds

using google it took 0.09 seconds, which means the windows search took 80000 times as long.

using this test, google would be 65000*80000 times as fast or 5.2 billion times as fast.

specs for my machine are: celeron 413MHz, 64MB ram, 6GB Quantum 5400rpm HD (searched on second partition)
I know my computer is a POS but google's hardware and software need to be 5.2 billion times as efficient to work that fast. what I really want to know is does anyone have the hardware specs for google so I can drool all over them?
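A quick sketch that re-runs the arithmetic from the post above, taking its figures at face value (nothing here is measured). Two things it seems to show: 26.4 TB over 4 GB works out to about 6,600 rather than 65,000, and 12 minutes is 720 seconds, so the script prints the result for both readings of the search time.

```python
# Figures quoted in the post above; none of them are measured here.
PAGES = 1.32e9          # pages in the index, per the post
PAGE_BYTES = 20e3       # assumed average page size (20k)
LOCAL_BYTES = 4e9       # data on the poster's drive (4 GB)
GOOGLE_SECONDS = 0.09   # reported query time

web_bytes = PAGES * PAGE_BYTES
size_ratio = web_bytes / LOCAL_BYTES


def time_ratio(find_seconds):
    """How many times longer the local search took than the Google query."""
    return find_seconds / GOOGLE_SECONDS


print(f"web corpus: {web_bytes / 1e12:.1f} TB")
print(f"size ratio: {size_ratio:,.0f}x")
for label, secs in (("12 min", 12 * 60), ("7200 s", 7200)):
    t = time_ratio(secs)
    print(f"{label}: time ratio {t:,.0f}x, combined {size_ratio * t:,.0f}x")
```

Either way the gap is enormous, which is the point of the post; only the exact multiplier changes.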

Imported · Jan 28, 2001

I dunno.. but I wish I had the same hardware too.

JoeKing · Jan 28, 2001

all it takes is some smart algorithms or some stuff

cpars · Jan 28, 2001

Man that was quick! never used before but just tried.

IamDavid · Jan 28, 2001

Very complex algorithms..

kami · Jan 28, 2001

I thought they only stored meta tags and the first sentence or two from each site? Or is Google different?

Shalmanese · Jan 28, 2001

AFAIK, google searches the entire site from its cache and then assembles the results based on how many sites link to that site and best relevance.

Although windows performance is horrible, I don't think that algorithms could push it up more than an order of magnitude or two, which leaves the rest to hardware
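To make "how many sites link to that site" concrete, here is a minimal sketch that orders matching pages by a plain inbound-link count. This is a simplification for illustration, not Google's actual ranking formula, and the link graph and `*.example` hostnames are made up.

```python
from collections import Counter

# Hypothetical link graph: page -> pages it links to.
links = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example"],
    "d.example": ["c.example", "b.example"],
}


def inbound_counts(graph):
    """Count how many distinct pages link to each page."""
    counts = Counter()
    for source, targets in graph.items():
        for target in set(targets):
            if target != source:  # ignore self-links
                counts[target] += 1
    return counts


def rank(matches, graph):
    """Order matching pages by inbound link count, most-linked first."""
    counts = inbound_counts(graph)
    return sorted(matches, key=lambda page: (-counts[page], page))


print(rank(["a.example", "b.example", "c.example"], links))
```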

kami · Jan 28, 2001

hardware that would make any geek here moan in ecstasy :Q:Q haha..

kw3i · Jan 28, 2001

i think they are running 5 different servers for their search engine, all pentium2 350 with 194 megs of ram each. not sure about the hard drives. check them out here

kw3i · Jan 28, 2001

disclaimer: my last post was a joke

Shalmanese · Jan 28, 2001

some ballpark calculations:

assuming algorithms could increase speed by 1000x
and current SCSI HD's are 10x as fast as my POS drive

then they would need 520,000 HD's in a RAID 0 array to get that kind of performance

add raid 1 for redundancy and that's 1.1 million drives which would mean 2 GB per drive.

or maybe they just have all of it in 28TB of QDR RAM?
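The ballpark above as a short sketch, using only the post's own guesses and assuming throughput scales linearly with the number of striped drives (a big assumption). Under those assumptions each drive would hold roughly 51 MB of the 26.4 TB rather than 2 GB.

```python
# Figures from the ballpark above; all of them are the post's guesses.
REQUIRED_SPEEDUP = 5.2e9   # claimed gap versus the home PC
ALGORITHM_GAIN = 1000      # assumed speedup from better algorithms
DRIVE_GAIN = 10            # assumed SCSI vs. 5400rpm drive speedup
CORPUS_BYTES = 26.4e12     # 1.32 billion pages * 20k

# Drives needed in the RAID 0 stripe, assuming linear scaling.
striped = REQUIRED_SPEEDUP / (ALGORITHM_GAIN * DRIVE_GAIN)
# RAID 1 mirroring doubles the drive count.
mirrored = 2 * striped
# Each stripe set holds one full copy of the corpus.
per_drive = CORPUS_BYTES / striped

print(f"striped drives:  {striped:,.0f}")
print(f"with mirroring:  {mirrored:,.0f}")
print(f"data per drive:  {per_drive / 1e6:.1f} MB")
```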

kami · Jan 28, 2001

Or maybe they have solid state drives

Shalmanese · Jan 28, 2001

what I really want to know is who was the lucky bastard who invested in HD/RAM/solid state drive stocks just before google started and made a packet

or maybe it's just one big conspiracy theory (like the close door button on lifts is a placebo)

Shalmanese · Jan 28, 2001

update on the situation, just did a search for "what kind of hardware is google running" on, where else, google.

this claims that google should be at 10,000 machines by now all running linux so maybe that has something to do with how fast it is running

Don Vito Corleone · Jan 28, 2001

Google has, from my perspective, rendered the other search engines obsolete. Not only is it lightning-quick, but it is by far the most intelligent in terms of its returning relevant results (with the exception of the recent "Dumb MFer" problem), and it does not clutter my screen with advertising. The new Google toolbar is great as well.

Double Trouble · Jan 28, 2001

Hmmmm..... maybe it's just me then. I'm not at all impressed with Google's search capabilities. It rarely gets me to the right pages when I do a search there. In fact, I've stopped using it ..... I wonder if it's just the subject matter that I'm searching or something like that. Dunno.

perry · Jan 28, 2001

Google is great, even Yahoo thinks so. They dumped the Inktomi database a few months back in favor of Google for search results not in their directory.

thEnEuRoMancER · Jan 28, 2001

Shalmanese, the technology behind Google's search engine is completely different from the Windows find feature, so you shouldn't really compare them in this way. Read this article, The Anatomy of a Search Engine, if you want to understand how Google works.

AvesPKS · Jan 28, 2001

Geez, that is pretty ridiculous...

jorken · Jan 28, 2001

Doesn't yahoo use google as its search engine?

iamwiz82 · Jan 28, 2001

Google is amazing, plus they have a sense of humor. Just look for the thread about george w bush, lol.

thEnEuRoMancER · Jan 28, 2001

I'm no fan of Bush at all but I don't find this prank funny. Besides it's bad for their business reputation.

edit: Google canceled the joke, it doesn't work any more

SinnerWolf · Jan 28, 2001

You're comparing 2 totally unrelated procedures. In order to find a character string on your computer, windows looks at every file name on your computer and its extension, and then compares that string to its binary equivalent to find a match. This is a hardware search. When you use a search engine, it's purely software based. Websites typically use meta tags, which is an area of a website's code that has a list of hot words. If any of these words match your search criteria, the search engine lists it. When a site updates its hotlist, it resubmits its info to the search engine. All that yahoo, lycos, etc. have to do is a smart look through a series of categorized words to determine which matches your search string. Google and metacrawler use *most* of the independent search engines, and output their combined hits on a single site.

thEnEuRoMancER · Jan 29, 2001

Hehe... The response to "Dumb MFer problem":

(Note: If you have arrived at this site through inappropriate references via a search engine, please be assured that we did not utilize this language in our site, our HTML, nor in our internet promotion of this site. What happened was the result of a malicious act by a third party and we have pursued remedies through the efforts of our staff and attorneys.)

arcain · Jan 29, 2001

The two search types are very different.

The find command is very much a brute force algorithm. Your computer's file system stores no information about the contents of a file, so when you searched for 'cat' it had to look in each and every file.

On the other hand, Google is (probably) very heavily optimized. They know what type of data they're storing (html/text). They know what type of searches are being performed (keyword or exact substring matching vs ranged types). Knowing this a database can be heavily optimized for the data. The database will be designed with this sort of index in mind.

The index in a database is similar to an index in the back of a book. Instead of flipping through a book looking for a word on every page (like Windows' find command), you can look in the index, look up the word, and find the pages the word is on (like a database index).
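A minimal sketch of the book-index idea, contrasting a brute-force scan with an inverted index built ahead of time. The three-document corpus and the function names are made up for illustration; a real search engine's index is far more elaborate.

```python
import re
from collections import defaultdict

# Hypothetical toy corpus: document id -> text.
docs = {
    1: "The cat sat on the mat",
    2: "Dogs chase the cat",
    3: "Search engines index the web",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def brute_force(word, corpus):
    """Like a file-system find: read every document on every query."""
    return sorted(d for d, text in corpus.items() if word in tokenize(text))


def build_index(corpus):
    """Built once, ahead of time: word -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in corpus.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


index = build_index(docs)


def lookup(word):
    """One dictionary lookup instead of a scan over every document."""
    return sorted(index.get(word.lower(), ()))


print(brute_force("cat", docs))
print(lookup("cat"))
```

Both calls return the same documents; the difference is that `brute_force` rereads the whole corpus each time, while `lookup` pays the scanning cost once in `build_index`.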

And by having a great deal of RAM, it may be possible to have much of the index in memory. Their data is probably also heavily distributed across many machines so that the lookup can be done in parallel. And even though that info page says they don't cache search results, it seems like it was written when they were still doing research on it (their goal was only to be able to handle a few hundred searches a minute, or a second; either way too slow for their current application), so I'm betting they do cache their search results. Many search terms are probably fairly popular, and instead of constantly redoing the search, they can store the results, and when another person searches for the same string, they can just grab the results from memory.

And.. Google is very different from Metacrawler. Metacrawler sends your search string to the various search engines (at least the last time I used it) and reorganizes the results. Google _is_ an "independent" search engine like Altavista or Lycos. They "crawl" the web themselves, following the links on pages using programs, and add the webpages to their database automatically. Google neither uses other engines to search nor requires users to submit their pages.
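A toy sketch of what "crawling" means here: start from a seed page, follow its links, and visit each page once. The in-memory `WEB` dictionary and its `*.example` URLs are made up and stand in for real HTTP fetches; a real crawler would also need politeness rules, error handling and so on.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> links found on that page.
WEB = {
    "http://a.example/": ["http://b.example/", "http://c.example/"],
    "http://b.example/": ["http://c.example/", "http://a.example/"],
    "http://c.example/": ["http://d.example/"],
    "http://d.example/": [],
}


def crawl(seed, fetch_links, limit=100):
    """Breadth-first crawl from a seed, visiting each URL at most once."""
    seen = {seed}
    queue = deque([seed])
    order = []
    while queue and len(order) < limit:
        url = queue.popleft()
        order.append(url)  # a real crawler would index the page here
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order


print(crawl("http://a.example/", lambda url: WEB.get(url, [])))
```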
