how does google do it?

Shalmanese

Platinum Member
Sep 29, 2000
2,157
0
0
how the hell does google get away with those ridiculous search times?

I figure that the average webpage is 20k so they must have 1.32 billion * 20k which gives us 26.4 terabytes of data that needs to be stored on a server.

my hard drive has 4 GB of data or approximately 65000 times less.

using the windows find command and searching for the word "cat" took 12 mins or 7200 seconds

using google it took 0.09 seconds which means the windows search took 80000 times as long.

using this test, google would be 65000*80000 times as fast or 5.2 billion times as fast.

specs for my machine are: celeron 413MHz, 64MB ram, 6GB Quantum 5400rpm HD (searched on second partition)
I know my computer is a POS but google's hardware and software need to be 5.2 billion times as efficient to work that fast. what I really want to know is does anyone have the hardware specs for google so I can drool all over them?
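As a rough check on the estimate above, the same back-of-the-envelope calculation can be sketched in Python. It uses only the figures quoted in the post (1.32 billion pages, 20k per page, 4 GB of local data, a 12 minute local search, 0.09 seconds on google). The variable names are made up for illustration, and decimal units (1 GB = 10^9 bytes) are an assumption. Note that 12 minutes is 720 seconds, so with these inputs the ratios come out smaller than the ones quoted.

```python
# Back-of-the-envelope comparison using only the figures quoted in the post.
# Assumption: decimal units (1k = 1,000 bytes, 1 GB = 10**9 bytes).
PAGES = 1.32e9            # pages google reported indexing
BYTES_PER_PAGE = 20e3     # guessed average page size (20k)
LOCAL_BYTES = 4e9         # data on the local hard drive (4 GB)
LOCAL_SECONDS = 12 * 60   # windows find took 12 minutes
GOOGLE_SECONDS = 0.09     # reported google search time

web_bytes = PAGES * BYTES_PER_PAGE            # total data to search
size_ratio = web_bytes / LOCAL_BYTES          # how much more data google covers
time_ratio = LOCAL_SECONDS / GOOGLE_SECONDS   # how much longer the local search took
speedup = size_ratio * time_ratio             # combined "times as fast" figure

print(f"web data:   {web_bytes / 1e12:.1f} TB")
print(f"size ratio: {size_ratio:,.0f}x")
print(f"time ratio: {time_ratio:,.0f}x")
print(f"speedup:    {speedup:,.0f}x")
```

Taken at face value, these inputs give a size ratio of about 6,600 and a time ratio of about 8,000, so the combined figure lands in the tens of millions rather than the billions. The overall point, that a brute-force scan cannot explain google's speed, still holds.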
 

kami

Lifer
Oct 9, 1999
17,627
5
81
I thought they only stored meta tags and the first sentence or two from each site? Or is Google different?
 

Shalmanese

Platinum Member
Sep 29, 2000
2,157
0
0
AFAIK, google searches the entire site from its cache and then assembles the results based on how many sites link to that site and best relevance.
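A minimal sketch of the "how many sites link to that site" idea, assuming a tiny made-up link graph. The page names and the `inbound_counts` and `rank` helpers are hypothetical, and a raw inbound-link count is only a simplification of whatever google actually computes.

```python
from collections import Counter

# Hypothetical link graph: page -> pages it links to.
links = {
    "a.example": ["c.example", "b.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example"],
    "d.example": ["c.example", "b.example"],
}

def inbound_counts(graph):
    """Count how many distinct pages link to each page."""
    counts = Counter()
    for source, targets in graph.items():
        for target in set(targets):      # ignore duplicate links from one page
            if target != source:         # ignore self-links
                counts[target] += 1
    return counts

def rank(matches, graph):
    """Order matching pages by inbound link count, most linked first."""
    counts = inbound_counts(graph)
    # Ties are broken alphabetically so the order is deterministic.
    return sorted(matches, key=lambda page: (-counts[page], page))

print(rank(["a.example", "b.example", "c.example", "d.example"], links))
```

Here `c.example` comes first because three other pages link to it, and `d.example` comes last because nothing links to it.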

Although windows performance is horrible, I don't think that algorithms could push it up more than an order of magnitude or two which leaves the rest to hardware :)
 

kami

Lifer
Oct 9, 1999
17,627
5
81
hardware that would make any geek here moan in ecstasy :Q:Q haha..
 

kw3i

Golden Member
Jan 18, 2001
1,036
0
0
i think they are running 5 different servers for their search engine, all pentium2 350 with 194 megs of ram each. not sure about the hard drives. check them out here
 

Shalmanese

Platinum Member
Sep 29, 2000
2,157
0
0
some ballpark calculations:

assuming algorithms could increase speed by 1000x
and current SCSI HD's are 10x as fast as my POS drive

then they would need 520,000 HD's in a RAID 0 array to get that kind of performance

add raid 1 for redundancy and that's 1.1 million drives which would mean 2 GB per drive.

or maybe they just have all of it in 28TB of QDR RAM?
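The ballpark figures above can be reproduced with a few lines of Python. This is only a sketch: the 5.2 billion speedup, the 1000x algorithm gain and the 10x drive gain are all taken from the thread as given, not measured, and the constant names are made up for illustration.

```python
# Drive-count estimate using the figures from the posts above.
# Assumptions taken from the thread, not measured: the 5.2 billion speedup,
# a 1000x gain from better algorithms, and SCSI drives 10x faster.
REQUIRED_SPEEDUP = 5.2e9
ALGORITHM_GAIN = 1000
DRIVE_GAIN = 10
TOTAL_BYTES = 26.4e12     # 1.32 billion pages * 20k each

striped_drives = REQUIRED_SPEEDUP / (ALGORITHM_GAIN * DRIVE_GAIN)  # RAID 0
mirrored_drives = striped_drives * 2                               # add RAID 1
bytes_per_drive = TOTAL_BYTES / striped_drives   # unique data on each stripe

print(f"RAID 0 drives: {striped_drives:,.0f}")
print(f"with RAID 1:   {mirrored_drives:,.0f}")
print(f"per drive:     {bytes_per_drive / 1e6:.0f} MB")
```

With these inputs the mirrored array is a little over a million drives, and each stripe would only need to hold roughly 50 MB of the 26.4 TB, well under 2 GB.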
 

Shalmanese

Platinum Member
Sep 29, 2000
2,157
0
0
what I really want to know is who was the lucky bastard who invested in HD/RAM/solid state drive stocks just before google started and made a packet :)

or maybe it's just one big conspiracy theory (like the close door button on lifts is a placebo)
 

Shalmanese

Platinum Member
Sep 29, 2000
2,157
0
0
update on the situation, just did a search for "what kind of hardware is google running" on, where else, google.

this claims that google should be at 10,000 machines by now all running linux so maybe that has something to do with how fast it is running :)

 
Feb 10, 2000
30,029
66
91
Google has, from my perspective, rendered the other search engines obsolete. Not only is it lightning-quick, but it is by far the most intelligent in terms of its returning relevant results (with the exception of the recent "Dumb MFer" problem), and it does not clutter my screen with advertising. The new Google toolbar is great as well.
 

Double Trouble

Elite Member
Oct 9, 1999
9,272
103
106
Hmmmm..... maybe it's just me then. I'm not at all impressed with Google's search capabilities. It rarely gets me to the right pages when I do a search there. In fact, I've stopped using it ..... I wonder if it's just the subject matter that I'm searching or something like that. Dunno.
 

perry

Diamond Member
Apr 7, 2000
4,018
1
0
Google is great, even Yahoo thinks so. They dumped the Inktomi database a few months back in favor of Google for search results not in their directory.
 

iamwiz82

Lifer
Jan 10, 2001
30,772
13
81
Google is amazing, plus they have a sense of humor. Just look for the thread about george w bush, lol.
 

thEnEuRoMancER

Golden Member
Oct 30, 2000
1,415
0
71
I'm no fan of Bush at all but I don't find this prank funny. Besides it's bad for their business reputation.

edit: Google canceled the joke, it doesn't work any more
 

SinnerWolf

Senior member
Dec 30, 2000
782
0
0
You're comparing 2 totally unrelated procedures. In order to find a character string on your computer, windows looks at every file name on your computer and its extension. And then compares that string to its binary equivalent to find a match. This is a hardware search. When you use a search engine, it's purely software based. Websites typically use meta tags, which is an area of a website's code that has a list of hot words. If any of these words match your search criteria, the search engine lists it. When a site updates its hotlist, it resubmits its info to the search engine. All that yahoo, lycos, etc. have to do is do a smart look through a series of categorized words to determine which matches your search string. Google and metacrawler use *most* of the independent search engines, and output their combined hits on a single site.
 

thEnEuRoMancER

Golden Member
Oct 30, 2000
1,415
0
71
Hehe... The response to "Dumb MFer problem":

(Note: If you have arrived at this site through inappropriate references via a search engine, please be assured that we did not utilize this language in our site, our HTML, nor in our internet promotion of this site. What happened was the result of a malicious act by a third party and we have pursued remedies through the efforts of our staff and attorneys.)
 

arcain

Senior member
Oct 9, 1999
932
0
0
The two search types are very different.

The find command is very much a brute force algorithm. Your computer's file system stores no information about the contents of a file, so therefore when you searched for 'cat' it had to look in each and every file.

On the other hand, Google is (probably) very heavily optimized. They know what type of data they're storing (html/text). They know what type of searches are being performed (keyword or exact substring matching vs ranged types). Knowing this, the database can be heavily optimized for the data and designed around a suitable index.

The index in a database is similar to an index in the back of a book. Instead of flipping through a book looking for a word on every page (like Windows' find command), you can look in the index, look up the word, and find the pages the word is on (like a database index).
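The book-index analogy can be sketched as a small inverted index in Python, next to the brute-force scan it replaces. The three documents and the function names are made up for illustration; this is a toy under those assumptions, not a description of Google's actual database.

```python
from collections import defaultdict

# Hypothetical tiny "web": document id -> text.
documents = {
    1: "the cat sat on the mat",
    2: "dogs chase the cat",
    3: "search engines index the web",
}

def brute_force_search(docs, word):
    """Scan every document on every query, like flipping through every page."""
    return sorted(doc_id for doc_id, text in docs.items() if word in text.split())

def build_index(docs):
    """Build the index once: word -> set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            index[word].add(doc_id)
    return index

def indexed_search(index, word):
    """Answer a query with one dictionary lookup instead of a full scan."""
    return sorted(index.get(word, set()))

index = build_index(documents)
print(brute_force_search(documents, "cat"))
print(indexed_search(index, "cat"))
```

Both calls return the same documents, but the brute-force version reads every document on every query, while the indexed version pays that cost once when the index is built and then answers each query with a single lookup.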

And by having a great deal of RAM, it may be possible to have much of the index in memory. Their data is probably also heavily distributed across many machines so that the lookup can be done in parallel. And even though that info page says they don't cache search results, it seems like it was written when they were still doing research on it (their goal was only to be able to handle a few hundred searches a minute, or a second; either way too slow for their current application), so I'm betting they do cache their search results. Many search terms are probably fairly popular, and instead of constantly redoing the search, they can store the results, and when another person searches for the same string, they can just grab the results from memory.
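A minimal sketch of those two guesses, parallel lookup across machines and cached results, using only the Python standard library. The `SHARDS` data is made up, each dictionary stands in for one machine's in-memory index, and threads stand in for network calls; none of this is taken from Google's actual design.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache

# Hypothetical shards: each "machine" holds an index for part of the data.
SHARDS = [
    {"cat": {1, 2}, "mat": {1}},
    {"cat": {7}, "dog": {5, 7}},
    {"web": {9}},
]

def search_shard(shard, word):
    """Look the word up in one shard's index."""
    return shard.get(word, set())

@lru_cache(maxsize=1024)
def search(word):
    """Query every shard in parallel, merge the hits, and cache the result."""
    with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
        partial_results = pool.map(lambda shard: search_shard(shard, word), SHARDS)
        hits = set().union(*partial_results)
    return tuple(sorted(hits))   # immutable, so the cached value cannot be altered

print(search("cat"))          # computed by asking every shard
print(search("cat"))          # served from the cache
print(search.cache_info())
```

The second call for the same word never touches the shards, which is the saving described above for popular search terms.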

And.. Google is very different from Metacrawler. Metacrawler sends your search string to the various search engines (at least the last time I used it) and reorganizes the results. Google _is_ an "independent" search engine like Altavista or Lycos. They "crawl" the web themselves, using programs that follow the links on pages, and add the webpages to their database automatically. Google neither uses other engines to search nor requires users to submit their pages.
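The crawling idea can be sketched as a breadth-first walk over links. To keep the example self-contained, `FAKE_WEB` is a made-up in-memory stand-in for the web and `crawl` is a hypothetical helper; a real crawler would fetch and parse pages over HTTP and would need politeness rules that are left out here.

```python
from collections import deque

# Hypothetical stand-in for the web: page -> links found on that page.
# A real crawler would fetch and parse each page over HTTP instead.
FAKE_WEB = {
    "start.example": ["a.example", "b.example"],
    "a.example": ["b.example", "c.example"],
    "b.example": ["start.example"],
    "c.example": [],
    "island.example": ["start.example"],   # nothing links here
}

def crawl(seed, fetch_links):
    """Breadth-first crawl: visit the seed, then every page reachable from it."""
    seen = {seed}
    queue = deque([seed])
    order = []
    while queue:
        page = queue.popleft()
        order.append(page)               # "add the page to the database"
        for link in fetch_links(page):
            if link not in seen:         # never queue the same page twice
                seen.add(link)
                queue.append(link)
    return order

print(crawl("start.example", lambda page: FAKE_WEB.get(page, [])))
```

Pages are discovered purely by following links, so nobody has to submit them. The flip side is that a page nothing links to, like `island.example` here, is never found from this seed.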