How many Sandy-Bridges to run IBM's Watson?

GundamF91

Golden Member
May 14, 2001
1,827
0
0
Just curious how many Sandy Bridges it'd take to run Watson the AI program? It'd be nice to have a home PC that can answer trivia questions.
 

IGemini

Platinum Member
Nov 5, 2010
2,472
2
81
Short answer: a lot.

From wiki entries on Watson and Power7:

http://en.wikipedia.org/wiki/Watson_%28artificial_intelligence_software%29

...Watson is made up of a cluster of ninety IBM Power 750 servers (plus additional I/O, network and cluster controller nodes in 10 racks) with a total of 2880 POWER7 processor cores and 16 Terabytes of RAM. Each Power 750 server uses a 3.5 GHz POWER7 eight core processor, with four threads per core. The POWER7 processor's massively parallel processing capability is an ideal match for Watson's IBM DeepQA software, which is embarrassingly parallel (that is, a workload that executes multiple threads in parallel).
http://en.wikipedia.org/wiki/POWER7

POWER7 has these specifications:[6][7]

  • 45 nm SOI process, 567 mm²
  • 1.2 billion transistors
  • 3.0 – 4.25 GHz clock speed
  • max 4 chips per quad-chip module
    • 4, 6 or 8 cores per chip
      • 4 SMT threads per core (available in AIX 6.1 TL05 (releases in April 2010) and above)
      • 12 execution units per core:
        • 2 fixed-point units
        • 2 load/store units
        • 4 double-precision floating-point units
        • 1 vector unit supporting VSX
        • 1 decimal floating-point unit
        • 1 branch unit
        • 1 condition register unit
    • 32+32 kB L1 instruction and data cache (per core)[8]
    • 256 kB L2 Cache (per core)
    • 4 MB L3 cache per core with maximum up to 32MB supported. The cache is implemented in eDRAM, which does not require as many transistors per cell as a standard SRAM[5] so it allows for a larger cache while using the same area as SRAM.
This gives the following theoretical performance figures (based on a 4.04 GHz 8 core implementation):

  • max 33.12 GFLOPS per core
  • max 264.96 GFLOPS per chip
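
The per-chip figure follows directly from the per-core one, and the per-core figure can be roughly sanity-checked against the execution units listed above. A minimal sketch, assuming each of the 4 double-precision FPUs completes one fused multiply-add (2 floating-point operations) per cycle; that assumption is not stated in the quoted spec:

```python
# Rough check of the quoted POWER7 peak figures.
# ASSUMPTION (not stated in the quoted spec): each double-precision FPU
# retires one fused multiply-add, i.e. 2 floating-point operations, per cycle.
DP_FPUS_PER_CORE = 4          # from the spec list
FLOPS_PER_FPU_PER_CYCLE = 2   # assumed: one FMA = multiply + add
CORES_PER_CHIP = 8            # the 8-core implementation
GFLOPS_PER_CORE = 33.12       # quoted peak per core

flops_per_cycle = DP_FPUS_PER_CORE * FLOPS_PER_FPU_PER_CYCLE
gflops_per_chip = GFLOPS_PER_CORE * CORES_PER_CHIP
implied_clock_ghz = GFLOPS_PER_CORE / flops_per_cycle

print(f"per chip: {gflops_per_chip:.2f} GFLOPS")
print(f"clock implied by the per-core figure: {implied_clock_ghz:.2f} GHz")
```

Under that assumption the per-core number corresponds to a clock a little above 4.1 GHz, so the 4.04 GHz in the quote may not be exact.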
 

drizek

Golden Member
Jul 7, 2005
1,410
0
71
In other words, you can have a Watson in your house, but there probably won't be any room left for you.
 

Arkadrel

Diamond Member
Oct 19, 2010
3,681
2
0
from wiki:

Watson is run by two units of five racks, in which reside 90 IBM Power750 servers (each server having 4 CPUs with 8 cores each, where each core has 4 hardware threads) and its RAM is "over 15 TB".

90 Power750 servers x 4 CPU sockets (8 cores each) = 90 x 4 x 8 = 2880 cores.
Over 15 TB of RAM speaks for itself.
Watson doesn’t rely on data stored on hard drives because they are too slow to access.

32 POWER7 processor cores running at 3.55 GHz inside 1 Power750 server.

You'd have to find some way to compare a POWER7 core with a Sandy Bridge core.
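
The totals above can be tallied in a few lines. This is only a sketch of the arithmetic in this post; the 1 TB = 1024 GB conversion and treating "over 15 TB" as exactly 15 TB are simplifying assumptions:

```python
# Core, thread and per-server memory totals for the cluster described above.
SERVERS = 90
SOCKETS_PER_SERVER = 4
CORES_PER_SOCKET = 8
THREADS_PER_CORE = 4
TOTAL_RAM_TB = 15   # quoted as "over 15 TB", so this is a lower bound

cores_per_server = SOCKETS_PER_SERVER * CORES_PER_SOCKET
total_cores = SERVERS * cores_per_server
total_threads = total_cores * THREADS_PER_CORE
ram_per_server_gb = TOTAL_RAM_TB * 1024 / SERVERS  # assumes 1 TB = 1024 GB

print(f"{cores_per_server} cores per server, {total_cores} cores in total")
print(f"{total_threads} hardware threads in total")
print(f"at least {ram_per_server_gb:.0f} GB of RAM per server")
```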
 

BoomerD

No Lifer
Feb 26, 2006
66,285
14,704
146
Short answer: a lot.

From wiki entries on Watson and Power7:

http://en.wikipedia.org/wiki/Watson_%28artificial_intelligence_software%29


...Watson is made up of a cluster of ninety IBM Power 750 servers (plus additional I/O, network and cluster controller nodes in 10 racks) with a total of 2880 POWER7 processor cores and 16 Terabytes of RAM. Each Power 750 server uses a 3.5 GHz POWER7 eight core processor, with four threads per core. The POWER7 processor's massively parallel processing capability is an ideal match for Watson's IBM DeepQA software, which is embarrassingly parallel (that is, a workload that executes multiple threads in parallel).
http://en.wikipedia.org/wiki/POWER7

Quote:
POWER7 has these specifications:[6][7]

  • 45 nm SOI process, 567 mm²
  • 1.2 billion transistors
  • 3.0 – 4.25 GHz clock speed
  • max 4 chips per quad-chip module
    • 4, 6 or 8 cores per chip
      • 4 SMT threads per core (available in AIX 6.1 TL05 (releases in April 2010) and above)
      • 12 execution units per core:
        • 2 fixed-point units
        • 2 load/store units
        • 4 double-precision floating-point units
        • 1 vector unit supporting VSX
        • 1 decimal floating-point unit
        • 1 branch unit
        • 1 condition register unit
    • 32+32 kB L1 instruction and data cache (per core)[8]
    • 256 kB L2 Cache (per core)
    • 4 MB L3 cache per core with maximum up to 32MB supported. The cache is implemented in eDRAM, which does not require as many transistors per cell as a standard SRAM[5] so it allows for a larger cache while using the same area as SRAM.

This gives the following theoretical performance figures (based on a 4.04 GHz 8 core implementation):

  • max 33.12 GFLOPS per core
  • max 264.96 GFLOPS per chip


And it STILL can't run Crysis with full eye candy effects... :p
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,787
136
The single-thread performance of the chips is about equal to Penryn at the same clock speed. What makes it impressive, in addition to the high single-threaded performance, is that its scalability, memory controller, and multi-threading capabilities are unparalleled. The SMT in Power 7 can gain 20-30% performance in situations where Intel's version can only muster 5-10%.

-Memory bandwidth of the Opteron 61xx
-Scalability, RAS, and memory capacity of Xeon 7500
-Per core performance on par with client processors that were previously the undisputed champions

The Jeopardy contest itself didn't seem anything special though. Sure the computer was powerful, but it was only winning because it could press the button faster (if I had a button that could be activated by my mind, and answers fed directly through my "circuits", I would have responded fast too).
 

Schmide

Diamond Member
Mar 7, 2002
5,745
1,036
126
The single thread performance of the chips is about equal to Penryn at the same clock speed.

That seems a little low for the units and cache the POWER7 sports?

I mean, they quote 33 GFLOPS per core. That's on par with Sandy Bridge AVX and about 7 times Penryn.
 

GundamF91

Golden Member
May 14, 2001
1,827
0
0
The single thread performance of the chips is about equal to Penryn at the same clock speed.....

The spec says the Watson backend has 2880 POWER7 processor cores, and my Penryn has 4 physical cores. Assuming they are equivalent at around 3.5 GHz, Watson could be 720 times more powerful, but probably quite a bit less than that due to the nature of parallel computing.

I guess Watson isn't coming to my house any time soon.
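
One way to put a rough number on "quite a bit lower" is Amdahl's law, which is brought in here purely as an illustration and not something the thread or IBM cites. A minimal sketch, where the parallel fractions are hypothetical values and not measurements of DeepQA:

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Speedup over a single core when only part of the work parallelizes."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)

WATSON_CORES = 2880
DESKTOP_CORES = 4
ideal_ratio = WATSON_CORES / DESKTOP_CORES  # the naive 720x figure

# Hypothetical parallel fractions, chosen only to show the trend.
for p in (1.0, 0.999, 0.99):
    ratio = amdahl_speedup(p, WATSON_CORES) / amdahl_speedup(p, DESKTOP_CORES)
    print(f"parallel fraction {p}: about {ratio:.0f}x a 4-core desktop")
```

Even a small serial fraction pulls the ratio far below the ideal 720x, which is the point the post is gesturing at. This also ignores memory bandwidth and interconnect latency.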
 

Diogenes2

Platinum Member
Jul 26, 2001
2,151
0
0
.....
The Jeopardy contest itself didn't seem anything special though. Sure the computer was powerful, but it was only winning because it could press the button faster (if I had a button that could be activated by my mind, and answers fed directly through my "circuits", I would have responded fast too).
Sort of the point, no?
 

Insomniator

Diamond Member
Oct 23, 2002
6,294
171
106
The Jeopardy contest itself didn't seem anything special though. Sure the computer was powerful, but it was only winning because it could press the button faster (if I had a button that could be activated by my mind, and answers fed directly through my "circuits", I would have responded fast too).

:rolleyes:
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,787
136
That seems a little low for the units and cache the POWER7 sports?

I mean, they quote 33 GFLOPS per core. That's on par with Sandy Bridge AVX and about 7 times Penryn.

The server benchmarks of Power 7 are faster than Nehalem-EX by a factor equal to the clock speed difference. They are well multi-threaded benchmarks, and SMT for Power 7 is superior. There's what, a 5-10% difference in single-threaded IPC between Nehalem and Penryn?

FPUs are a different story.

Sort of the point, no?

Not really. The point was to demonstrate AI capability. Response times don't tell that.
 

PreferLinux

Senior member
Dec 29, 2010
420
0
0
I think we can conclude it is roughly equivalent to 2000 SB CPUs with 8 GB of RAM each, probably. (My i5 2500K at stock is around 48 GFlops.)
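
That estimate can be reproduced from numbers already in the thread. A rough sketch, comparing peak FLOPS only: the POWER7 figure is the theoretical peak quoted earlier while the i5 figure is the one given in this post, so the two may not be measured the same way, and memory bandwidth and interconnect are ignored entirely:

```python
import math

POWER7_CORES = 2880
GFLOPS_PER_POWER7_CORE = 33.12   # theoretical peak quoted earlier in the thread
GFLOPS_PER_I5_2500K = 48.0       # the stock figure given in this post
WATSON_RAM_TB = 16               # from the quoted wiki entry

watson_peak_gflops = POWER7_CORES * GFLOPS_PER_POWER7_CORE
sb_cpus_needed = math.ceil(watson_peak_gflops / GFLOPS_PER_I5_2500K)
ram_per_cpu_gb = WATSON_RAM_TB * 1024 / sb_cpus_needed  # assumes 1 TB = 1024 GB

print(f"Watson peak: {watson_peak_gflops / 1000:.1f} TFLOPS")
print(f"i5-2500K CPUs needed to match it: {sb_cpus_needed}")
print(f"RAM per CPU to reach 16 TB: {ram_per_cpu_gb:.1f} GB")
```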
 

GundamF91

Golden Member
May 14, 2001
1,827
0
0
.... The point was to demonstrate AI capability. Response times don't tell that.

The speed of Watson in researching the answer also factors into its response time. If Watson couldn't complete the query as fast, then it could not beat a human to the buzzer, despite Watson's quick electrical buzzing. We saw several times where Watson had the right answer but was beaten to the buzzer by a human (who also answered correctly). I think IBM should have overclocked Watson by 20% to ensure buzzing first every time. :p
 

Spikesoldier

Diamond Member
Oct 15, 2001
6,766
0
0
If Watson really wanted to play dirty, they would program it to push the button as fast as it can every time.

During the countdown to give the answer, I'm sure that Watson could compute and spit out the answer in time.

Add to that the advantage of Watson getting the question immediately, versus having to read it or listen to it like the humans, and it's even more unfair.
 

Rubycon

Madame President
Aug 10, 2005
17,768
485
126
In other words, you can have a Watson in your house, but there probably won't be any room left for you.

Or power. I don't think you can get three phase power in your home from the utility company. Hospitals and jails have it though. You'd probably visit both after dealing with all the idiosyncrasies of supercomputer maintenance, however! :biggrin:
 

aigomorla

CPU, Cases&Cooling Mod PC Gaming Mod Elite Member
Super Moderator
Sep 28, 2005
21,070
3,575
126
This @ 4.5ghz would probably do it. ;)

[image]
 

aigomorla

CPU, Cases&Cooling Mod PC Gaming Mod Elite Member
Super Moderator
Sep 28, 2005
21,070
3,575
126
What is it anyway?

From that picture?

There's only a few things it can be: either a Beckton cluster, or a Magny-Cours on overdrive. :biggrin:

But I'm saying 256 SB cores should do it. 300 if you want to keep them at stock.

Because that's 256 cores but 512 threads with HT, keeping the faster QPI interlink on the current system, with quad-channel DDR3, which is supposed to be out for LGA2011, vs the Power 7.

:biggrin:

So, assuming LGA2011 will go up to 10-core / 20 MB cache CPUs... umm... that's 26 LGA2011 CPUs one would need, with a total of OMFG 520 MB of cache. (We're at about half a gigabyte of cache.)

4 CPUs per planar board ~ 7 racks, or 280 LGA2011 cores with 560 working threads + 560 MB of cache. :p

And you wouldn't need 3-phase power to run 7 x 2011 systems. :p

Or am I majorly underestimating the power of a Power7?
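
For anyone who wants to redo the sizing, here is a small sketch of the arithmetic under this post's own guesses (10 cores and 20 MB of cache per LGA2011 CPU, 4 sockets per board). These are speculative inputs from the post, not confirmed specs:

```python
import math

TARGET_CORES = 256
CORES_PER_CPU = 10        # the post's guess for LGA2011
CACHE_MB_PER_CPU = 20     # the post's guess for LGA2011
THREADS_PER_CORE = 2      # Hyper-Threading
SOCKETS_PER_BOARD = 4

cpus_needed = math.ceil(TARGET_CORES / CORES_PER_CPU)
cache_mb_minimum = cpus_needed * CACHE_MB_PER_CPU

# Round up to fully populated 4-socket boards.
boards = math.ceil(cpus_needed / SOCKETS_PER_BOARD)
cpus_installed = boards * SOCKETS_PER_BOARD
cores = cpus_installed * CORES_PER_CPU
threads = cores * THREADS_PER_CORE
cache_mb = cpus_installed * CACHE_MB_PER_CPU

print(f"{cpus_needed} CPUs for {TARGET_CORES} cores, {cache_mb_minimum} MB of cache")
print(f"{boards} boards: {cores} cores, {threads} threads, {cache_mb} MB of cache")
```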
 

Rubycon

Madame President
Aug 10, 2005
17,768
485
126
LOL if you think that a desktop platform can compete with that you're sorely mistaken. Sure you have the flops, but nowhere near the bandwidth to keep them busy in real time. This is why distributed computing works so well. If you need instantaneous processing power you need tremendous bandwidth at low latency, which a PC platform cannot deliver and was never designed to deliver.
 

Rubycon

Madame President
Aug 10, 2005
17,768
485
126
The world's first 256-core bathroom heater to run Windows Task Manager?

If those were Prescott cores I'd say with Linpack that would be a formidable fog fighter at the main approach at O'Hare! :biggrin:
 

ShawnD1

Lifer
May 24, 2003
15,987
2
81
Sort of the point, no?

Yes, that is exactly the point. Running the exact same software with access to the exact same information on something like a Pentium 4 would get its ass kicked because it would be too slow. Watson is a good Jeopardy contestant because it is faster than a human.