Sandy Bridge & Llano bad for gamers?


Nemesis 1

Lifer
Dec 30, 2006
11,366
2
0
NV's language for CUDA is C. NV didn't include support for OpenCL until late in the game, same as AMD. Your knowledge on this subject is pure fanboy play; it says nothing about the real hardcore work done by the companies pushing C++. Most of the vector work from companies like Imagination and many others is done using the C++ programming language. Intel is ahead of the game here, not behind as you say.
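
For context, here is a minimal sketch of what an OpenCL kernel looks like; the kernel language is derived from C99, which is why the whole C/C++ angle keeps coming up. The saxpy example below is my own illustration, not from any vendor SDK:

/* Minimal OpenCL C kernel sketch. OpenCL's kernel language is based on C99;
   the kernel name and arguments here are illustrative only. */
__kernel void saxpy(__global const float *x,
                    __global float *y,
                    const float a)
{
    size_t i = get_global_id(0);   /* one work-item handles one element */
    y[i] = a * x[i] + y[i];        /* y = a*x + y */
}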
 

Nemesis 1

Lifer
Dec 30, 2006
11,366
2
0
All those millions of GPUs you talk about from NV and AMD will be scrapped by the time we see real progress with OpenCL. Intel is working in the right direction on this. I have a feeling you and many people like you are going to be hammered by Intel's SB capabilities. Intel has been very clever here. This is why I have told you guys a million times that OpenCL and DX11 mean nothing to Intel; they don't need them. They have been working on this for years now. You just don't get it, do you?
 
Last edited:

alyarb

Platinum Member
Jan 25, 2009
2,425
0
76
You say Intel doesn't need OpenCL, yet you also say they have been "working on it" longer than anyone (and that may be, so why don't they have any shipping IGPs that support it?). You just look through Intel PDFs for keywords pertaining to your trollish argument and take a couple of facts out of context so you can work them into some of the most shamelessly biased, unintelligible nonsense troll posts I have ever seen. It takes more than a consortium and a runtime library to introduce OpenCL. You need useful hardware and useful programs. Right now all we have are the Radeon 5000 series and no software.

I'm going to just put you on ignore, but I will do you a favor.

http://en.wikipedia.org/wiki/Advanced_Vector_Extensions

I know this wiki article is written in English, but perhaps you can use Google to translate it into your language. AVX, like SSE, is for vectors on x86. Building a GPU that could use AVX efficiently would look something like Larrabee, and it would be up against the same problems Intel ran into. Doing vectors this way on GPUs is totally inefficient and goes completely against the massively parallel scheme GPUs follow in their design. Having a discrete 256-bit-wide register per shader would make the shader enormous, dramatically limiting how many of them you can put on your die.
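
To make "vectors on x86" concrete, here is a rough sketch using the AVX intrinsics. It is my own illustrative example (the function name and the alignment/size assumptions are mine): one 256-bit YMM register carries eight single-precision floats and one instruction operates on all of them.

#include <immintrin.h>   /* AVX intrinsics */

/* Adds arrays eight floats at a time using 256-bit YMM registers.
   Assumes n is a multiple of 8 and the pointers are 32-byte aligned;
   real code would also handle the remainder and unaligned data. */
void add_avx(const float *a, const float *b, float *out, int n)
{
    for (int i = 0; i < n; i += 8) {
        __m256 va = _mm256_load_ps(a + i);                 /* load 8 floats  */
        __m256 vb = _mm256_load_ps(b + i);                 /* load 8 floats  */
        _mm256_store_ps(out + i, _mm256_add_ps(va, vb));   /* 8 adds at once */
    }
}

A GPU gets the same effect not from one wide register per thread but from thousands of narrow threads, which is exactly why bolting AVX-style registers onto shaders blows up the die.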

If you truly believe AVX has a chance of being used to drive a conventional (or even unconventional) GPU, it simply shows you have no idea how the hardware works or what analogous technologies are available and better suited to the many-shader paradigm. There is no way for me to be convinced that you understand the implications of large vector registers on GPUs (with respect to die size as well as throughput), so this argument simply has nowhere to go. You have no clue what you are talking about, and yet you are still more interested in being correct than being corrected.
 
Last edited:

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,787
136
Nemesis, you seriously need to put all your thoughts in one post, not one after another. Nobody will take you seriously, and it's kind of annoying.

Intel has many projects and experiments going that may or may not be scrapped. Hell, they even have a patent related to Clustered Multi-threading similar to Bulldozer. You cannot predict what they are doing using simple patents and presentations, unless they say otherwise.

We can be fairly certain they will be working on whatever API is needed for Larrabee, be it DirectX, OpenGL, or GPGPU like OpenCL. Code flexibility is one thing they seem to be touting about Larrabee, so they definitely can do it.

Will we ever see its variant as a GPU? We don't know that yet. I'm betting at the moment that not even they are sure about that. It might just serve as a many-core project that will merely stem the GPGPU tide rather than going into home turf like graphics.

We'll see whether their decision was right when Larrabee ever surfaces. Who knows? They might beat everyone out the door and show there's more than one way to do graphics, and redefine what's called "efficiency". Or they might flop.
 

grimpr

Golden Member
Aug 21, 2007
1,095
7
81
Intel Sandy Bridge: no FMA (fused multiply-add) support. Developers plead with Intel to include it.

http://software.intel.com/en-us/forums/showthread.php?t=61121&o=d&s=lr

For the record, GPUs have used FMA for years. The new FP unit in AMD's Bulldozer comes with FMA, and you can bet that AMD's OpenCL kit, which runs on both the CPU and the GPU, will use it.
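
For anyone unfamiliar with FMA: it computes a*b + c in a single instruction with a single rounding. A rough sketch of the kind of loop that benefits (my own example; on Sandy Bridge this stays a separate multiply and add, while Bulldozer's FMA4 or a later FMA-capable CPU can fuse it):

#include <math.h>

/* Dot product written around fused multiply-add. C99's fma() guarantees a
   single rounding; on hardware with FMA it maps to one instruction, otherwise
   it falls back to a (slower) library routine. */
double dot(const double *a, const double *b, int n)
{
    double acc = 0.0;
    for (int i = 0; i < n; ++i)
        acc = fma(a[i], b[i], acc);   /* acc = a[i] * b[i] + acc, fused */
    return acc;
}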

Intel AVX - GFLOPS numbers advertised by Intel - Developer comments.

"I am working with both GPU and CPU code, and I can tell you that many sub-$100 cards will beat Intel's $3200 Dual Quad Core Xeon top of the line system. Going back to the peak GFLOP/s issue, it's actually much harder to achieve good performance with SSE."

http://software.intel.com/en-us/forums/showpost.php?p=109285

SiSoft Sandra OpenCL benchmark 2010 - CPU vs. GPGPU: Arithmetic Performance.

Intel Core i7 965 (CPU): 125 Mpixels/s, 961 kpixels/W, 125 kpixels/$
nVidia GeForce 9600 GT (GPU): 194 Mpixels/s, 2042 kpixels/W, 1940 kpixels/$

http://www.sisoftware.net/?d=qa&f=cpu_vs_gpu_proc&l=en&a=

Adobe goes GPGPU with CS5, Premiere, and Photoshop.

"One other aspect to this is the rate of innovation on GPU development. While CPU's from Intel and AMD continue to evolve and grow, the rate of innovation has slowed down dramatically. This hasn't been the case for GPU development. We continue to see dramatic leaps in performance every 12-18 months. The Quadro XXXX of today will be the GeForce of tomorrow and that means that your performance on a given system will be able to develop rapidly over time at a much lower cost than ever before."

http://blogs.adobe.com/genesisproject/2009/11/technology_sneek_peek_adobe_me.html

Adobe showcases Mercury Engine.

http://www.youtube.com/watch?v=sylAonfVp9k
http://www.youtube.com/watch?v=nrE9vXUfgvs


That's cold hard reality, with Fermi not even released yet and AMD's Northern Islands GPUs due in 2011. Of course this doesn't mean GPUs will conquer the universe; that's far-off speculation, and performance CPUs are very much needed for the long run, since not all code is suited to taking advantage of GPUs. But in the cases that are, GPUs will rule the roost.
 
Last edited:

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,787
136
Grimpr, in code that actually matters, like what's used in HPC applications, GPUs don't even reach 1/3 of theoretical peak, and that's for single precision. For double precision, cut that by roughly another order of magnitude. And the problem is that the vast majority of workloads CPUs run don't care about FP at all.

In contrast, CPUs will achieve 80-90% of theoretical DP performance. The differences are so great, and you are exaggerating the GPU's advantages so much, that it's not even funny. It's akin to comparing a drag racer to a bus: each is good in its own right, but swap their workloads and either one will flop. Completely.
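
As a back-of-the-envelope check on what "theoretical" means here (my own arithmetic for the Core i7 965 from the Sandra numbers above, assuming 128-bit SSE issuing one packed multiply and one packed add per core per cycle, and ignoring Turbo):

\[
\text{Peak}_{\mathrm{SP}} \approx 4~\text{cores} \times 3.2~\text{GHz} \times 8~\tfrac{\text{FLOP}}{\text{cycle}} \approx 102~\text{GFLOP/s},
\qquad
\text{Peak}_{\mathrm{DP}} \approx 4 \times 3.2 \times 4 \approx 51~\text{GFLOP/s}.
\]

A well-tuned DGEMM really does get 80-90% of that second figure, which is the point: the CPU's peak is small but honest.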

AVX is a vector extension for general-purpose calculations. It's not designed to accelerate GPU code, though it might occasionally be marketed that way. Actually, if you take the "256 bits" part out, it's not even only about FP, since it also tries to make other code run faster.

We complain every day about how our apps don't scale beyond 4 threads, and some multi-threaded apps don't scale at all beyond 16 threads or so; Cinebench, for example, has that limit. Or about how GPU-accelerated transcoding apps still degrade quality noticeably.
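
The scaling wall isn't mysterious either; it's just Amdahl's law. My own illustration (the 95% figure is hypothetical, not a measurement of any particular app): if a fraction p of the work parallelizes, the best speedup on n threads is

\[
S(n) = \frac{1}{(1-p) + p/n},
\qquad
p = 0.95:\quad S(16) \approx 9.1,\qquad \lim_{n \to \infty} S(n) = 20 .
\]

So even a "95% parallel" app tops out at 20x no matter how many cores you throw at it.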

Neither GPUs nor CPUs are close to crossing into each other's territory... yet.
 

Kuzi

Senior member
Sep 16, 2007
572
0
0
All those millions of GPUs you talk about from NV and AMD will be scrapped by the time we see real progress with OpenCL. Intel is working in the right direction on this. I have a feeling you and many people like you are going to be hammered by Intel's SB capabilities. Intel has been very clever here. This is why I have told you guys a million times that OpenCL and DX11 mean nothing to Intel; they don't need them. They have been working on this for years now. You just don't get it, do you?

Sandy Bridge is still just a general-purpose CPU with all the x86 legacy bloat of the last 30 years. Now, are you trying to say that SB will have stronger compute capability than video cards that were developed specifically to accelerate highly parallel tasks?

If that is the case, then you are daydreaming.
 

lopri

Elite Member
Jul 27, 2002
13,329
709
126
We complain everyday about how our apps don't care about beyond 4 threads, and some of the multi-threaded apps don't even scale at all beyond 16 threads or so. Cinebench for example has that limit. Or how for transcoding GPU accelerated apps still degrade quality noticeably.
New Cinebench is aware of up to 48 cores. :)

To the gist of your argument: I don't think anyone is asserting imminent death of conventional CPUs here. But the consensus seems to be,

1) CPUs aren't getting faster as quickly as they used to.
2) However, for a lot of applications it doesn't matter, because modern CPUs are already plenty fast for what they're meant to do (or, put differently, modern CPUs are fast enough for the humans who operate them).
3) In this circumstance, more and more focus is being given to visuals and concurrent workloads. And GPUs tend to excel at those.
4) Larrabee was/is Intel's answer to this developing trend.

You need look no further than the HD video playback situation, where a clip that pegs a dual-core CPU to the max runs effortlessly on a DXVA-enabled GPU. Intel's dilemma is:

1) It can't sit by and let things go this way.
2) If it were to follow the trend through what it's good at (x86), it would cut into its own margins.

You see, I'm not sure whether a successful Larrabee would be what Intel really wanted. A gigantic leap in parallel performance that can also execute x86 code? It could potentially jeopardize a huge portion of the lucrative Xeon market, which Intel no doubt doesn't want to lose.
 

alyarb

Platinum Member
Jan 25, 2009
2,425
0
76
There's nothing stopping Intel from introducing a Larrabee-Haswell hybrid similar to Llano if they really wanted to capitalize on heterogeneous computing. Larrabee is a great little architecture to have in your portfolio. It sucks for graphics, but frankly who the hell really cares; DP FPU performance was really, really decent for its time. If they were prepared to ship 32-core parts on 45nm, they could easily do a coprocessor on 28nm, *and* it would be purebred x86. It's still a risk for them because, like IntelUser said, with the exception of some basic apps, GPUs and CPUs have not crossed paths yet.

You might even argue that Llano is too far ahead of its time and may fail to deliver the massive theoretical GPGPU boosts our math tells us we should get. There simply are no good GPGPU apps right now, and what if they never come (or take too long)? Then there clearly is no point in having such a brawny IGP.
 
Last edited:

cbn

Lifer
Mar 27, 2009
12,968
221
106
Process technology launches are tied to product launches, and therefore, can't be done sooner. Intel might actually increase its process technology lead a bit further on 32nm.

Well, I think the mistake of my post was claiming "32nm" was the limiting factor for the production of Llano. No doubt, other technologies need to be perfected first.
 

Nemesis 1

Lifer
Dec 30, 2006
11,366
2
0
Intel Sandy Bridge: no FMA (fused multiply-add) support. Developers plead with Intel to include it.

http://software.intel.com/en-us/forums/showthread.php?t=61121&o=d&s=lr

For the record, GPUs have used FMA for years. The new FP unit in AMD's Bulldozer comes with FMA, and you can bet that AMD's OpenCL kit, which runs on both the CPU and the GPU, will use it.

Intel AVX - GFLOPS numbers advertised by Intel - Developer comments.

"I am working with both GPU and CPU code, and I can tell you that many sub-$100 cards will beat Intel's $3200 Dual Quad Core Xeon top of the line system. Going back to the peak GFLOP/s issue, it's actually much harder to achieve good performance with SSE."

http://software.intel.com/en-us/forums/showpost.php?p=109285

SiSoft Sandra OpenCL benchmark 2010 - CPU vs. GPGPU: Arithmetic Performance.

Intel Core i7 965 (CPU): 125 Mpixels/s, 961 kpixels/W, 125 kpixels/$
nVidia GeForce 9600 GT (GPU): 194 Mpixels/s, 2042 kpixels/W, 1940 kpixels/$

http://www.sisoftware.net/?d=qa&f=cpu_vs_gpu_proc&l=en&a=

Adobe goes GPGPU with CS5, Premiere, and Photoshop.

"One other aspect to this is the rate of innovation on GPU development. While CPU's from Intel and AMD continue to evolve and grow, the rate of innovation has slowed down dramatically. This hasn't been the case for GPU development. We continue to see dramatic leaps in performance every 12-18 months. The Quadro XXXX of today will be the GeForce of tomorrow and that means that your performance on a given system will be able to develop rapidly over time at a much lower cost than ever before."

http://blogs.adobe.com/genesisproject/2009/11/technology_sneek_peek_adobe_me.html

Adobe showcases Mercury Engine.

http://www.youtube.com/watch?v=sylAonfVp9k
http://www.youtube.com/watch?v=nrE9vXUfgvs


That's cold hard reality, with Fermi not even released yet and AMD's Northern Islands GPUs due in 2011. Of course this doesn't mean GPUs will conquer the universe; that's far-off speculation, and performance CPUs are very much needed for the long run, since not all code is suited to taking advantage of GPUs. But in the cases that are, GPUs will rule the roost.

Yeah, Igor is pissed about no FMA. It has been explained to him many times why there's no FMA.

As for OpenCL, it's a C/C++-based programming language. Intel doesn't need to work on software for C/C++. Without programs for CL, nobody needs to work on it. Intel is instead working on compilers and JIT compilers. Now, the compilers everyone can use, but the JIT compilers are exclusive to Intel. Intel is going about this the correct way. Intel's IGP on the i3/i5 is already a GPGPU and Intel is already using it that way, without DX11, the same as CUDA works without DX11. All three companies need a runtime compiler for OpenCL, and Intel is just way ahead of the others. Intel doesn't have to rush Larrabee. You think they didn't know Fermi was going to be late? They knew, and decided to go with 32nm cores and get software and drivers ready with Larrabee 1. It was actually a smart move by Intel. OpenCL isn't something Intel is afraid of; Intel has a fondness for C++. Intel wants to use OpenCL both ways, on the x86 CPU and on whatever GPU they use. Why have idle transistors when it's more efficient to have the total package running efficiently?
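
To be fair, the "runtime compiler" part is real: OpenCL kernels ship as source and get compiled by the driver when the program runs. A rough sketch of the host side (my own illustration; error handling and the kernel source string itself are omitted):

#include <stdio.h>
#include <CL/cl.h>

/* Hand an OpenCL C kernel, as a source string, to the driver and let its
   runtime compiler build it for whichever device was picked. kernel_src is
   assumed to hold valid OpenCL C. */
int build_kernel(const char *kernel_src)
{
    cl_platform_id platform;
    cl_device_id device;
    cl_int err;

    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_DEFAULT, 1, &device, NULL);

    cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, &err);

    /* The driver compiles the source here, at run time, for this device. */
    cl_program prog = clCreateProgramWithSource(ctx, 1, &kernel_src, NULL, &err);
    err = clBuildProgram(prog, 1, &device, NULL, NULL, NULL);
    if (err != CL_SUCCESS)
        fprintf(stderr, "runtime build failed: %d\n", (int)err);

    clReleaseProgram(prog);
    clReleaseContext(ctx);
    return err == CL_SUCCESS ? 0 : -1;
}

Whether Intel's own JIT work ends up better than AMD's or NVIDIA's is another matter, but every vendor needs this step.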
 
Last edited:

Nemesis 1

Lifer
Dec 30, 2006
11,366
2
0
Sandy Bridge is still just a general-purpose CPU with all the x86 legacy bloat of the last 30 years. Now, are you trying to say that SB will have stronger compute capability than video cards that were developed specifically to accelerate highly parallel tasks?

If that is the case, then you are daydreaming.

No, I am not. What I am saying is that Intel wants to use the whole processor as efficiently as possible: offloading GPU work to the CPU and CPU work to the GPU, using all the transistors efficiently. To do that you need great compilers, and that is what Intel is concentrating on. Sure, the compilers can be used by all, but not everyone will get the JIT compilers, and that is where Intel is heading.
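
For what it's worth, the "both ways" idea maps directly onto how OpenCL exposes devices: the same kernel source can be queued to a CPU device or a GPU device. A tiny sketch of the enumeration (my own example; the counts obviously depend on the machine and driver):

#include <stdio.h>
#include <CL/cl.h>

/* Count how many OpenCL CPU and GPU devices the first platform exposes.
   The point: one API, and the same kernels, can target either device type. */
int main(void)
{
    cl_platform_id platform;
    cl_uint cpus = 0, gpus = 0;

    if (clGetPlatformIDs(1, &platform, NULL) != CL_SUCCESS)
        return 1;

    clGetDeviceIDs(platform, CL_DEVICE_TYPE_CPU, 0, NULL, &cpus);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 0, NULL, &gpus);

    printf("OpenCL devices on platform 0: %u CPU, %u GPU\n", cpus, gpus);
    return 0;
}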
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,787
136
New Cinebench is aware of up to 48 cores. :)

To the gist of your argument: I don't think anyone is asserting imminent death of conventional CPUs here. But the consensus seems to be,

1) CPUs aren't getting faster as quickly as they used to.
2) However, for a lot of applications it doesn't matter, because modern CPUs are already plenty fast for what they're meant to do (or, put differently, modern CPUs are fast enough for the humans who operate them).
3) In this circumstance, more and more focus is being given to visuals and concurrent workloads. And GPUs tend to excel at those.
4) Larrabee was/is Intel's answer to this developing trend.

You need look no further than the HD video playback situation, where a clip that pegs a dual-core CPU to the max runs effortlessly on a DXVA-enabled GPU. Intel's dilemma is:

1) It can't sit by and let things go this way.
2) If it were to follow the trend through what it's good at (x86), it would cut into its own margins.

You see, I'm not sure whether a successful Larrabee would be what Intel really wanted. A gigantic leap in parallel performance that can also execute x86 code? It could potentially jeopardize a huge portion of the lucrative Xeon market, which Intel no doubt doesn't want to lose.

This is true on the opposite side as well. The majority of people use IGPs, and when the performance gets good, as with Llano, where you can play most games on high settings, that will become even more common. Every IGP generation reduces the number of people who need discrete GPUs.

Actually, the HD video playback argument seems overrated. With a Core 2 Duo E6600 and G965 setup, I never saw CPU usage higher than 30-35%, and on the i5 661 it would be half that. The only problem with that is higher power draw, but the same is true of GPUs: swap the lowest-end dedicated card for any IGP and even idle power plummets.

I do admit that the SGEMM/DGEMM numbers are impressive, but I strongly believe nobody is really orders of magnitude ahead of anyone else. The die space you spend is the performance you get.

This comes back to what I mentioned at the beginning. With lots of FPSes nowadays being merely console ports, is the desktop PC in general dying? Developers are too scared nowadays to spend the money and time debugging across the huge variety of PC configurations, and port to consoles instead, a single, never-changing platform.

I'd think this is why AMD stopped following Nvidia's footsteps of ever-growing GPUs and integrated the two instead. Their dedicated cards will be usurped by Fermi soon, and Bulldozer is probably not going to take the overall CPU crown (multi-threaded, maybe close). But they will satisfy the majority, and it'll cost less for them to manufacture with less risk.
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
You see, I'm not sure whether a successful Larrabee would be what Intel really wanted. A gigantic leap in parallel performance that can also execute x86 code?

Would ATI GPUs be able to do the same thing?

Bulldozer is probably not going to take the overall CPU crown (multi-threaded, maybe close). But they will satisfy the majority, and it'll cost less for them to manufacture with less risk.

I don't know if Bulldozer will do so well with "the majority" if IPC/frequency isn't a huge jump over Sandy Bridge. The server market is a different story, I'm sure.
 
Last edited:

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,787
136
I actually meant that most people won't need the extra performance. A smaller die means they can make it cheaper and therefore sell it cheaper. Intel's process is actually not that great on density. Of course, Intel actually ships products on the same process they put in presentations, while other companies may not.

It might actually give them a chance to be competitive on the power front too.
 

evolucion8

Platinum Member
Jun 17, 2005
2,867
3
81
The Llano CPU may not end up being faster than the Phenom II (due to the lack of L3 cache), but a person will be able to buy "Bulldozer" without a fused GPU.

CPUs with an integrated memory controller, like AMD's Phenom II architecture, aren't very dependent on the L3 cache; that's why the Athlon II X4 620 isn't far behind the Phenom II X4 945. Bulldozer might be a different story.
 

Kuzi

Senior member
Sep 16, 2007
572
0
0
No, I am not. What I am saying is that Intel wants to use the whole processor as efficiently as possible: offloading GPU work to the CPU and CPU work to the GPU, using all the transistors efficiently. To do that you need great compilers, and that is what Intel is concentrating on. Sure, the compilers can be used by all, but not everyone will get the JIT compilers, and that is where Intel is heading.

I agree that Intel makes the best compilers and SB will most likely be a very efficient and powerful piece of hardware. From what I read, the IGP in SB shares the L3 (8MB?) cache with the CPU, so that would boost performance also.
 

Kuzi

Senior member
Sep 16, 2007
572
0
0
I actually meant that most people won't need the extra performance. A smaller die means they can make it cheaper and therefore sell it cheaper. Intel's process is actually not that great on density. Of course, Intel actually ships products on the same process they put in presentations, while other companies may not.

Intel does better with process density than AMD, don't they? I mean, check out the CPU part of the i3 Clarkdale, for example: 2 cores + 4MB of L3 cache in a die size of only 81mm^2. That's pretty tiny.


It might actually give them a chance to be competitive on the power front too.

I do believe that with Bulldozer (and Bobcat) AMD will finally catch up to Intel on power draw. Intel already uses HKMG and power gating in its CPUs, features that AMD will have with Bulldozer, plus the shrink to 32nm of course.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,787
136
Yes, but Clarkdale is 32nm, so it's not comparable with AMD's products.

I believe Llano will perform similarly to Deneb even without L3. Not only is the L2 cache doubling to 1MB per core, there will also be some minor improvements to the CPU core. L3 affects performance by less than 10% in AMD's case anyway.

I have to say I'm impressed with AMD's ability to survive against such impossible odds. They are taking risks with designs like Bulldozer at a time when others are afraid to and take only safe paths. Major PC game developers making console-only games, movie studios rehashing the same movies (sequels, prequels, "trilogies"): it can't be good for the economy.

Innovation is what creates excitement and inspires people. It is risky, but if nobody took risks we wouldn't even have semiconductors in the first place. Bulldozer, with its CMT architecture, is innovation, just as implementing the first integrated memory controller on a mass-market CPU was. Neither idea was entirely new when AMD did it, but the previous implementations were never mass market.

"Llano" and hybrid approaches would be another risk, hoping such will survive the possible demise of large size CPUs and GPUs.

Intel is impressive in that they can supply the entire market with semiconductors while spearheading cutting-edge process technology, but they haven't been as impressive with their CPUs. Those who think otherwise have absolutely NO clue how much better they really are.

The Pentium 4 may have been a marketing and product failure, but it brought radical ideas like the Execution Trace Cache, replay, and aggressive speculation to feed its extraordinarily long pipeline. We could probably say the experience of the Pentium 4 led to the success of Core 2, as they definitely "woke up" and learned from their mistakes.

AMD has mostly been playing it safe since the Athlon and the Athlon 64, and that might have hurt them a bit, but at the least I think they'll survive.
 
Last edited:

cbn

Lifer
Mar 27, 2009
12,968
221
106
http://www.anandtech.com/video/showdoc.aspx?i=3740&p=5

If you read this article, it sounds like AMD may have future plans for synchronizing discrete GPUs through something called "sideport". Note that this sideport sounds quite different from the one involved with IGP memory.

Could this sideport also be used on the AMD Fusion IGP for the purposes of load balancing when a discrete video card is present?
 
Last edited:

Kuzi

Senior member
Sep 16, 2007
572
0
0
Sideport was introduced back in 2008 with ATI's 4xxx series cards; it was supposed to boost bandwidth between the GPUs on dual-GPU cards such as the 4870 X2, but for various reasons ATI never enabled it. You can read more about it in this article:

http://www.anandtech.com/video/showdoc.aspx?i=3372&p=3

From the article, it seems that Sideport provides 10 GB/s of bandwidth (5 GB/s in each direction), and I think for Llano AMD will keep using HyperTransport, which is much faster. The current HT version (3.1) provides 51.2 GB/s of aggregate bandwidth.
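
The HT figure works out from the link parameters (my own arithmetic, assuming a full 32-bit HT 3.1 link at 3.2 GHz, double data rate, with both directions counted):

\[
3.2~\text{GHz} \times 2~(\text{DDR}) \times 4~\text{bytes} = 25.6~\tfrac{\text{GB}}{\text{s}}~\text{per direction}
\;\Rightarrow\; 51.2~\tfrac{\text{GB}}{\text{s}}~\text{aggregate},
\]

versus Sideport's 5 GB/s per direction, so HyperTransport is the obvious choice.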
 
Last edited:

evolucion8

Platinum Member
Jun 17, 2005
2,867
3
81
Sideport was introduced back in 2008 with ATI's 4xxx series cards; it was supposed to boost bandwidth between the GPUs on dual-GPU cards such as the 4870 X2, but for various reasons ATI never enabled it. You can read more about it in this article:

http://www.anandtech.com/video/showdoc.aspx?i=3372&p=3

From the article, it seems that Sideport provides 10 GB/s of bandwidth (5 GB/s in each direction), and I think for Llano AMD will keep using HyperTransport, which is much faster. The current HT version (3.1) provides 51.2 GB/s of aggregate bandwidth.

I think Sideport was meant to carry communication traffic like driver queries, frame synchronization, etc., but never to pass graphics data, since the bandwidth is far too small for that. The RV770 memory bus is just much faster than any CPU bus currently on the market.

HyperTransport is more elaborate because, besides CPU-to-CPU communication, it also connects to the RAM and requires high bandwidth for the sake of performance in memory transactions like reads and writes; a GPU rarely reads back what it writes.
 

grimpr

Golden Member
Aug 21, 2007
1,095
7
81
This is true on the opposite side as well. The majority of people use IGPs, and when the performance gets good, as with Llano, where you can play most games on high settings, that will become even more common. Every IGP generation reduces the number of people who need discrete GPUs.

Actually, the HD video playback argument seems overrated. With a Core 2 Duo E6600 and G965 setup, I never saw CPU usage higher than 30-35%, and on the i5 661 it would be half that. The only problem with that is higher power draw, but the same is true of GPUs: swap the lowest-end dedicated card for any IGP and even idle power plummets.

I do admit that the SGEMM/DGEMM numbers are impressive, but I strongly believe nobody is really orders of magnitude ahead of anyone else. The die space you spend is the performance you get.

This comes back to what I mentioned at the beginning. With lots of FPSes nowadays being merely console ports, is the desktop PC in general dying? Developers are too scared nowadays to spend the money and time debugging across the huge variety of PC configurations, and port to consoles instead, a single, never-changing platform.

I'd think this is why AMD stopped following Nvidia's footsteps of ever-growing GPUs and integrated the two instead. Their dedicated cards will be usurped by Fermi soon, and Bulldozer is probably not going to take the overall CPU crown (multi-threaded, maybe close). But they will satisfy the majority, and it'll cost less for them to manufacture with less risk.

I see what you are getting at regarding games and fully agree. A huge part of the PC's raison d'être is games; take that away and you transform it into a lifeless office machine. A huge part of the PC's curse is Microsoft's Direct3D pipeline and the Xbox 360. ATI and Nvidia are playing it safe with Microsoft; graphics chip contracts in next-gen consoles are good business for both of them. The only company that doesn't care and starts fresh is Intel. If and when Larrabee arrives, executes perfectly on the software front, attracts some good PC game developers, and resurrects one of the prime things the PC is known for, I will gladly support it. Besides that, a third major player such as Intel in PC graphics is more than welcome.