> Can't find the link, but there's an interview with Carmack on YouTube (duh) about Rage, where he goes into detail about what consoles do much, much better than PCs and evens out the factor-10-ish FLOPS gap due to this design... and it's actually in respect to shared resources.
Dunno, but we've probably all read it. The thing is, consoles still just don't have the performance, due to being behind the times by the time games are able to really utilize them. It all looks great when they start, and they get so much control, but then we have PCs with many times the power by the time the games are coming out.
> This is the same argument that people use for defending Bulldozer, but it never works. You can't claim something is amazing and then ask the entire planet to gravitate towards the new ISAs and architecture.
But it's not that at all. OpenGL, OpenCL, and DirectX are the architectures. AVX(2) can hide behind function calls and (H|G)LSL translators. The question is, will it be good enough?
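A minimal sketch (my own illustration in C, not something from the thread) of what "hiding behind function calls" means: callers use a plain vector-add API, and only the implementation needs to know whether AVX is available. The function name is hypothetical.

```c
#include <stddef.h>
#include <immintrin.h>

/* Hypothetical wrapper: callers never see the ISA.  A (H|G)LSL
 * translator or a math library could swap this body between scalar,
 * SSE, or AVX code without changing the interface. */
static void vec_add_f32(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;
#ifdef __AVX__
    for (; i + 8 <= n; i += 8) {                 /* 8 floats per 256-bit op */
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(dst + i, _mm256_add_ps(va, vb));
    }
#endif
    for (; i < n; ++i)                           /* scalar tail / fallback */
        dst[i] = a[i] + b[i];
}
```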
> This is the same argument that people use for defending Bulldozer, but it never works. You can't claim something is amazing and then ask the entire planet to gravitate towards the new ISAs and architecture.
Bulldozer is a whole different story. They lowered the IPC, crippled the SIMD performance, and didn't provide any hardware transactional memory technology.
> The same thing was said for the AVX implementation, but where is it exactly? A few synthetic benchmarks and a handful of applications :/
AVX is clearly nothing more than the stepping stone to AVX2. The first AVX specification that Intel revealed included support for FMA4. Then they changed it to FMA3, and then they moved it to AVX2. So the goal has always been to provide four times the SIMD throughput. Basically they wanted AVX2 all along but needed an intermediate step to get there.
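For what it's worth, a small sketch (my own, C intrinsics) of the throughput point: under plain AVX a multiply and an add are two instructions, while the FMA3 that ships alongside AVX2 folds them into one fused multiply-add per 256-bit vector.

```c
#include <immintrin.h>

/* a*b + c over 8 packed floats. */
static __m256 madd_avx(__m256 a, __m256 b, __m256 c)
{
    return _mm256_add_ps(_mm256_mul_ps(a, b), c);   /* AVX: 2 instructions */
}

#ifdef __FMA__
static __m256 madd_fma(__m256 a, __m256 b, __m256 c)
{
    return _mm256_fmadd_ps(a, b, c);                /* FMA3: 1 fused instruction */
}
#endif
```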
> Come on!
> Haswell - 2013
> Llano - 2011
Irrelevant. Fusion chips are limited by bandwidth. There won't be a whole lot of progress in the next few years. Of course the bandwidth can be increased, or they can add eDRAM, but each of these things increases the cost considerably.
> Cost. Power. Form factor.
Cost isn't an issue. By getting rid of the IGP there's room for twice the CPU cores.
Power is a concern, however the solution could come in the form of AVX-1024. They could keep the 256-bit execution units, but feed them 1024-bit instructions over four cycles. The same amount of useful work per clock would be done, with four times fewer instructions going through the CPU's front-end. And that's a huge power saving.
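To make that concrete, here is a rough conceptual model (my own, not anything Intel has published) of a hypothetical 1024-bit operation executed by 256-bit hardware over four passes; the point is that one decoded instruction drives all four passes, so the front-end sees a quarter of the instructions for the same work.

```c
#include <immintrin.h>

/* Conceptual model only: treat a "1024-bit" vector as four 256-bit
 * quarters.  In hardware, one issued instruction would feed the
 * existing 256-bit FMA units for four consecutive cycles. */
typedef struct { __m256 q[4]; } v1024;

static v1024 fmadd_1024(v1024 a, v1024 b, v1024 c)
{
    v1024 r;
    for (int i = 0; i < 4; ++i)       /* what the hardware would do, one quarter per cycle */
        r.q[i] = _mm256_fmadd_ps(a.q[i], b.q[i], c.q[i]);
    return r;
}
```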
So AMD won't have an answer against a homogeneous many-core CPU with AVX2.
Why not? They have AVX1... is there something that will lock them out?
> Why not? They have AVX1... is there something that will lock them out?
Not in theory, no. But they've been investing in heterogeneous computing for years now and thereby sacrificing CPU performance. They'd have to abandon Fusion and make a 180-degree turn to focus on homogeneous CPU performance and catch up with Haswell and its successors.
> Can't find the link, but there's an interview with Carmack on YouTube (duh) about Rage, where he goes into detail about what consoles do much, much better than PCs and evens out the factor-10-ish FLOPS gap due to this design... and it's actually in respect to shared resources.
I think it's more of an API abstraction issue.
> You are still ignoring the part where notebooks would need a significant sacrifice, especially dual cores...
Ivy Bridge is a teeny tiny chip, bringing quad-core to the mainstream. And in case you haven't noticed, even chips for mobile phones have started to go quad-core. And the multi-core revolution doesn't end there, it's only getting started. Hardware transactional memory makes multi-threading a whole lot more scalable, so expect to see more cores with every process shrink.
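As an aside, a minimal sketch (my own, in C using the RTM intrinsics that Haswell's TSX exposes; needs a TSX-capable CPU and -mrtm to build) of why transactional memory helps scalability: the real lock is only taken on the fallback path, so threads that don't actually conflict never serialize.

```c
#include <immintrin.h>    /* _xbegin/_xend/_xabort: RTM intrinsics */
#include <stdatomic.h>

static atomic_int fallback_lock = 0;      /* 0 = free, 1 = held */

static void locked_add(long *counter, long value)
{
    unsigned status = _xbegin();
    if (status == _XBEGIN_STARTED) {
        if (atomic_load(&fallback_lock))  /* someone holds the real lock: */
            _xabort(0xff);                /* abort and take the slow path  */
        *counter += value;                /* speculative, no serialization */
        _xend();
        return;
    }
    /* Fallback: plain spinlock, only reached after an abort/conflict. */
    while (atomic_exchange(&fallback_lock, 1))
        ;
    *counter += value;
    atomic_store(&fallback_lock, 0);
}
```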
> Also, they still need to add dedicated units for graphics, like what Cerb said.
No. Like I said before, gather support in AVX2 takes care of the most expensive graphics operations in a generic way. Also, adding a few more instructions (like extending BMI1/2 to AVX) would hardly take extra space. And again, those would be generic enough to be useful for a lot of other purposes.
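A small sketch (my own, C intrinsics) of the gather point: texture or lookup-table fetches are scattered reads, and AVX2's gather does eight of them with a single instruction instead of eight scalar loads plus inserts.

```c
#include <immintrin.h>

/* Fetch 8 texels from a lookup table at 8 arbitrary 32-bit indices.
 * Pre-AVX2 this is eight scalar loads and a pile of shuffles; with
 * AVX2 it is one vgatherdps. */
static __m256 gather_texels(const float *table, __m256i indices)
{
    return _mm256_i32gather_ps(table, indices, 4);  /* scale = 4 bytes per float */
}
```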
> More importantly, that's not what Intel is doing.
Really? So providing a fourfold increase in core throughput is not part of any well-coordinated plan? They've announced from the beginning that AVX would be extendable to 1024-bit.
> It may make sense for GPGPU, but I doubt it makes sense for 3D graphics.
Only a year ago most people thought GPGPU was going to go mainstream and the CPU would become less relevant. Then AVX2 was announced, obviously targeting SPMD processing. Suddenly opinions started to shift, and now that NVIDIA has sacrificed GPGPU efficiency it's clear that it's a dead-end street.
> Besides, you can only put two more cores in place of the Ivy Bridge iGPU...
Wrong. It's the size of 3.5 cores, so let's round that up to a nice and even 4.
> Have you considered what the power distribution is on Sandy or Ivy between the GPU and CPU?
What are you trying to say?
> Also, if you had paid attention to Larrabee, you would know the 512-bit units combine 32-bit instructions and then execute them in a single cycle. To compare, using 1024-bit over 4 cycles or 256-bit in one cycle is the same in terms of graphics output.
Haswell will have two 256-bit floating-point units per core. And yes, I know that executing 1024-bit instructions over 4 cycles results in the same throughput per unit. The point is that it lowers power consumption, because the front-end has to deliver fewer instructions and there will be less switching activity in the schedulers.
> And a final add: even Larrabee got a raster unit added. Go figure.
No, Larrabee performed rasterization in software. And they're not the only ones who've tried it: High-Performance Software Rasterization on GPUs. Rasterization is likely going to become programmable at some point.
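For readers who haven't seen it, here is a bare-bones sketch (my own, plain C, the classic half-space method rather than Larrabee's actual binned rasterizer) of what "rasterization in software" means: triangle coverage is just arithmetic, so it can run on any core.

```c
/* Half-space triangle rasterizer sketch: a pixel is inside the triangle
 * if it lies on the same side of all three edges.  The edge function is
 * the 2D cross product of (b - a) and (p - a). */
static float edge(float ax, float ay, float bx, float by, float px, float py)
{
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax);
}

static void raster_triangle(unsigned *fb, int width, int height,
                            float x0, float y0, float x1, float y1,
                            float x2, float y2, unsigned color)
{
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            float px = x + 0.5f, py = y + 0.5f;     /* sample at pixel center */
            float w0 = edge(x0, y0, x1, y1, px, py);
            float w1 = edge(x1, y1, x2, y2, px, py);
            float w2 = edge(x2, y2, x0, y0, px, py);
            if (w0 >= 0 && w1 >= 0 && w2 >= 0)      /* one winding; flip test for the other */
                fb[y * width + x] = color;
        }
    }
}
```

A real implementation would restrict the loop to the triangle's bounding box, bin into tiles, and vectorize the edge tests, which is exactly where wide SIMD pays off.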
> The GPU uses a lot less power than the CPU for the same work. The CPU sits on something like 90% of the TDP budget.
Please stop guessing. Anand shows that Ivy Bridge uses 53.3 Watt at full CPU load, while running Metro 2033 it consumes 58.7 Watt. And that DX11 game is very light on the CPU (especially when bottlenecked by the GPU). So the GPU consumes quite a bit of power.
> Executing individual 16- or 32-bit instructions in a 1024-bit AVX unit in 256-bit parts is pretty useless. Try examining what AVX is used for today, as well as the other SSE parts. They don't tend to be "GPGPU" related, if you get my drift.
What are you talking about? AVX2 can process 4 x 64-bit, 8 x 32-bit, 16 x 16-bit, or 32 x 8-bit per cycle. And it's perfectly suited for SPMD.
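A quick sketch (my own, C intrinsics) of those element widths: the same 256-bit register is carved into 4, 8, 16, or 32 lanes depending on the integer instruction, which is what SPMD-style code needs for mixed-precision work.

```c
#include <immintrin.h>

/* One 256-bit AVX2 register, four different lane widths. */
static void lane_widths(__m256i a, __m256i b, __m256i out[4])
{
    out[0] = _mm256_add_epi64(a, b);   /*  4 x 64-bit lanes */
    out[1] = _mm256_add_epi32(a, b);   /*  8 x 32-bit lanes */
    out[2] = _mm256_add_epi16(a, b);   /* 16 x 16-bit lanes */
    out[3] = _mm256_add_epi8(a, b);    /* 32 x  8-bit lanes */
}
```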
> But again, I'm sure you know better than AMD/Intel!
No, but I know better than Ars Technica. Larrabee does not have a hardware rasterizer.
> Wrong. It's the size of 3.5 cores, so let's round that up to a nice and even 4.
Sure, just ignore the L3 caches, which are a big benefit for CPU-only workloads. So in the end you get a crappy CPU and a crappy GPU.
> Really? So providing a fourfold increase in core throughput is not part of any well-coordinated plan? They've announced from the beginning that AVX would be extendable to 1024-bit.
You said Haswell allows the CPU to completely replace iGPUs, and that they would stick more cores in for that purpose. That's exactly the opposite of what's happening in Haswell. Future chips are staying with Gen graphics, and the ones that aren't even Gen are going to (or are said to) move to Gen graphics.
Simply said, it is for people who don't play games. If you're a gamer, then you disable the onboard and use your dedicated PCIe card. gl
> Really? Because you are not?
> http://www.hardware.fr/articles/863-6/hd-graphics-4000-2500-consommation-3d.html
That's the power consumption of just the GPU plane. It's not representative because there's also a lot of power being consumed by cache and RAM accesses. And even if that was included, the GPU still can't operate without running the graphics driver on the CPU. So that portion of the power consumption also has to be attributed to graphics.
> Doesn't it say clearly on that AT review that the measurement is system power?
Yes, which is why I subtracted the idle power consumption. You have to look at the whole picture, and not isolate a particularly power-efficient portion of the chip that is helpless by itself.
> Sure, just ignore the L3 caches, which are a big benefit for CPU-only workloads. So in the end you get a crappy CPU and a crappy GPU.
No. You don't have to increase the LLC size linearly with the number of cores. The dual-core Penryn had 6 MB of L2, while a quad-core Core i5 has 6 MB of L3. You can expect to see 8-cores with 6/8 MB in the future. The temporal and spatial locality of the LLC data doesn't change significantly with more cores. And something like graphics actually has highly regular access patterns.
> By the way, the 15W Haswell SKUs are all dual core.
Sure, but that's the short-term future. I'm talking about a longer-term future. We'll get 1 TFLOP of computing power out of a 15 Watt CPU sooner than you might realize (but still several generations after Haswell). And it would be far more useful if that was fully generic instead of having a limited programming model. Also note that currently we're still stuck with lightly threaded software because of a lack of hardware transactional memory. But Haswell brings us TSX. Of course it won't make the software heavily multi-threaded overnight, but gradually it will become better to have more small cores than a few big cores.
> And the indication is that the iGPU (not a CPU wanting to be a GPU Frankenstein) is more buffed up than ever. And it's obvious the 15W parts are going to be pushed really hard because of Ultrabooks.
Yes, but the graphics performance expectations of a 15 Watt part will also be significantly lower than those of a 55 Watt part. And that applies whether you have heterogeneous or homogeneous graphics.
And that's probably exactly what's going to happen.
Dual cores - 15-17W
Quad cores - 35/45/55W
> You said Haswell allows the CPU to completely replace iGPUs, and that they would stick more cores in for that purpose.
No, I said it's a significant step toward that. It marks the end of GPGPU, but not the end of IGPs, yet.
hey if you guys aren't busy i thought we could compare a bunch of theoretical constraints in such a way that they sound worthwhile enough on paper to challenge real world paradigms hardened by practical experience and conventional wisdom, and then share condescending soliloquies with each other on why our game-changing concept hasn't been implemented yet...
> I think that programmable blending and programmable depth could *possibly* eventually be made more efficient than hardware blending/depth.
NVIDIA already performs all blending in the shader for Tegra (see the GL_NV_shader_framebuffer_fetch extension). And that's a power-restricted mobile chip!
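To illustrate what "programmable blending" buys you, here is a tiny sketch (my own, in plain C rather than the GLSL that the NV extension actually uses): once the shader can read the destination pixel, any blend equation is just arithmetic instead of a fixed-function ROP mode.

```c
/* Software alpha blend: out = src*alpha + dst*(1 - alpha), per 8-bit channel.
 * With framebuffer fetch the same math runs inside the pixel shader, so
 * arbitrary blend modes need no dedicated blending hardware. */
static unsigned blend_over(unsigned src, unsigned dst, unsigned alpha /* 0..255 */)
{
    unsigned out = 0;
    for (int shift = 0; shift < 24; shift += 8) {        /* B, G, R channels */
        unsigned s = (src >> shift) & 0xff;
        unsigned d = (dst >> shift) & 0xff;
        unsigned c = (s * alpha + d * (255 - alpha)) / 255;
        out |= c << shift;
    }
    return out;
}
```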
