With Haswell on the horizon I finally got around to taking a deep dive into Anand's architecture article from last year. First I have to say it's an amazing read. Anand is in my opinion the best tech writer in the world. He has a gift. He's always been good at it and he gets better all the time. There is so much between the lines...
Anyway, I have a few nit-picky questions I'm hoping the CPU gurus around here can help me with.
1. On page two "Platform Retargeting" Anand writes
"There will be four client focused categories of Haswell, and I can only talk about three of them now. There are the standard voltage desktop parts, the mobile parts and the ultra-mobile parts: Haswell, Haswell M and Haswell U."
This seems to indicate the three Haswells are Haswell, Haswell M, and Haswell U.
Then a little farther down that page he writes
"It's the Haswell U/ULT parts that brings about the dramatic change. These will be a single chip solution, with part of the voltage regulation typically found on motherboards moved onto the chip's package instead. "
This seems to imply that Haswell U and ULT are both the 3rd Haswell, since he said he couldn't discuss the 4th Haswell. Or perhaps he is telling us something?
But then on Page 4 "The Fourth Haswell" he writes
"Just before this year's IDF Intel claimed that Haswell ULT would start at 10W, down from 17W in Sandy/Ivy Bridge. Finally, at IDF Intel showed a demo of Haswell running the Unigen Heaven benchmark at under 8W:"
Since this section is titled "The Fourth Haswell," are we to assume that the 4th Haswell is the sub-10W ULT part? The one he couldn't discuss? Or perhaps he's leaving us to infer that there will be desktop parts ("Haswell"), mobile parts in the standard 35W range at higher clocks than current mobile CPUs at that TDP ("Haswell M"), mobile parts for ultrabooks in the 17W range ("Haswell U"), and the 4th Haswell in the sub-10W range ("Haswell ULT")?
I'm thinking that Anand is "saying" without "saying" that the 4th Haswell is an ultralow voltage part intended for tablets and perhaps even smaller devices. He is making that point not with his words but with the Intel IDF demo. This leads me to suspect there is still a surprise in store for us as to how low (and small) Haswell will go.
2. Where exactly does the "front end" of the execution engine end and the "back end" begin? Or more to the point, is the Decode Queue the front end or the back end?
3. On the "Haswell's Wide Execution Engine" page, Anand writes
"Simply being able to pick from more instructions to execute in parallel is one thing, we haven't seen an increase in the number of parallel execution ports since Conroe."
Anand is always deliberate in his writing, so I'd like to know what he was getting at. I suspect it's just a typo and he meant to write the following, but I'm not certain:
"Simply being able to pick from more instructions to execute in parallel is one thing, BUT we haven't seen an increase in the number of parallel execution ports since Conroe."
4. Under the section "Decoupled L3 Cache" Anand writes
"Ivy Bridge saw the addition of a small graphics L3 cache to mitigate this situation, but ultimately giving the on-die GPU independent access to the big, primary L3 cache without worrying about power concerns was a big issue for the design team."
I'm not completely understanding this. I think it means the GPU received its own cache with Ivy Bridge, since Anand writes "the addition of a small graphics L3 cache." Or does this mean a portion of the L3 was dedicated to the GPU? And then the next sentence confuses me even more. I think he is saying the Intel design team knew that whether or not to give the GPU frequency control of the CPU L3 was a big deal, but they ultimately decided with Ivy to keep the CPU+uncore and GPU on separate frequency domains?
Also, now that Haswell has returned to the 3 clock domain design, is the small graphics L3 cache from Ivy still there?
I don't understand the 2nd to last sentence of this section.
"There are now dedicated pipes for data and non-data accesses to the last level cache."
Finally the last sentence of this section.
"Haswell's memory controller is also improved, with better write throughput to DRAM. Intel has been quietly telling memory makers to push for even higher DDR3 frequencies in anticipation of Haswell."
I take this to mean that Intel knows that if there is a slight memory weakness with Haswell, it comes from the increased latency of the L3, which they hope can be mitigated by pushing memory manufacturers for faster main memory. As usual Anand puts quite a bit in between the lines, but you've gotta really read to pull it out.
5. Just a comment. Gotta love Anand's style. So great. Who else could equate writing well-threaded code for independent tasks with grabbing a low-hanging apple off a tree!
"Parallelizing truly independent tasks is the low hanging fruit, but it's the tasks that all access the same data structure that can create problems."
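For anyone who wants to see that distinction in code, here is a minimal sketch in Python. It is not from the article; the function names (`independent_sum`, `shared_histogram`) are made up for illustration, and a plain lock is just one assumed way of protecting the shared structure. The first function is the low hanging fruit: every task works on its own input and nothing is shared. The second has every task updating the same list, so the updates have to be serialized.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def square(n):
    # Independent task: touches no shared state, so no locking is needed.
    return n * n


def independent_sum(values, workers=4):
    # Each task produces its own result; results are combined afterwards.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(square, values))


def shared_histogram(values, buckets=4, workers=4):
    # Every task updates the same list, so updates must be serialized.
    counts = [0] * buckets
    lock = threading.Lock()

    def record(n):
        # Coarse lock: correct, but all threads queue up at this point.
        with lock:
            counts[n % buckets] += 1

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Consume the iterator so every task runs before we return.
        list(pool.map(record, values))
    return counts
```

Calling `independent_sum(range(10))` adds up the squares with no coordination at all, while `shared_histogram(range(8))` only gives correct counts because of the lock, and that lock is where the threads end up waiting on each other.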
6. It seems as though Intel deliberately over-engineers just a little bit, either the front end or the back end of the instruction engine, and then catches up and surpasses that end in the next tock or two. The widening of the back end of Haswell seems very significant to me. I'm thinking that we're going to finally see a front end wider than 4 units with the next tock? Possible?
