Conroe and Athlon 64

stevty2889

Diamond Member
Dec 13, 2003
7,036
8
81
Very interesting article. I don't think the FSB is going to be as much of a limiter for Conroe as they seem to believe, however. While NetBurst chips are very bandwidth hungry, the FSB doesn't seem to have nearly as much impact on Pentium Ms, and Conroe should be a lot more similar to a Pentium M than to a NetBurst chip.
 

Hard Ball

Senior member
Jul 3, 2005
594
0
0
Originally posted by: stevty2889
Very interesting article. I don't think the FSB is going to be as much of a limiter for Conroe as they seem to believe, however. While NetBurst chips are very bandwidth hungry, the FSB doesn't seem to have nearly as much impact on Pentium Ms, and Conroe should be a lot more similar to a Pentium M than to a NetBurst chip.


How is the PM not bandwidth hungry? In most applications it does OK, clock for clock roughly equal to a Turion 64, but for apps that require large memory bandwidth it gets killed by K8, and also lags behind NetBurst for that matter.
 

Quinton McLeod

Senior member
Jan 17, 2006
375
0
0
I agree with the article. Intel just has not provided enough information to convince me that Conroe will kill AMD.
 

dmens

Platinum Member
Mar 18, 2005
2,275
965
136
A few nitpicks:

1. The author misinterpreted 4-issue. Intel nomenclature defines "issue" as the pipeline width from the frontend to the backend, whereas others define that as scheduler to execution, which Intel refers to as "dispatch". Given that definition, the claim that the FSB will affect 4-wide issue is bunk, since that is instruction fetch, barely affected by bandwidth or even latency issues.
2. I fail to see how AMD64 should be considered "next generation" compared to Merom, especially when no justification is given. Of course AMD has technologies that Merom lacks, but the exact same can be said vice versa.
3. The author makes a comparison between merom's FSB frequency and AMD64's on-die memory controller frequency, which is kind of meaningless.
4. Large caches are not the only way to design around lower memory bandwidth. Hard/soft prefetchers do wonders on many workloads... sometimes wiping out the entire latency differential. Naturally that exposes glass jaws, but as long as the common case is fast, the processor will do ok.
5. If anything, 64-bit registers will shrink text size, since the compiler will generate less text for code which manipulates 64/128-bit values. As for larger datasets, is that really true? Since the registers and memory are still accessible at smaller granularities, programmers will not suck up more memory for the sake of it. Although I agree that programs are hogging more memory, in which case a larger cache will help just as much as anything else.
6. The article makes no mention of FB-DIMM, which conroe/woodcrest will support.
7. The author asserts that the FSB will limit the efficiency of Merom's wider issue width. True, but only if Merom's buffer structures are narrow in depth... but they're not (sorry, I cannot give numbers). Combine that with a high-speed FSB, smarter prefetcher, software assist, etc., and I believe Merom's glass jaws due to memory fetch bandwidth will not be anywhere near as severe as the article says.

I agree that CSI will be an equalizer, but the article definitely puts an overemphasis on the FSB, which only makes a performance difference at the high end. As mentioned above, there are many ways to design around the low fetch bandwidth.
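To put a rough shape on the prefetching point, here is a toy steady-state model of a streaming loop. It is only a sketch under stated assumptions: the function name, the cycle counts, and the idea that a prefetch issued some number of lines ahead hides that many lines' worth of compute time are all made up for illustration, and it ignores bandwidth limits entirely. It is not a model of Merom's or anyone else's actual prefetcher.

```python
def stream_cycles(n_lines, compute_per_line, mem_latency, prefetch_distance=0):
    """Toy estimate of cycles spent streaming through n_lines cache lines.

    Assumptions (illustrative only): every line misses the cache, a prefetch
    issued `prefetch_distance` lines ahead overlaps with that many lines of
    compute, and memory bandwidth is never the limit.
    """
    # Latency left exposed after the prefetch has had its head start.
    exposed_stall = max(0, mem_latency - prefetch_distance * compute_per_line)
    return n_lines * (compute_per_line + exposed_stall)


# Hypothetical numbers: 20 cycles of work per line, 200-cycle memory latency.
no_prefetch = stream_cycles(1000, 20, 200)                          # all latency exposed
short_prefetch = stream_cycles(1000, 20, 200, prefetch_distance=4)  # partly hidden
deep_prefetch = stream_cycles(1000, 20, 200, prefetch_distance=10)  # fully hidden
print(no_prefetch, short_prefetch, deep_prefetch)
```

In this model a deep enough prefetch removes the whole latency differential for a predictable access pattern, which is the common case being argued about; an unpredictable pattern (the glass jaw) would behave like the no-prefetch case.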
 

Furen

Golden Member
Oct 21, 2004
1,567
0
0
I agree with you on some points.

The current Intel approach (FSB + Mem controller on the NB) may hinder performance in two ways:

Latency. Prefetching does help with this and I'm actually looking forward to hearing more about Intel's "memory disambiguation technology" and its performance impact. Cache helps with this too, and having 4-8MB caches will give the prefetching technologies quite a bit of space to work with.

The memory controller's operating clock will hinder performance a bit compared to AMD because it will be limited to 333MHz commands. I doubt this will be much of a problem but since we see small latency and bandwidth differences between A64s at different clock speeds it may be enough to make a difference.

AMD's HyperTransport is the least remarkable part of AMD's single-socket systems. On multi-socket systems it helps out a lot because of the high transfer rate, low pin count and point-to-point nature, but with a single socket it doesn't make much of a difference.

One thing I don't agree with you on is the FB-DIMM thing. FB-DIMMs are higher-latency DDR2 DIMMs, so I wouldn't expect much of an improvement except on quad-channel systems, but since you get 2 DDR2 channels per CPU on the AMD side I don't expect this to be too much of a performance win for Intel.

CSI will be an equalizer on the multi-processor systems because it should reduce the number of traces and pretty much simplify the design layout for SMP motherboards. Integrating memory controllers on Intel chips would pretty much turn the CSI links into what HT is right now on multi-CPU Opteron systems: mostly a link for accessing memory through another CPU's controller. The GTL+ FSB has actually scaled quite well, so it shouldn't be much of a bottleneck at 1333MHz, especially considering that the chips will share L2.
 

dmens

Platinum Member
Mar 18, 2005
2,275
965
136
FB-DIMM can have up to six channels with 8 modules per channel. Even if the latency is higher, the overall bandwidth is greater. The number of available channels alleviates the FSB issue, especially with multicore, where HT's impact is much greater... so FB-DIMM definitely would help in that market.

I do not expect Merom to blow past AMD's offerings... I'm guessing they will be neck and neck at the end of this year, thereabouts.
 

Shenkoa

Golden Member
Jul 27, 2004
1,707
0
0
I can see where you're coming from. I am a fan of AMD but that article is very biased; it seems that it just bashes Intel and praises AMD.
 

phaxmohdem

Golden Member
Aug 18, 2004
1,839
0
0
www.avxmedia.com
I dunno, I'm pretty excited to see what Conroe can do. The Core Duo chip (if it's any indication of Conroe's desktop performance) appears to be quite the nice piece, and Intel has been using the FSB architecture for so long, I think they know how to optimize it fairly well for their purposes. I simply don't see it being a huge bottleneck for a single-socket system. In the server arena, 4-8 way, I still think that there will be some deficiencies. Plus AMD has been sitting on the K8 architecture for so long now, you've got to start thinking they've got something waiting in the wings (besides quad-core CPUs... which I believe will be dubbed K10??)

It's just a shame the CPU wars aren't as fierce as the ATI vs. Nvidia battle.
 

liebremx

Member
Apr 6, 2005
35
0
0
"The address and command busses are still single pumped, so they essentially run at 266 MHz."

Well, I don't know which MCH the author was talking about but even the humble 845PE has the address bus DOUBLE pumped.
 

mamisano

Platinum Member
Mar 12, 2000
2,045
0
76
Sure, CSI will be the equalizer... they are designing it to beat current HyperTransport specs, but by the time it comes out (and with stripped-out features) HT 3.0 will be on the scene to maintain dominance.

I like AMD's tactics: keep their cards close to the vest while Intel shows the world its hand... Only reason is because Intel is in panic mode.
 

BrownTown

Diamond Member
Dec 1, 2005
5,314
1
0
IMO the FSB really is as big of a problem as he mentions. It makes memory latency higher and allows for less bandwidth. Like it said in the article, everything the CPU does has to come from memory, so the quicker you can get stuff, and the more stuff you can get, the better. What is the point of supporting FB-DIMMs and DDR3 memory when you are still using a FSB that cannot even support the bandwidth of current memory technologies? Also, when you add more processors you just aggravate the problem, because now you have to put that processor-to-processor communication on the same bus as the memory. What's the point of having a really fast processor if you cannot feed it enough information to keep it busy?
 

dmens

Platinum Member
Mar 18, 2005
2,275
965
136
How do you know CSI is designed to meet current HT specs, and why would it be stripped down when it does come out? TheInq has good news, but their speculation is often total garbage.

How did Intel show its hand? Merom is a single generation out and less than a year from mass production. Do you know how many future cores are already in the works?

"What's the point of having a really fast processor if you cannot feed it enough information to keep it busy?"

As far as I know, Intel has not released perfmon numbers to the public, so how can people conclude the FSB is *the* reason the P4 is losing right now? I can name a half-dozen architectural reasons, and those shift and disappear based on a whole bunch of external factors. Hell, I have one reason that pretty much tops the list every time (over the FSB for sure), unless you use a P4-tuned compiler, heh.

In any case, I already addressed reasons why the FSB isn't that big of a deal with regular platforms, and design methods to nullify its impact. Perhaps people should take a step back now and look at the bigger picture...
 

BrownTown

Diamond Member
Dec 1, 2005
5,314
1
0
Intel has announced a lot of names and stuff, but they don't really tell you what they all mean. Conroe is 6 months away and we still don't know a lot about its architecture.
 

Furen

Golden Member
Oct 21, 2004
1,567
0
0
We know a lot about its architecture, just not enough to estimate performance. The FSB in itself is not to blame for the higher memory latency. Intel could conceivably make a chip that uses CSI but does not include the memory controller on-chip, and the latency would remain pretty much the same. CSI would make a difference on a multi-socket system because it would allow Intel to have an independent channel to each chip. I'm sure a 4-socket system with independent data transports to each chip and eight FB channels would be a force to be reckoned with.

Now, let's talk about FB again... having 6 FB channels is not very useful considering that the FSB will bottleneck the RAM unless you have more than 2 frontside buses (I don't even want to think about having 4 buses coming out of the northbridge). That's why I said quad-channel would be useful: any bandwidth beyond 21.3GB/sec would be wasted on dual FSB1333... kind of like throwing 2 DDR2-533 channels on a FSB533 Dothan.
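The 21.3GB/sec figure can be checked with some quick peak-bandwidth arithmetic. This is a sketch, assuming a 64-bit (8-byte) data path for both the FSB and each memory channel, a quad-pumped FSB, and DDR2-667-class bandwidth per memory channel in the quad-channel case; the function and variable names are just for illustration, and these are theoretical peaks, not measured throughput.

```python
def peak_gb_per_s(base_clock_mhz, transfers_per_clock, bus_bytes=8):
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return base_clock_mhz * 1e6 * transfers_per_clock * bus_bytes / 1e9


# Quad-pumped FSB1333: 333.33 MHz base clock, 4 transfers per clock.
fsb1333 = peak_gb_per_s(333.33, 4)
dual_fsb1333 = 2 * fsb1333

# Assumed DDR2-667-class channel: 333.33 MHz I/O clock, 2 transfers per clock.
quad_ddr2_667 = 4 * peak_gb_per_s(333.33, 2)

# The Dothan comparison: one FSB533 versus two DDR2-533 channels.
fsb533 = peak_gb_per_s(133.33, 4)
dual_ddr2_533 = 2 * peak_gb_per_s(266.67, 2)

print(f"dual FSB1333:   {dual_fsb1333:.1f} GB/s")
print(f"quad DDR2-667:  {quad_ddr2_667:.1f} GB/s")
print(f"FSB533:         {fsb533:.1f} GB/s")
print(f"dual DDR2-533:  {dual_ddr2_533:.1f} GB/s")
```

Under these assumptions four channels already match what two FSB1333 buses can carry, and two DDR2-533 channels offer about twice what a single FSB533 can move.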

 

Viditor

Diamond Member
Oct 25, 1999
3,290
0
0
Originally posted by: mamisano
Sure, CSI will be the equalizer... they are designing it to beat current HyperTransport specs, but by the time it comes out (and with stripped-out features) HT 3.0 will be on the scene to maintain dominance.

I like AMD's tactics: keep their cards close to the vest while Intel shows the world its hand... Only reason is because Intel is in panic mode.

Firstly, they actually have to get it to work... no small thing to do! My understanding of CSI (and I really don't know) is that it's a ring structure rather than a simple P2P... this is a very cool idea, but also very complex and fraught with perils.
HT will be at least 1400 this year on standard desktops, but keep an eye out for what Socket F will have to offer... I have a feeling that AMD is saving its best tech for their Socket F server release.
 

liebremx

Member
Apr 6, 2005
35
0
0
Originally posted by: Furen
...
Now, let's talk about FB again... having 6 FB channels is not very useful considering that the FSB will bottleneck the RAM unless you have more than 2 frontside buses (I don't even want to think about having 4 buses coming out of the northbridge). That's why I said quad-channel would be useful: any bandwidth beyond 21.3GB/sec would be wasted on dual FSB1333... kind of like throwing 2 DDR2-533 channels on a FSB533 Dothan.

Got DMA?
 

nyker96

Diamond Member
Apr 19, 2005
5,630
2
81
I think Conroe is using a very similar architecture compared to the AMD 64: short pipeline, more work done per clock, low MHz, low power consumption. This is a testament to how successful the A64 is. In my opinion they basically copied the AMD 64 by reverting back to classic Pentium 3 based designs (Pentium M etc.). With that design the RAM bandwidth is probably a non-factor, but considering you've got multiple cores it still might be. My guess is Conroe will perform similar to the AMD 64 @90nm, but will go a little higher @65nm due to MHz. However, the layout of the core will influence power consumption, OC ability, etc., so we'll have to see the final product to make any conclusions.
 

MrSpadge

Member
Sep 29, 2003
100
6
0
"Now, it is not technically true that AMD is a 3 issue core, as it is actually a 9 issue core. AMD has broken up the functionality of the core into three areas: integer, floating point, and SIMD. Each of these three separate units are three issue, but for now we will just address the single 3 issue integer unit."

He's my hero. Actually it's 3 ALUs ("integer"), 3 FPUs and 3 AGUs (address generation units), while SIMD is done by the (extended) FPU as well. This can easily be seen on any of the general architecture diagrams which they show when some fundamentally changed Athlon appears.
If this guy gets it so badly wrong in the beginning, I'll surely trust his judgement (not backed up by any numbers / stats / benches) in a topic as complicated as processor design.

MrS
 

BrownTown

Diamond Member
Dec 1, 2005
5,314
1
0
The number of instructions that a CPU issues is really not that important to me; everyone knows that even the current CPUs almost never issue 3 instructions per clock cycle. Widening the core just means adding even more functional units that will rarely be used. The fact of the matter is that current code just doesn't have enough parallelism in it to allow you to use all these units. If you're gonna have a really wide core then you also need SMT to keep it busy, but IMO after Intel's Hyperthreading failed it is unlikely that Intel will introduce any new SMT processors.
 

Furen

Golden Member
Oct 21, 2004
1,567
0
0
An extra 2 DDR2 channels just for DMA? Seems a bit overdone to me but I guess it could have its uses.

Yeah, you can't say that a CPU is 9-issue (or even 6-issue) just because it has 9 execution units. If the CPU cannot ISSUE (as in decode and send to the execution units) more than 3 instructions per clock then it doesn't make much of a difference to have 25 million execution units. Having more execution units helps out with instructions that need more than one clock to complete, though, since there are fewer chances of contention and, depending on the actual unit arrangement, it may also mean that there are more units that can do the most common operations. One of the main weaknesses of the original P6 was that it only had two FP units, an FADD and an FMUL (and doing an fmul also required the use of the FADD unit).
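A toy model makes the issue-width versus unit-count point concrete. This is a sketch under heavy assumptions: all instructions are identical and independent, the execution units are not pipelined, and the function name and numbers are invented for illustration rather than taken from any real core.

```python
def cycles_to_finish(n_instr, issue_width, n_units, latency):
    """Cycles until n_instr independent, identical instructions complete.

    Toy assumptions: at most issue_width instructions issue per cycle, each
    occupies one non-pipelined unit for `latency` cycles.
    """
    busy_until = [0] * n_units  # cycle at which each unit becomes free again
    remaining = n_instr
    cycle = 0
    finish = 0
    while remaining > 0:
        issued = 0
        for unit in range(n_units):
            if issued == issue_width or remaining == 0:
                break
            if busy_until[unit] <= cycle:  # unit is free this cycle
                busy_until[unit] = cycle + latency
                finish = max(finish, busy_until[unit])
                issued += 1
                remaining -= 1
        cycle += 1
    return finish


# Single-cycle ops: 3-wide issue is the limit, so 9 units buy nothing over 3.
print(cycles_to_finish(90, 3, 3, 1), cycles_to_finish(90, 3, 9, 1))
# Three-cycle ops: 3 units stall the issue stage, 9 units keep it fed.
print(cycles_to_finish(90, 3, 3, 3), cycles_to_finish(90, 3, 9, 3))
```

With single-cycle operations both configurations finish in the same time because issue width is the cap; with multi-cycle operations the extra units reduce contention, which is the case described above.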
 

dmens

Platinum Member
Mar 18, 2005
2,275
965
136
Note that comparing the number of execution units is not always the right thing to do. For example, the K8 and P4 had segmented schedulers, whereas P6/PM have universal schedulers. From what I know, P6 FUs are occupied more often than K8/P4.

One can hope Merom's wider frontend can keep the flow going.
 

Furen

Golden Member
Oct 21, 2004
1,567
0
0
The P6's FUs are more occupied, but it only has one and a half (the FMUL unit is not fully pipelined, after all). I'm not saying that the K7's FUs are more efficient, they aren't, but efficiency is only one of the many factors that affect final performance. The FUs on the K8 may spend more time idling than the P6's, but they accomplish more work in the end, too, because there are more of them. You could say that the K7's FPU arrangement is the brute-force approach to the problem, but sometimes brute force is enough to make a difference (kind of like Northwood destroyed AMD's more efficient K7 with brute clock speed).
 

dmens

Platinum Member
Mar 18, 2005
2,275
965
136
Yeah, this is a huge balancing act and people getting paid more than I do run lots of simulations to figure out how to tweak the knobs.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
26,418
15,535
136
I really think all this supposition is a waste. Wait until it comes out, and then discuss. I guess some people have nothing better to do than BS on things that are in the future....