Some Bulldozer and Bobcat articles have sprung up

cbn · Aug 25, 2010

JFAMD said:
Very minor impact in overall performance per thread, huge impact in power savings and die level savings (cost).

That is a nice way of putting things. (makes sense to me).

cbn · Aug 25, 2010

khon said:
An 8-core BD could well be smaller than a 4-core SB, since adding the four secondary cores apparently only increases size by 5%, and it doesn't have the IGP that SB does.

Good point.

Maybe the small die size also leaves more room for running the fabs "hot".

But if such a situation were ever to arise what would the fallout be? Maybe we will see some quad core chips made from the quad module designs?

cbn · Aug 25, 2010

Speaking of harvested chips,

Does anyone have ideas or speculation on how the "shared components" in each BD module would affect this? (Both positively and adversely)

Idontcare · Aug 25, 2010

Computer Bottleneck said:
Speaking of harvested chips,

Does anyone have ideas or speculation on how the "shared components" in each BD module would affect this? (Both positively and adversely)

Presumably they'd cut their losses at the module level, losing two cores in the process. Consider that there is only 12.5% die-area within the core which is truly redundant from a core vs core segmentation. If the fault in the core's logic lies in the other 87.5% of the core's die-area then the module is dead anyways.

cbn · Aug 25, 2010

Idontcare said:
Presumably they'd cut their losses at the module level, losing two cores in the process. Consider that there is only 12.5% die-area within the core which is truly redundant from a core vs core segmentation. If the fault in the core's logic lies in the other 87.5% of the core's die-area then the module is dead anyways.

Thanks, that makes the situation very clear.

Yeah, hitting exactly one core out of two across four modules would require incredible luck. I was thinking maybe not all the fetch and decode was necessary to run only one core in each module (but even if that were possible I'll bet the odds would still be overwhelmingly bad come to think of it).

ModestGamer · Aug 25, 2010

jvroig said:
Applications are oblivious to the core count. They spawn threads (not request cores or anything), and the OS caters to their needs by scheduling any such threads time with CPU, and now that CPUs have multiple cores, they are scheduled to available cores as is ideal. I can spawn 16 threads on a program I create, and those 16 threads will be handled by the OS despite having only a quad core, or even a single core CPU. It's basic multi-tasking, and OSes have that down to a pat. But if I create a program that only ever uses one single thread, then my quad core will perform just as fast as if it were only a single core CPU..

your still missing the point. 90% of the performance issues as far as execution and user interaction as well as CPU utilization are essentially based on windows being a pretty shitty OS. The CPU manufacturers are simply trying to work around this flaw.

Saw it slow say it fast. Windows drags CPU performance down.

You illustrate my point right here

But if I create a program that only ever uses one single thread, then my quad core will perform just as fast as if it were only a single core CPU.

Why does the application even care ? It should simply make commands of the Os API to execute instructions. Thread generation should only occur in the OS.

THIS IS THE FATAL FLAW WITH WINDOWS

The problem is that applications don't request/spawn/need many threads at all when the processing needs are serial in nature. In fact, in such a scenario, they can only really use one. That is not Microsoft's fault.

Click to expand...

It absolutely is. That why we are being faced with ever more exoctic sulutions to problems that should not even exist. AMD by the looks of things is essentially dealing with a windows centric issue. Application dependance. Why is the OS not doing its job. Before you say another word. go look at apple Operating systems.

It will be impossible for the OS to "multi-thread" an application that does not work on anything more than a single thread. The OS will have no way to transform a serial workload into a parallel workload, especially a program it knows nothing about. At least, not by non-magic means, and if actually done without magic, that would be a major breakthrough in parallel / multi-threaded programming. I would certainly want in on that, because as it is now, I have to go the painstaking route of optimizing my programs to use multiple threads, and it is no easy task figuring out the best way to parallelize as much as possible from what used to be, or easily are, serial programs (if at all possible - sometimes the program is simply 90% serial, and parallelizing it is impossible or impractical given the costs (code complexity, which affects costs related to development, debugging and maintenance) versus the gain).

Wrong wrong wrong wrong wrong.

these guys do it with no problem.

http://www.google.com/url?sa=t&sour...fdta0G&usg=AFQjCNGZwUNnhrO3EI1y_LRQXT7XuoI55w
http://www.haiku-os.org/

go tell them they can't do what they already have a OS doing.

I'll have a good laugh. they also have a very aggresive threading engine in the OS. actually if they put more eye candy on it and get a slightly more modern looking GUI and they manage to get a functional version on modern hardware.

I do not know where to start here, and calling Intel's HT implementation as "quasi emulated core" just makes me wonder more if you actually understand the topic (but just like calling it what it isn't), or you actually don't (hence you come up with nonsensical descriptions).

Click to expand...

But thats exactly what HT is. It is a quasi core. essentially a fancy Out of Order execution that divides up the resources of the core. It is exaclt ywhat it is. Hardware emulation of a additional core.

Sorry, perhaps I should not have brought it up. I hope you are not mad.

you should have seen my english teachers. I am long over it.

I give up on this topic. For one thing, all of what we are talking about now is actually off-topic. The real place for this is another thread (or, if #1, then that thread should be in Programming or OS subforums). So have your say if you please, then we'll let it go so as not to continue with the derailment

actually Os use of resources is a huge portion of the sucess of a CPU. bad instructions and code generate shit performance.

Cogman · Aug 25, 2010

ModestGamer said:
your still missing the point. 90% of the performance issues as far as execution and user interaction as well as CPU utilization are essentially based on windows being a pretty shitty OS. The CPU manufacturers are simply trying to work around this flaw.

Umm. You are an idiot. A large portion of our performance issues arise from the fact that x86 has become a pretty shitty standard due to the legacy hardware it HAS to support. This has NOTHING to do with the operating system of choice.

Saw it slow say it fast. Windows drags CPU performance down.

You illustrate my point right here

No, he doesn't. What he illustrates is that you clearly have no clue about what the OS is in charge of and what the application is in charge of. The OS should NEVER try to change how an application behaves, EVER. And no OS on the market does this. Taking a single threaded application and trying to make it multithreaded is bad on so many levels NO operating system for ANY system tries to do this.

Why does the application even care ? It should simply make commands of the Os API to execute instructions. Thread generation should only occur in the OS.

No, Thread generation should NOT only occur in the os (Ok, the OS should be in charge of creating and managing WHEN threads run, but not WHAT they run). Figuring out what can and can't be locked is very much a consideration that application developers need to make, not OS writers. For an OS writer to make such a decision would mean that before each application starts to run, the OS would have to comb through the application, see what is running when, see if it could be split up, and see if there would be any race conditions. That is a HUGE problem to solve that would really get you complaining about application launch speed if any OS ever even dared to attempt it.

THIS IS THE FATAL FLAW WITH WINDOWS

It absolutely is. That why we are being faced with ever more exoctic sulutions to problems that should not even exist. AMD by the looks of things is essentially dealing with a windows centric issue. Application dependance. Why is the OS not doing its job. Before you say another word. go look at apple Operating systems.

You are seriously a moron. When it comes to threading APPLE DOES THINGS JUST LIKE WINDOWS

Take a look before you start spouting stupid crap
http://developer.apple.com/mac/libr...al/KernelProgramming/scheduler/scheduler.html

They use the Pthreads interface (Unix standard thread creation stuff. VERY comparable to windows threads) to allow applications to create threads. And they have used that interface for a LONG time, even before they were using the x86 architecture.

Wrong wrong wrong wrong wrong.

these guys do it with no problem.

http://www.google.com/url?sa=t&sour...fdta0G&usg=AFQjCNGZwUNnhrO3EI1y_LRQXT7XuoI55w
http://www.haiku-os.org/

go tell them they can't do what they already have a OS doing.

I'll have a good laugh. they also have a very aggresive threading engine in the OS. actually if they put more eye candy on it and get a slightly more modern looking GUI and they manage to get a functional version on modern hardware.

Again, YOU ARE AN IDIOT
http://www.haiku-os.org/legacy-docs/bebook/TheKernelKit_ThreadsAndTeams_Overview.html
This is how you interact with threads in Haiku, and Guess what, it is VERY SIMILAR to the way you work with windows threads, and pthreads. I have no clue where you are pulling this "Haiku automatically multithreads applications" crap, It doesn't. Haiku is a heavily multithreaded operating system. That does not mean that every application that runs on it will be multithreaded. Just like every other OS on the face of the plant, they still have to specifically request that the OS create a thread and tell the thread what to do.

Seriously, have you ever even DONE application development? I have, lots of it, and on multiple platforms and systems. I know what I'm talking about when it comes to parallel programming and what systems can and can't do. You are living in some sort of lala land with an irrational hatred for windows. Grow up and do some actual application development before spouting off this malarkey about "Du Hurrr, Windows for the sucks cause It doesn't make me toast in the morning"

ModestGamer · Aug 25, 2010

Cogman said:
Umm. You are an idiot.

"

I have no idea what kind of code you write. but I work with low level stuff. You know binary, assembler. I can assure you that

1. the way the threading engine works is vastly different
2. windows sucks at it
3.the apple implementation is vastly different
4. It is not a x86 issue.

CPU's are vastly underutilitzed. Period. Especially where system responsiveness is concerned.

Vesku · Aug 25, 2010

I think Cogman was a little too emotionally invested in his rebuttal. However, he has some solid arguments. Just read up on what is being talked about openly by Intel and AMD. They are saying they are running out of ways to optimize code execution on their cores and asking compiler writers and programmers to pick up the pace in brainstorming multithreaded techniques. So while Windows threading might not be ideal, it is not the only limiter and most likely not the biggest roadblock in utilizing the multicore beasts coming down the pipe.

Cogman · Aug 25, 2010

ModestGamer said:
I have no idea what kind of code you write. but I work with low level stuff. You know binary, assembler. I can assure you that

1. the way the threading engine works is vastly different
2. windows sucks at it
3.the apple implementation is vastly different
4. It is not a x86 issue.

CPU's are vastly underutilitzed. Period. Especially where system responsiveness is concerned.

Bullshit.
Nobody, NOBODY works with binary. That just proves what an idiot you are. To even suggest such a thing is laughable in the extreme.

I've written code in everything from Assembly (primarily x86, Intel syntax, though I have dealt with the god-awful AT&T syntax) to C#.

I've already posted links to the APIs to prove my point. I've even gone so far as to install Haiku in a VM to write a program that was, Surprise, not magically threaded by the os. A simple while(true) loop disproves any claim you have to haiku being able to automagically thread applications.

That CPUs are underutilized, I agree with that, that it is the OSes fault, I heartily disagree with. An OS that is using loads of CPU resources is doing things wrong. OSes should be practically invisible. It is the Applications duty to take full advantage of whatever hardware is presented to it.

Tsavo · Aug 25, 2010

Wow, this thread went to piss and vinegar fast!

Be careful, or someone's going to get a ripped pocket protector and broken slide rules!

Tsavo · Aug 25, 2010

cogman said:
bullshit.
Nobody, nobody works with binary.

01001001001000000111011101101111011100100110101100100000011101110110100101110100011010000010000001100010011010010110111001100001011100100111100100100001

Cogman · Aug 25, 2010

Vesku said:
I think Cogman was a little too emotionally invested in his rebuttal. However, he has some solid arguments. Just read up on what is being talked about openly by Intel and AMD. They are saying they are running out of ways to optimize code execution on their cores and asking compiler writers and programmers to pick up the pace in brainstorming multithreaded techniques. So while Windows threading might not be ideal, it is not the only limiter and most likely not the biggest roadblock in utilizing the multicore beasts coming down the pipe.

Don't get me wrong, windows threading isn't perfect. It can be very expensive to create a new thread with windows (though, in any OS that I've come by this is true). As an application developer, If you are creating a thread to speed things up, you have to be SURE that the cost of creating the thread will be minor to the cost of the function that it will be running. There are ways to mitigate that overhead (threadpools), but all in all, it is still expensive (especially if you busy wait your threadpools).

What is currently going on in the software world, is threading has been pushed to be easier to implement. It is still the same old stuff happening, but languages are making it so the programmer sees less of that same old stuff. That is the way it should be.

On a side note, Intel once had a project, Mitosis I believe, that was supposed to attempt to multithread single threaded applications to achieve better hyperthreading performance (this was at the hardware level) That project died. The fact is, it is impossible to accomplish. You would essentially have to create a code decompiler (not just disassembler) and look for the expensive loops. Even then, to actually know a loop is expensive would require profiling. And even then, to find out if a loop is expensive or just halting would require you to solve the insolvable halting problem.

Cogman · Aug 25, 2010

tsavo said:
01001001001000000111011101101111011100100110101100100000011101110110100101110100011010000010000001100010011010010110111001100001011100100111100100100001

01011001011011110111010100100000011001000110111101101110001001110111010000100000011100000111001001101111011001110111001001100001011011010010000001101001011011100010000001100010011010010110111001100001011100100111100100101110

Cogman · Aug 25, 2010

Tsavo said:
Wow, this thread went to piss and vinegar fast!

Be careful, or someone's going to get a ripped pocket protector and broken slide rules!

Sorry, but nothing gets me riled up more then seeing off the wall claims and unwarranted defamation all while the person is acting like they are an expert in the field.

Computers, as you could probably tell, are my passion.

maniac5999 · Aug 25, 2010

Cogman said:
Computers, as you could probably tell, are my passion.

Well, While we're off topic from the AMD BOBCAT/BULLDOZER discussion, I might as well add in my own $0.02

Really? given your profile pic I'd imagine that a much simpler mechanical device (dating to the late 19th century) was your passion, and that you also had a penchant for trendy devices that failed to provide a concrete advantage was also an interest of yours. (admittedly I like vintage pointless trendy crap too, in addition to several of those, I have biometric chainring lying around somewhere)

I know exactly what Cogman's Profile Pic is, does anybody else?

Or we might try to go back on topic. ;-)

Cogman · Aug 25, 2010

maniac5999 said:
Well, While we're off topic from the AMD BOBCAT/BULLDOZER discussion, I might as well add in my own $0.02

Really? given your profile pic I'd imagine that a much simpler mechanical device (dating to the late 19th century) was your passion, and that you also had a penchant for trendy devices that failed to provide a concrete advantage was also an interest of yours. (admittedly I like vintage pointless trendy crap too, in addition to several of those, I have biometric chainring lying around somewhere)

I know exactly what Cogman's Profile Pic is, does anybody else?

Or we might try to go back on topic. ;-)

Lol. Truth be told, my nickname (and hence my avatar) came from programming. There was an old game called "Jedi knight: Dark Forces II" that had a scripting language which was stored "cog" files. I enjoyed editing those scripts and joined a script editing clan. When they asked me for a nickname, I went with EAH_Cogman. Cogman just sort of stuck with me since then... Back on topic

cbn · Aug 25, 2010

Deleted, for the purposes of staying on topic.

bryanW1995 · Aug 26, 2010

Scali said:
Because of the shared resources in a module (eg decoder, FPU), I'm not sure if you can speak of 'physical cores' with Bulldozer, to be honest.
I think we can say this:
A Bulldozer module is similar to one physical core on a HT processor: It contains two logical cores.
Logical cores on Bulldozer and HT processors can be considered equivalent.

But I'm not sure what a 'physical core' would be for Bulldozer. I think perhaps we should not even try to define it, as it isn't very relevant.

But yes, I think AMD will be marketing it on their logical core count.

the only similarity btwn ht and a BD module is that they each take up ~ 5-10% more die space. Obviously the proof will be in the pudding, but up to 80% extra performance on the 2nd core is much better than ht's 15%.

Cerb · Aug 26, 2010

AtenRa said:
With OoO (Out of Order) execution you dont need to execute instructions one after the other in a given order.

Yes, you do. ((A*B)+C)/D cannot be made parrallel, and must run in order. You must do the mul (A,B), then add (result,D), then div (result,D), waiting on the pipelines to go all the way through, each time. OOO helps when you have instructions that are not so dependent. Luckily, that's quite common.

I was assuming the 256, then 128 FP, are being executed as such because there was no other way.

If you're just talking about the threads each having 128-bit, or 256-bit, then what's to stop the module from running them as 128+128, then 256, if it can feed them, and there aren't any worrisome data dependencies?

In the end, when applications dependent on FP get benchmarked on BD, against Intel's comparable CPUs of tomorrow, we'll know. Until then, such details are going to be wild speculation, with too many variables.

TuxDave · Aug 26, 2010

Cerb said:
Yes, you do. ((A*B)+C)/D cannot be made parrallel, and must run in order. You must do the mul (A,B), then add (result,D), then div (result,D), waiting on the pipelines to go all the way through, each time. OOO helps when you have instructions that are not so dependent. Luckily, that's quite common.

While you're waiting for mul(A,B), why can't you schedule the next non-dependent uop, say like MUL(X,Y) and then do ADD (result, D) all back to back.

cbn · Aug 26, 2010

bryanW1995 said:
the only similarity btwn ht and a BD module is that they each take up ~ 5-10% more die space. Obviously the proof will be in the pudding, but up to 80% extra performance on the 2nd core is much better than ht's 15%.

I agree that does sound like a efficient combination. Just wondering if you or anyone else could shed some light on the following question I had:

When AMD added the second core to the Bulldozer module, did this also require extra fetch and decode logic?

Or do "fetch" and "decode" requirements stay relatively the same no matter how many additional integer cores are added?

bryanW1995 · Aug 26, 2010

Scali said:
Careful there...
"AMD is also careful to mention that the integer throughput of one of these integer cores is greater than that of the Phenom II's integer units."

Problem is, each Phenom II core has 3 integer units (or well 3+3, if you break it down to ALU/AGU).
Making the statement a bit of a 'no shit, Sherlock'-one (two units better than one? really?)

way to go mr detective, you have caught them saying that a bd integer core is at least 1/3 as powerful as a phenom II core. You're reading too much into this. They're saying that ipc is going to be better, it's foolish to think otherwise.

Scali said:
Yea I know... Barcelona will also be 40% faster than Kentsfield.

so if they don't explicitly state it then it must be something horrible, but if they DO explicitly state it then they're lying. kind of convenient, no?

Cerb · Aug 26, 2010

TuxDave said:
While you're waiting for mul(A,B), why can't you schedule the next non-dependent uop, say like MUL(X,Y) and then do ADD (result, D) all back to back.

Cerb said:
OOO helps when you have instructions that are not so dependent. Luckily, that's quite common.

You can, provided that is there, and needing execution.

TuxDave · Aug 26, 2010

Cerb said:
You can, provided that is there, and needing execution.

And most performance demanding applications are demanding because they have plenty of uops to execute.

Some Bulldozer and Bobcat articles have sprung up

Lifer

Lifer

Lifer

Elite Member

Lifer

Banned

Lifer

Banned

Diamond Member

Lifer

Platinum Member

Platinum Member

Lifer

Lifer

Lifer

Senior member

Lifer

Lifer

Lifer

Elite Member

Lifer

Lifer

Lifer

Elite Member

Lifer