Some Bulldozer and Bobcat articles have sprung up

Discussion in 'CPUs and Overclocking' started by Eeqmcsq, Aug 24, 2010.

  1. JFAMD

    JFAMD Senior member

    Joined:
    May 16, 2009
    Messages:
    565
    Likes Received:
    0
    This is where reading the small print and drawing conclusions vs. asking the question will get you in trouble.

    The comparison is NOT against the previous generation. The comparison is two threads running on one module vs. two threads running on two complete, dedicated cores.

    Very minor impact in overall performance per thread, huge impact in power savings and die level savings (cost).

    I have said it several times. IPC WILL BE HIGHER.
     
  2. JFAMD

    JFAMD Senior member

    Joined:
    May 16, 2009
    Messages:
    565
    Likes Received:
    0
    I can't speak to the different implementations. Just as we are doing 12 and 16 cores on Interlagos, they could well be doing a 4-core Zambezi. The only data point I will comment on is the max core count, which is 8. I am not a client guy.
     
  3. jvroig

    jvroig Platinum Member

    Joined:
    Nov 4, 2009
    Messages:
    2,398
    Likes Received:
    0
    You say that as if only Intel had OoO, and that it is a minuscule contribution. The fact that Atom is reviled by all should tell you otherwise, along with the fact that AMD CPUs are also OoO.

    A lot here doesn't make sense:

    1) "thanx microsoft"? Are we blaming Microsoft now for CPU woes? I suppose it is also Microsoft's fault that StarCraft II reportedly only uses two cores?

    2.) AMD was aware that most cores go underutilized so they made an architecture that is optimized for multi-threaded scenarios instead of single-threaded? Don't you just think that is a bit of "We need X, therefore let's build Y instead of X"? Wouldn't AMD have optimized for single-threaded performance if they were aware that most cores go underutilized anyway because single-threaded performance was all that still mattered?

    3.) (off topic) You could try to double-check your spelling, use proper punctuation, and state your thoughts more clearly. When you argue like you do and don't bother to fix these small things, it makes responding to you harder and makes arguments longer than necessary. I am not a native English speaker myself (English is only my second language, and I probably speak it only 5-10% of the day), but I do my best anyway. Peace :)
     
  4. jvroig

    jvroig Platinum Member

    Joined:
    Nov 4, 2009
    Messages:
    2,398
    Likes Received:
    0
    Exactly what I said. Thank you for confirming, because until you did, all I had as basis was my own deductive reasoning, which means squat as a proof.

    Thank you for confirming this as well. Anand's wording was vague enough as to be debatable, and you stating it this clearly is very much appreciated.

    Great, I wasn't arguing against that. I merely remembered the slide, and your response to it, which then became my basis for saying Zambezi will come out as a quad-core and an octo-core. Never did (and never meant to) imply that Zambezi was "only" a quad-core.
     
  5. ModestGamer

    ModestGamer Banned

    Joined:
    Jun 30, 2010
    Messages:
    1,140
    Likes Received:
    0
    1. Microsoft is the problem with multithreading performance. The OS should be handling those tasks. The application should be oblivious to the core count. Look at the BeOS and Haiku websites.

    2. They are attacking both fronts. They are implementing SMT ("HT") in hardware versus in a quasi-emulated-core sense. It should be faster, all things considered.

    3. I am a horrible speller and I can't type at 10% of the speed I think. I could read what I wrote twice and not see the missing words. If you don't like my posts, don't reply.
     
  6. extra

    extra Golden Member

    Joined:
    Dec 18, 1999
    Messages:
    1,941
    Likes Received:
    0
    Don't try to spin it some weird way. AMD has said, unequivocally, that bulldozer will have higher single thread performance than phenom II.
     
  7. khon

    khon Golden Member

    Joined:
    Jun 8, 2010
    Messages:
    1,240
    Likes Received:
    39
    So now we know what the BD design looks like, information that would have been available well before it taped out. But we know it has taped out, so when are we going to see information on what the performance is like?

    The only thing I've seen so far is the claim of a 50% increase in performance for a 33% increase in cores, which I guess would mean that an 8-core BD is roughly 50% faster than a 6-core Thuban in a fully threaded workload. That's obviously a nice improvement, but it's not going to be enough to compete against LGA2011 SB models. So is AMD once again leaving the high end to Intel, and settling for competing with LGA1155 SB mainstream models?

    Not that I'd mind getting an 8-core BD for about the same price as a 4C/8T SB, especially if they support more than 16 PCI-E lanes for graphics.
     
  8. Scali

    Scali Banned

    Joined:
    Dec 3, 2004
    Messages:
    2,495
    Likes Received:
    0
    Be careful there...
    You're saying "Because AMD has sucked for the past few years, they are going to be successful now".
    I think in general you'll find the opposite to be true. If a company has sucked for a number of years, they can't just bounce back. They worked for years on the Barcelona architecture as well, and that was when they were NOT sucking (so they had more profits, and more to invest in R&D), and we all know how much they didn't drop the ball there.
    There's no 'secret sauce' as I said many times before.

    As I said, we've had CPUs with 3 or more ALUs and AGUs for years (the Pentium III was pretty much the last dual-ALU one), and now AMD is going back to 2 ALUs and AGUs.

    Complete nonsense.
    An OS can only run threads. The software developer has to optimize the code for threading. All modern OSes support threading. Nothing any HT or similar CPU does is 'the work of the OS'.
     
  9. ModestGamer

    ModestGamer Banned

    Joined:
    Jun 30, 2010
    Messages:
    1,140
    Likes Received:
    0

    Your ignorance bleeds through your post.

    Threading of an application should be handled by the OS. The application need not know what hardware there is; that is the job of the OS. Go look at BeOS and Haiku. They show how this should be done.

    Well, I doubt AMD is going bust over Bulldozer, but my gut tells me that if they didn't think they had a winner, it would not have taped out. The first iteration might be a bit shaky, or not exactly what they hoped for, but the refresh will most likely address those issues.

    Ford sucked for years. Look at them today.
     
  10. jvroig

    jvroig Platinum Member

    Joined:
    Nov 4, 2009
    Messages:
    2,398
    Likes Received:
    0
    Applications are oblivious to the core count. They spawn threads (they don't request cores or anything), and the OS caters to their needs by scheduling those threads time on the CPU; now that CPUs have multiple cores, threads are scheduled to available cores as is ideal. I can spawn 16 threads in a program I create, and those 16 threads will be handled by the OS despite my having only a quad-core, or even a single-core, CPU. It's basic multi-tasking, and OSes have that down pat. But if I create a program that only ever uses one single thread, then my quad core will perform just as fast as if it were only a single-core CPU.
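    The "spawn 16 threads on any core count" point above can be sketched in a few lines. This is a minimal illustration of my own (Python chosen for brevity; nothing here is from the thread): the program never asks how many cores exist, and the OS schedules all 16 threads regardless.

```python
import threading

results = []
lock = threading.Lock()

def worker(n):
    # Each thread does a trivial unit of work; the OS decides which
    # core (if any of several) it actually runs on.
    with lock:
        results.append(n * n)

# Spawn 16 threads without ever querying the core count.
threads = [threading.Thread(target=worker, args=(i,)) for i in range(16)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# All 16 threads complete whether the machine has 1 core or 16.
print(len(results))  # 16
```

    On a single-core machine the OS simply time-slices the 16 threads; on a multi-core machine it spreads them out. The program's code is identical in both cases.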

    The problem is that applications don't request/spawn/need many threads at all when the processing needs are serial in nature. In fact, in such a scenario, they can only really use one. That is not Microsoft's fault.

    It will be impossible for the OS to "multi-thread" an application that does not work on anything more than a single thread. The OS will have no way to transform a serial workload into a parallel workload, especially a program it knows nothing about. At least, not by non-magic means, and if actually done without magic, that would be a major breakthrough in parallel / multi-threaded programming. I would certainly want in on that, because as it is now, I have to go the painstaking route of optimizing my programs to use multiple threads, and it is no easy task figuring out the best way to parallelize as much as possible from what used to be, or easily are, serial programs (if at all possible - sometimes the program is simply 90% serial, and parallelizing it is impossible or impractical given the costs (code complexity, which affects costs related to development, debugging and maintenance) versus the gain).
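    The "90% serial" cost/benefit point above can be made concrete with Amdahl's law. A small sketch (my own illustration; the `amdahl_speedup` helper is hypothetical, not from the thread):

```python
# Amdahl's law: with serial fraction s, the best possible speedup
# on n cores is 1 / (s + (1 - s) / n).
def amdahl_speedup(serial_fraction, cores):
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)

# A program that is 90% serial barely benefits from 8 cores...
print(round(amdahl_speedup(0.9, 8), 2))  # 1.1
# ...while a program that is only 10% serial scales far better.
print(round(amdahl_speedup(0.1, 8), 2))  # 4.71
```

    This is why parallelizing a mostly serial program can be impractical: the added code complexity buys almost no speedup when the serial fraction dominates.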

    I do not know where to start here. Calling Intel's HT implementation a "quasi emulated core" just makes me wonder more whether you actually understand the topic (but just like to call it what it isn't), or actually don't (hence you come up with nonsensical descriptions).

    Sorry, perhaps I should not have brought it up. I hope you are not mad.



    I give up on this topic. For one thing, all of what we are talking about now is actually off-topic. The real place for this is another thread (or, if #1, then that thread should be in Programming or OS subforums). So have your say if you please, then we'll let it go so as not to continue with the derailment.
     
  11. Idontcare

    Idontcare Elite Member

    Joined:
    Oct 10, 1999
    Messages:
    21,130
    Likes Received:
    6
    Look at the die size differences between Deneb and Propus and compare the power-consumption. To be sure cache is not zero power-consumption but is about the lowest power-consuming circuitry you can put on a cpu. About the only drawback is the required die-area (which translates to higher production cost).
     
  12. Scali

    Scali Banned

    Joined:
    Dec 3, 2004
    Messages:
    2,495
    Likes Received:
    0
    Yea I know... Barcelona will also be 40% faster than Kentsfield.
     
  13. Scali

    Scali Banned

    Joined:
    Dec 3, 2004
    Messages:
    2,495
    Likes Received:
    0
    Lol yea, right.

    And you call ME ignorant?
    I don't know what you do for a living, but I write multithreaded software for a living.
    I think I have a pretty good idea of where CPU, OS and applications fit into the picture.
     
  14. AtenRa

    AtenRa Lifer

    Joined:
    Feb 2, 2009
    Messages:
    12,236
    Likes Received:
    1,044
    With a 4-way decoder it means it can issue 4 instructions per thread per cycle vs. the 3-way decoder in Phenom II (3 instructions per thread per cycle). Correct me if I'm wrong.

    Phenom II has 3 execution ports inside the INT execution unit (3 ALUs + 3 AGU/Ld/St), but Bulldozer has 4 execution ports inside the INT execution unit (2 ALUs + 2 AGUs) plus one Ld/St unit. So BD can execute 4 instructions per thread per cycle and Phenom II can execute 3 instructions per thread per cycle; in other words, IPC will go up, not down.

    Feel free to correct me where I'm wrong.
     
  15. StrangerGuy

    StrangerGuy Diamond Member

    Joined:
    May 9, 2004
    Messages:
    8,315
    Likes Received:
    48
    Hey Mr. Know-It-All, ever since the K7 debut AMD has always increased their IPC until now, excluding the TLB-bugged Phenoms. Oh wait, maybe you don't know it all.
     
  16. Scali

    Scali Banned

    Joined:
    Dec 3, 2004
    Messages:
    2,495
    Likes Received:
    0
    The decoder is shared...
    So it can decode 4 instructions per 2 threads per cycle.

    In theory yes... But the question is whether a balance of 2+2 ALUs and AGUs is good.
    It can execute at most 2 ALU instructions per cycle.
    If most instructions are ALU rather than AGU, then you'd be better off with the 3 ALU ports of the Phenom II.

    That's the thing... the CPU cannot choose the instructions to execute. It will just have to execute whatever is being fed to it by the software.
     
  17. Scali

    Scali Banned

    Joined:
    Dec 3, 2004
    Messages:
    2,495
    Likes Received:
    0
    AMD has only added and improved execution units since K7... this is the first time they're going to REMOVE them... Might want to pay attention to the topic at hand before shooting your mouth off.
     
  18. Phynaz

    Phynaz Diamond Member

    Joined:
    Mar 13, 2006
    Messages:
    9,384
    Likes Received:
    261
    BD can retire 4 instructions per clock.
     
  19. zephyrprime

    zephyrprime Diamond Member

    Joined:
    Feb 18, 2001
    Messages:
    7,496
    Likes Received:
    1
    Applications are oblivious to core counts in Windows and every other major operating system as well. MS is NOT the problem with multithreading performance. Windows has supported multithreading since the first version of NT more than 17 years ago.
     
  20. heyheybooboo

    heyheybooboo Diamond Member

    Joined:
    Jun 29, 2007
    Messages:
    6,289
    Likes Received:
    0
    Ding! Ding! Ding!

    They have not dealt with NUMA so well, however, though they are slowly coming around ...





     
  21. IntelUser2000

    IntelUser2000 Elite Member

    Joined:
    Oct 14, 2003
    Messages:
    4,232
    Likes Received:
    73
    It's very naive to put the fault entirely on software. Software development costs have increased exponentially, and for a lot of developers multi-threaded development is probably not at the top of the list.

    With a finite amount of resources, time, and manpower, focusing on optimization means taking resources away from what really matters in software, which is usability.

    Servers are mostly throughput-oriented, while the PC is dominated by latency and response time. There is of course overlap between the two, but the statement is true in general. I'd like to say it's not all about the PC, but the discussion about Windows means the main focus for lots of users in this thread is the PC.

    Imagine if Blizzard spent their resources on Starcraft II for multi-threading and graphics rather than things like gameplay. Sure, it might have been technically impressive, but I doubt it would have sold as much as it did.
     
  22. Cogman

    Cogman Lifer

    Joined:
    Sep 19, 2000
    Messages:
    10,117
    Likes Received:
    19
    I want to correct this: application DEVELOPERS are oblivious to core counts. Windows and every other major OS that I know of provide a method for an application developer to find out how many cores are in a system. Application developers, however, choose not to use it.

    But I agree. MS, and just about any OS, is not to blame for crappy multithreaded applications. Honestly, their threading APIs are dead simple to use. However, application developers are scared of them. Race conditions, deadlocking, etc. (which are fixed by proper coding practices) are what scare them.
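    As one concrete example of "a method to find out how many cores are in a system": a short sketch of my own (Python's `os.cpu_count` stands in here for the OS-specific APIs; it is not something cited in the thread), showing a worker pool sized to the hardware rather than hard-coded.

```python
import os
from concurrent.futures import ThreadPoolExecutor

# Ask the OS how many logical cores exist, with a fallback of 1
# in case the count cannot be determined.
core_count = os.cpu_count() or 1
print(f"Detected {core_count} logical cores")

# Size the thread pool to the machine instead of hard-coding it.
with ThreadPoolExecutor(max_workers=core_count) as pool:
    squares = list(pool.map(lambda n: n * n, range(8)))
print(squares)  # [0, 1, 4, 9, 16, 25, 36, 49]
```

    The point is that the query is one call; whether an application bothers to make it is up to the developer, not the OS.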
     
  23. Cogman

    Cogman Lifer

    Joined:
    Sep 19, 2000
    Messages:
    10,117
    Likes Received:
    19
    Blizzard DID spend resources on multi-threading and graphics. Yes, it added to the overhead. But it also added to the quality of the application.

    Not every application needs to be threaded, nor does every application need an OpenGL interface. However, where they do, it seems developers often take the lazy way out.
     
  24. cbn

    cbn Lifer

    Joined:
    Mar 27, 2009
    Messages:
    11,129
    Likes Received:
    89
    I sure hope AMD doesn't continue with the "Thuban Trick".

    However, could it be BD is compact and efficient enough to pull it off without costing too much to build?
     
    #124 cbn, Aug 25, 2010
    Last edited: Aug 25, 2010
  25. khon

    khon Golden Member

    Joined:
    Jun 8, 2010
    Messages:
    1,240
    Likes Received:
    39
    An 8-core BD could well be smaller than a 4-core SB, since adding the four secondary cores apparently increases die size by only 5%, and it doesn't have the IGP that SB does.