Thoughts on "8 Core" Bulldozer and "4 Core Sandy Bridge"

jones377 · Jun 12, 2011

Nemesis 1 said:
better read the AVX2 PDF . There won't be an x86 haswell and this is why AMD cann't use the prefix of Vex its for a new intel microarch. That won't be carring the hefty x86 decode . Which means its not x86. It likely a VLIW cpu (EPIC) ITANIC

lol...

Nemesis 1 · Jun 12, 2011

Abwx said:
It means that Intel is pursuing its usual strategy to create new instructions
that will allow them to have a lead in computing performances.

The harsh reaction of intel when AMD proposed SSE5 is just a good
clue that should the two firms be on the same level in this matter,
intel would no more retain its lead more than a few years.

I called it right here in this thread . Intel stole AMD sse5 and FMA3 . NOT ! AMD does not have the vexprefix and never will have this is so cool .

Nemesis 1 · Jun 12, 2011

Abwx said:
All this sound as the differenciation is made by the compilers
to not enable some optimisations based solely on the "genuine intel"
checking procedure.
Either you didnt understand intel s optimisation guide, either
you are gullible enough to believe that such disoptimisations
will be implemented.

Anyway, your reasonning has more to do with a shareholder
that did buy intel s stock at its peak in 2001 , at about 70$,
and now is in search of some myths to enhance his hope
to regain his price if ever intel s crush the concurrence
and then can quietly milk the market.

Actually its YOU who doesn''t comprehend what the prefix of vex is.

But your also correct . Intel made sure AMD wouldn't follow Intel so AMD wouldn't be strandid with code thats basicly useless to them .

Compaction of SIMD prefix: Legacy SSE instructions effectively use SIMD
prefixes (66H, F2H, F3H) as an opcode extension field. VEX prefix encoding
allows the functional capability of such legacy SSE instructions (operating on
XMM registers, bits 255:128 of corresponding YMM unmodified) to be encoded
using the VEX.pp field without the presence of any SIMD prefix. The VEX-encoded
128-bit instruction will zero-out bits 255:128 of the destination register. VEXencoded
instruction may have 128 bit vector length or 256 bits length.

to be encoded
using the VEX.pp field without the presence of any SIMD prefix. Intel pp field is 0/1 AMDs is 0/0 which would cause a up . and AVX wouldn't function on the AMD processor because the OS cupid wouldn't allow it.

Nemesis 1 · Jun 12, 2011

I haven't read the chapter on emulation yet going to start that now. On AVX2

Nemesis 1 · Jun 12, 2011

If you guys thought BD was hyped this AVx2 is going to set the forums a blaze . The monkey on intels back gone in 2 short years. YES TES YES. With the information in this PDF and the other pdf . Many people will come to the same conclusion no more intel X86 decoders.

However, there was one company which took a more radical approach and while its processor wasn’t exactly blazing fast it was faster than those using the stripped back approach, what’s more it didn’t include the x86 instruction decoder. That company was Transmeta and its line of processors weren’t x86 at all, they were VLIW (Very Long Instruction Word) processors which used "code morphing" software to translate the x86 instructions into their own VLIW instruction set.

Transmeta, however, made mistakes. During execution, its code morphing software would have to keep jumping in to translate the x86 instructions into their VLIW instruction set. The translation code had to be loaded into the CPU from memory and this took up considerable processor time lowering the CPU’s potential performance. It could have solved this with additional cache or even a second core but keeping costs down was evidently more important. The important thing is Transmeta proved it could be done, the technique just needs perfecting.

Intel on the other hand can and do build multicore processors and have no hesitation in throwing on huge dollops of cache. The Itanium line, also VLIW, includes processors with a whopping 9MB of cache. Intel can solve the performance problems Transmeta had because this new processor is designed to have multiple cores and while it may not have 9MB it certainly will have several megabytes of cache.

Most interestingly though is the E2K compiler technology which allows it to run X86 software. This is exactly the sort of technology Intel need and since last year they have had access to it and employ many of it’s designers.

You can of course expect all these cores to support 64 bit processing and SSE3, you can also expect there to be lots of them. Intel’s current Dothan cores are already tiny but VLIW cores without out of order execution or the large, complex, x86 decoders leave a very small, very low power core. Intel will be able to make processors stuffed to the gills with cores like this.

Intel will now be free to do as it pleases with X86 decoding done in software Intel can change the hardware at will. If the processor is weak in a specific area the next generation can be modified without worrying about backwards compatibility. Apart from the speedup nobody will notice the difference. It could even use different types of cores on the same chip for different types of problems.

The New Architecture
To reduce power you need to reduce the number of transistors, especially ones which don’t provide a large performance boost. Switching to VLIW means they can immediately cut out the hefty X86 decoders.

Out of order hardware will go with it as they are huge, consumes masses of power and in VLIW designs are completely unnecessary. The branch predictors may also go on a diet or even get removed completely as the Elbrus compiler can handle even complex branches.
__________________

podspi · Jun 12, 2011

Yea! I can't wait to get Itanium level performance out of x86 CPUs like Intel's own Xeon!

Oh wait:

Kirk Skaugen said:
"Xeon's reliability and performance is now equal [to]—and in some cases better than—Itanium, and they're going to leapfrog [each other] in performance over time."

I'm sure these extensions will be useful and increase performance. I doubt they are going to remove the need for OoO and branch predictors. Itanium honestly isn't that successful except in heavy iron niches, and Xeon, an OoO x86 CPU, now threatens to make that niche even smaller.

I'm going to be blunt: Your ASSERTION that AMD will be unable to use the new instruction extensions IS NOT SUPPORTED by any of the documents you have linked. You can copy and paste excerpts from them all day long, and you can talk about the technology all you'd like (though I wish it wouldn't be in this thread, since it is OT and I liked the topic), but nothing you have said so far has been convincing in the slightest that AMD will be unable to use any new instruction-sets.

JFAMD · Jun 12, 2011

1. Don't harass nemesis on english. I spend half of my life in countries that don't speak english, it's a big world, everyone needs to deal with it.

2. Don't get all wrapped around instructions, what people are calling them and how they are documented. The days of instruction lockout are over. You are building a mountain out of a molehill and it is not even proven, at this point, that anyone, outside of the HPC world, is even going to use AVX.

And don't forget, the most common AVX will be AVX-128. We can do 2X those instructions because intel pads the top 128-bit of the AVX register.

Once we have BD out, and SB EP (probably late Q1 2012) we can continue this conversation.

Arachnotronic · Jun 12, 2011

JFAMD said:
1. Don't harass nemesis on english. I spend half of my life in countries that don't speak english, it's a big world, everyone needs to deal with it.

2. Don't get all wrapped around instructions, what people are calling them and how they are documented. The days of instruction lockout are over. You are building a mountain out of a molehill and it is not even proven, at this point, that anyone, outside of the HPC world, is even going to use AVX.

And don't forget, the most common AVX will be AVX-128. We can do 2X those instructions because intel pads the top 128-bit of the AVX register.

Once we have BD out, and SB EP (probably late Q1 2012) we can continue this conversation.

SNB-EP will be out in Q4 2011. It's EX that will be out in Q1 2012.

Nemesis 1 · Jun 12, 2011

JFAMD said:
1. Don't harass nemesis on english. I spend half of my life in countries that don't speak english, it's a big world, everyone needs to deal with it.

2. Don't get all wrapped around instructions, what people are calling them and how they are documented. The days of instruction lockout are over. You are building a mountain out of a molehill and it is not even proven, at this point, that anyone, outside of the HPC world, is even going to use AVX.

And don't forget, the most common AVX will be AVX-128. We can do 2X those instructions because intel pads the top 128-bit of the AVX register.

Once we have BD out, and SB EP (probably late Q1 2012) we can continue this conversation.

Thanks JFamd . Its really not AVX I am discussing . Its the prefix of Vex. Thats it would seem is causing all the performance increases. AVX is AVX , The prefix of vex is more akin to Intels mitosis . Hardware and software working to give at min. a 2x increase in performance for Apps that compile to it. Mitosis works just like prefix of vex in that it requires both software and hardware to be implamented . Their is a link in the thread here that I posted last week with the Mitosis abstract . Prefix is used with whats called computation slices. Can you state AMD has now or will ever have the prefix of Vex. Intel threw the prefix of vex on 128 registers can also get 2.2x the performance . With AVX 2 and the use of prefix of vex even integre performance will be speedup by 2.2X There is a reason intel doesn't have to give AMD the prefix of vex. Actually JF you said intel is padding the register not really true they are simply clearing it to all zeros this is done automaticly . Would you care to explain to us how AMD clears the register on say old SSe2 using a rex prefix. I know XOP won't clear the register were discussing in the YMM field I believe.

Nemesis 1 · Jun 12, 2011

podspi said:
Yea! I can't wait to get Itanium level performance out of x86 CPUs like Intel's own Xeon!

Oh wait:

I'm sure these extensions will be useful and increase performance. I doubt they are going to remove the need for OoO and branch predictors. Itanium honestly isn't that successful except in heavy iron niches, and Xeon, an OoO x86 CPU, now threatens to make that niche even smaller.

I'm going to be blunt: Your ASSERTION that AMD will be unable to use the new instruction extensions IS NOT SUPPORTED by any of the documents you have linked. You can copy and paste excerpts from them all day long, and you can talk about the technology all you'd like (though I wish it wouldn't be in this thread, since it is OT and I liked the topic), but nothing you have said so far has been convincing in the slightest that AMD will be unable to use any new instruction-sets.

The Itanium uses something called software pipelining. Than you say IS NOT SUPPORTED. The fact AMD doesn't have the VEXprefix is all the support I need Ya OoO may not go but the X86 decoders are likely history on Haswell .

http://www.gelato.org/pdf/Workshops/geneva05/swp_early_exits_intel.pdf

podspi · Jun 12, 2011

Nemesis 1 said:
The Itanium uses something called software pipelining. Than you say IS NOT SUPPORTED. The fact AMD doesn't have the VEXprefix is all the support I need Ya OoO may not go but the X86 decoders are likely history on Haswell .

http://www.gelato.org/pdf/Workshops/geneva05/swp_early_exits_intel.pdf

Interesting, but the paper talks about compiler support. Assuming the Intel compiler is still compiling x86-compatible code, AMD will be able to support these just fine :biggrin:

BTW, Intel doesn't have "the prefix of vex" either, unless I'm misunderstanding something, or somehow I slept until 2013 :awe: ... In which case, has AMD released BD yet? :sneaky:

Cerb · Jun 12, 2011

Nemesis 1 said:
I haven't read the chapter on emulation yet going to start that now. On AVX2

PDF said:
3.4 EMULATION
Setting the CR0.EMbit to 1 provides a technique to emulate Legacy SSE floating-point instruction sets in software. This technique is not supported with AVX instructions, nor FMA instructions.

If an operating system wishes to emulate AVX instructions, set
XFEATURE_ENABLED_MASK[2:1] to zero. This will cause AVX instructions to #UD. Emulation of FMA by operating system can be done similarly as with emulating AVX instructions.

Sure you got the right PDF linked? That's the most I could find about emulation. The way I read it hints at Intel wanting to have a way out, as time goes by, entirely supplanting old instructions with new ones, but software as the CPU's front-end shouldn't need anything of this sort.

Overall, if they are going to go JIT compiling x86, that's great. One of the main problems with pure on-chip approaches is that good software optimizations, and making them persistent over the program's lifetime, are null and void. Knowing what Transmeta was doing, and how they did it, it was quite impressive that they basically had the original Atom matched, a year beforehand, on 90nm. They had business failings (they should have had 100% Java HW support, and added one or more other common ISAs to emulate, like ARM7 or PPC, even before considering that low end perf/$ was not a mainstream concern, quite yet), and were ahead of their time. The Elbrus guys in particular had that right kind of genius, like the Alpha team (one of which just got fired by AMD for not being able to do the impossible :\), but the potential for lifetime-optimized JIT compiling, provided lifetime optimization is included, has been well-studied for some time, even out in the field, but even in the face of, "THIS IS BETTER THAN MAKING A NEW PROCESSOR!", not much happens. Not that nothing happens, but not as much as should happen, given the potential. It seems people lacking the guts to risk trying to go to market with something like that has held it back more than anything (Transmeta's story, FI, isn't exactly going to convince someone to invest in your idea

). Just being able to bridge that gap between what the hardware can know at this moment, and what the compiler has to make bad assumptions about, should typically get +10-30% over time (same as profiled compilation, but most programs are difficult to profile), not counting any other related optimizations, which could improve that further.

However, whether the HW has x86 decoders in HW or FW, they are still extending x86 w/ AVX+vex, and will not be ditching x86, even if the HW no longer handles it natively (the market kind of likes x86), so while I could see AMD getting father and father behind in performance(sorry, JFAMD

), and I could see Intel converging x86 and IA64 with a chip that internally runs neither, I still don't see what would keep AMD from implementing vex (so long as they only implement instructions Intel has already specified). Everything I read looks entirely compatible with x86 as we know it.

Nemesis 1 · Jun 12, 2011

podspi said:
Interesting, but the paper talks about compiler support. Assuming the Intel compiler is still compiling x86-compatible code, AMD will be able to support these just fine :biggrin:

BTW, Intel doesn't have "the prefix of vex" either, unless I'm misunderstanding something, or somehow I slept until 2013 :awe: ... In which case, has AMD released BD yet? :sneaky:

Yes intel has the Vexprefix they invenred it. . If you would have read the material . Intel plainly states that they are adding more and more prefixs as time rules on and its a time consuming thing , Until AMD BD comes out we haven't any offset compares to make but I pretty sure alot of SSE2 programs are being recompiled already with SSE2 its fairly straight forward just add the prefix VEX and its an auto recompile that simple it is on SSE2.

Nemesis 1 · Jun 12, 2011

I was referring to the 500page PDF not the 800 page PDF I linked. I found something much better to read so I haven't looked at it yet. I was going to and Dar found something much better to read. As long as their is no X86 decoders its not an x86 cpu .

Most the SSe stuff will already be recompiled by the time haswell is here . Intel will likely already have recompiled integra SSe . So there won't be alot left in X86 that won't allow Imtel to morph x86 . Intel can use SSE on non intel x86 cpus . Fact is Intel can do anything they want with X86 incliding sell it. and the buyer gets everthing intel has thats for x86 other than software . Thats my position and you cann't prove otherwise. or someone would have by now. But none have or can .

If I was intel I would be careful about what I add to X86 with AMD lerking . Haswell may well endup with differant cores . So does AMD get whatever is on a Vliw core if intel adds one with the X86 core like Amd has added GPU to llano or is this just a one way street? IB isn't that far off lets see what intel does there inso far as AVX. AVX and VEXprefix are not the same thing as you likely already know. The fact they work together is because its about vectoring. SO we may get lucky and get a glimps of haswell befor the end of 2012. We will also likely get alot of info at fall idf 2012. That would be enough for me.

Abwx · Jun 12, 2011

Nemesis 1 said:
Most the SSe stuff will already be recompiled by the time haswell is here . Intel will likely already have recompiled integra SSe . So there won't be alot left in X86 that won't allow Imtel to morph x86 . Intel can use SSE on non intel x86 cpus . Fact is Intel can do anything they want with X86 incliding sell it. and the buyer gets everthing intel has thats for x86 other than software . Thats my position and you cann't prove otherwise. or someone would have by now. But none have or can .

You have a flawed view of what is currently called a X86 Cpu...

Current Cpus are not litteraly executing X86 code as such.
All instructions are converted into micro code that is specific
for each Cpu vendor.

Basically, X86 is some kind of archaic VLIW , and current Cpus
translate it in a RISC like instruction set before execution.

What you are talking about as being future is in facty already
old history..

Cerb · Jun 12, 2011

Nemesis 1 said:
If I was intel I would be careful about what I add to X86 with AMD lerking . Haswell may well endup with differant cores . So does AMD get whatever is on a Vliw core if intel adds one with the X86 core like Amd has added GPU to llano or is this just a one way street?

Again, neither. AMD gets some access to Intel's implementations (patent cross-licensing agreements), but Intel's HW is Intel's HW. What AMD will/should get is anything added to the x86 ISA. Whatever is behind that, Intel gets to keep for themselves, and they have every right to make it fit their long-term goals, not AMD's. Just that what they shouldn't be able to do is lock out AMD from being able to implement instructions that have been declared and specified as addons to the x86 ISA.

Forcing AMD to play catchup is good business. Not nice, not entirely fair, but good business practice, and within the game's rules (antitrust laws and precedents). Not only that, but playing entirely fair would indicate that they did not consider AMD a real threat to their market dominance. Shutting AMD out, however, would be an abuse of their market position. Intel cannot do anything they want with x86. They get to be the primary controlling entity for x86, but they do have rules they must follow.

Nemesis 1 · Jun 12, 2011

Not if it isn't on a cpu with X86 decoders. If it has a X86 decoder its X86 if it doesn't their is NO agreement between AMD and Intel . Itanic has X86 hardware . Yet AMD has no claim on that tech . AVX is used on vectors AVX is the new instruction set. NOT vexprefix thats a coding scheme like AMD gpus use vliw. Imagination tech uses VLIW ect. ect ect . SHOW me ware AMD has any say with what intels does with hardware thats on an X86 cpu . You cann't now or ever . If there is no X86 decoder its not an x86 cpu . If intel uses a Epic or VLIW intel is free to use their IP anyway they choose . Same applies to AMD . AVX is new instruction set. Prefix of vex. is code for thats instruction set . Its used to optimize AVX code. IF AMD could use it they would but AMD doesn't have the software so its usesless to them . The very fact AMD has XOP speaks volumns. As far as this debate goes prove AMD can use Vexprefix . I asked JFamd 2 times now and got no reply . Infact in his last reply he downplayed AVX. If they could use the prefix Vex he wouldn't have downplayed it as he did. He knows exactly why its important to clear the YMM registers to all zeros . He nows why its important that the rex prefix is inside intels Vex prefix .

Nemesis 1 · Jun 13, 2011

Heres a little good reading . I have the PDf in the research pc , Read this and than read what the prefixVex does and how it does . Mirror mirror on the wall who has the fairest cpu of all .

Speculative parallelization can provide significant sources of
additional thread-level parallelism, especially for irregular
applications that are hard to parallelize by conventional
approaches. In this paper, we present the Mitosis compiler, which
partitions applications into speculative threads, with special
emphasis on applications for which conventional parallelizing
approaches fail.
The management of inter-thread data dependences is crucial for
the performance of the system. The Mitosis framework uses a pure
software approach to predict/compute the thread’s input values.
This software approach is based on the use of pre-computation
slices (p-slices), which are built by the Mitosis compiler and
added at the beginning of the speculative thread. P-slices must
compute thread input values accurately but they do not need to
guarantee correctness, since the underlying architecture can
detect and recover from misspeculations. This allows the compiler
to use aggressive/unsafe optimizations to significantly reduce
their overhead. The most important optimizations included in the
Mitosis compiler and presented in this paper are branch pruning,
memory and register dependence speculation, and early thread
squashing.
Performance evaluation of Mitosis compiler/architecture shows
an average speedup of 2.2.
Categories and Subject Descriptors C.1.4 [Processor
Architectures]: Parallel Architectures, D.3.4 [Programming
Languages] Processors – compilers, code generation,
optimization.
General Terms Performance, Design
Keywords Speculative multithreading; thread-level parallelism;
automatic parallelization; pre-computation slices.
1. Introduction
Several microprocessor vendors have recently introduced single
chip architectures that can execute multiple threads in parallel,
exploiting thread-level parallelism. Two different approaches
have been used to architect these systems: simultaneous
multithreading [25][7] and multiple cores [24][22][16]. These
architectures increase throughput by executing independent jobs
in parallel, or reduce execution time by parallelizing applications.
This latter case has proved to be successful for regular numerical
applications, but less so for non-numerical, irregular applications,
for which the compiler usually fails to discover a significant
amount of thread-level parallelism.1
Speculative multithreading (SpMT for short) attempts to speed up
the execution of applications through speculative thread-level
parallelism. Threads are speculative in the sense that they may be
data and control dependent on previous threads (that have not
completed) and their execution may be incorrect.
There are two main strategies for speculative thread-level
parallelism: (1) use helper threads to reduce the execution time of
high-latency instructions/events through side effects, and (2)
parallelize applications into speculative parallel threads, each of
which contributes by executing a part of the original application.
Helper Threads [6][5][18][27] attempt to reduce the execution
time of the application by using speculative threads to reduce the
cost of high-latency operations (such as load misses and branch
mispredicts). For instance, in [5][27] this is done by executing a
subset of instructions from the original code to pre-compute load
addresses or branch directions. Instructions executed by
speculative threads do not compute/modify any architectural state
of the processor, and thus, all architectural state must still be
computed by the main, conventional thread.
With speculative parallelization ([10][1][13] among others), each
of the speculative threads executes a different part of the program.
This partitioning is based on relaxing the parallelization
constraints and allowing the spawning of speculative threads even
when the compiler cannot guarantee correct execution. When a
speculative thread finishes, the speculation is verified. Unlike
helper threads, the values produced by a speculative thread are

Pay special attention to this part As it sounds exactly like what the prefix Vex does It almost fits perfectly . Its software . Thats why AMD can't use it . LOL. The value prediction. is what intel is using ,

A key point in any SpMT system is how to deal with inter-thread
data dependences. Two mechanisms have been studied so far: (1)
synchronization mechanisms and (2) value prediction. The
synchronization approach imposes a high overhead when
dependences are frequent, as in the workload presented here.
Value prediction has more potential – if the values that are
computed by one thread and consumed by another can be
predicted, the consumer thread can be executed in parallel with
the producer thread since these values are only needed for
validation at a later stage. It is typically assumed that these value
predictions are computed in hardware. The Mitosis system
presents a novel approach, which adds code (derived from the
original program) to predict in software the live-ins (values
consumed, but not produced by, the thread) for each speculative
thread. Because mechanisms for recovery of incorrect threads are
already in place, the code to produce the values need not always
be correct, and can be highly optimized. We refer to this code as
pre-computation slices (p-slices). The main advantages of p-slices
are: (1) they are potentially more accurate in the prediction of
live-ins than a hardware-based predictor, since it is derived from
the original code, (2) they can encapsulate multiple control flows
that contribute to the prediction of live-ins, and (3) they can
accelerate the detection of incorrectly spawned threads.

Now I hope you see more clearly I never had to work so hard to show the obvious befor.

Cerb · Jun 13, 2011

Nemesis 1 said:
Not if it isn't on a cpu with X86 decoders. If it has a X86 decoder its X86 if it doesn't their is NO agreement between AMD and Intel . Itanic has X86 hardware . Yet AMD has no claim on that tech . AVX is used on vectors AVX is the new instruction set. NOT vexprefix thats a coding scheme like AMD gpus use vliw. Imagination tech uses VLIW ect. ect ect . SHOW me ware AMD has any say with what intels does with hardware thats on an X86 cpu . You cann't now or ever . If there is no X86 decoder its not an x86 cpu .

Well, duh. Not once have I said AMD has any such say. In fact, I've said many times now that AMD specifically does not have that say. None of that has anything to do with what I have been saying. If Intel adds to the x86 ISA, like with vex, whether or not they use x86 decoders in future CPUs should have nothing to do with whether or not AMD is allowed to support those instructions with their own implementations in their own HW. The two issues either are not coupled, or should not be coupled. If, by some technicality, they are coupled, it won't be the first time the DoJ will have had to come in, slap Intel on the wrist, and tell it to play nice.

If intel uses a Epic or VLIW intel is free to use their IP anyway they choose . Same applies to AMD . AVX is new instruction set. Prefix of vex. is code for thats instruction set . Its used to optimize AVX code. IF AMD could use it they would but AMD doesn't have the software so its usesless to them . The very fact AMD has XOP speaks volumns. As far as this debate goes prove AMD can use Vexprefix . I asked JFamd 2 times now and got no reply . Infact in his last reply he downplayed AVX.

He's a marketing guy for AMD. If Intel has something AMD doesn't, that's pretty much his job. For the next couple years, he's not likely off, either, but I have a feeling things will begin to change before AMD can fully catch up. While I don't see anything technically preventing AMD from using vex, it does look like it will take a bit of work, more than can be done late in the game.

If they could use the prefix Vex he wouldn't have downplayed it as he did. He knows exactly why its important to clear the YMM registers to all zeros . He nows why its important that the rex prefix is inside intels Vex prefix .

That AMD has XOP doesn't speak volumes. It speaks the same amount that having 3DNow! did, before they got support for version n-2 of Intel's next extensions. If Intel's software matters, then Intel will find themselves out of favor with the market at large, and/or in antitrust hot water, again. AMD doesn't have it now, of course. They won't have it in the near future. JFAMD's job is certainly not to indirectly slight his own employer.

For AMD to properly support instructions using vex in BD, they likely would have needed the entirety of it fully disclosed 4-5 years ago, or maybe even longer. Intel disclosed enough to titillate that long ago, but not enough to use. The instructions that are known now are not very general, and should not be easy for AMD to just tack on support for.

Also, if anyone else wants nice formatting, and more info on Nemesis 1's last post:
http://citeseer.ist.psu.edu/viewdoc/download?doi=10.1.1.81.8192&rep=rep1&type=pdf

JFAMD · Jun 13, 2011

Intel17 said:
SNB-EP will be out in Q4 2011. It's EX that will be out in Q1 2012.

I will take the over on that bet.

OCGuy · Jun 13, 2011

JFAMD said:
I will take the over on that bet.

Whoever took the "over" on BD would have made a killing.

Arachnotronic · Jun 13, 2011

JFAMD said:
I will take the over on that bet.

What makes you think it will be delayed?

bronxzv · Jun 13, 2011

Nemesis 1 said:
Its really not AVX I am discussing . Its the prefix of Vex. Thats it would seem is causing all the performance increases. AVX is AVX , The prefix of vex is more akin to Intels mitosis .

For your information the VEX prefix is *already used* for today's AVX as featured in Sandy Bridge and in forthcoming AMD's Bulldozer. So basically all your lengthy talk about the "prefix of vex" (sic) hinting at some non-x86 future is pure BS. Hint: the only goal of prefixes (such as REX in the past) is to extend the x86 ISA, if it was a new ISA no prefix will be needed and the code will be yet more dense.

bronxzv · Jun 13, 2011

JFAMD said:
it is not even proven, at this point, that anyone, outside of the HPC world, is even going to use AVX.

I'll bet that most 3D renderers that are currently optimized for SSE will be released for AVX-256 down the line, the port is pretty easy with AVX support in gcc,VC++,icc,..
FYI here is an example of a *currently available* 3D engine ported to AVX-256 : http://www.inartis.com/Products/Kribi%203D%20Engine/Default.aspx

Also as you should know Intel's MKL and IPP, available today, feature AVX-256 optimized paths, any application merely linked with these libraries will use AVX-256 in some/most critical loops, even without even using a linker with the DLL version

JFAMD said:
And don't forget, the most common AVX will be AVX-128

Care to provide a source for this bold statement? IMHO it makes no much sense (besides guerilla marketing), the speedup going from SSE to AVX-128 is too small to incent many ISVs bothering with yet one more code path, better to target AVX-256 since it will be faster on mainstream hardware

Nemesis 1 · Jun 13, 2011

bronxzv said:
For your information the VEX prefix is *already used* for today's AVX as featured in Sandy Bridge and in forthcoming AMD's Bulldozer. So basically all your lengthy talk about the "prefix of vex" (sic) hinting at some non-x86 future is pure BS. Hint: the only goal of prefixes (such as REX in the past) is to extend the x86 ISA, if it was a new ISA no prefix will be needed and the code will be yet more dense.

NO where any where other than in forums can you show AMD has the Vex prefix . The Vex Prefix is Intels Mitosis , Plan and simply put . I have showed the proof . Now you guys in denial have to debunk that proof . Beware I will eat ya alive .

Thoughts on "8 Core" Bulldozer and "4 Core Sandy Bridge"

Senior member

Lifer

Lifer

Lifer

Lifer

Golden Member

Senior member

Lifer

Lifer

Lifer

Golden Member

Elite Member

Lifer

Lifer

Lifer

Elite Member

Lifer

Lifer

Elite Member

Senior member

Lifer

Lifer

Senior member

Senior member

Lifer