Thoughts on "8 Core" Bulldozer and "4 Core Sandy Bridge"

Nemesis 1 · Jun 13, 2011

The Mitosis system presents a novel approach, which adds code (derived from the original program) to predict in software the live-ins

In the prefix Vex the rex prefix is inside the code of the Vexprefix = P slice. There you have your example.

JFAMD · Jun 13, 2011

bronxzv said:
I'll bet that most 3D renderers that are currently optimized for SSE will be released for AVX-256 down the line; the port is pretty easy with AVX support in gcc, VC++, icc, ...
FYI, here is an example of a *currently available* 3D engine ported to AVX-256: http://www.inartis.com/Products/Kribi%203D%20Engine/Default.aspx

Also, as you should know, Intel's MKL and IPP, available today, feature AVX-256 optimized paths; any application merely linked with these libraries will use AVX-256 in some/most critical loops, even without relinking when using the DLL version.

Care to provide a source for this bold statement? IMHO it doesn't make much sense (besides guerrilla marketing): the speedup going from SSE to AVX-128 is too small to entice many ISVs into bothering with yet another code path. Better to target AVX-256, since it will be faster on mainstream hardware.

Most of the apps today that utilize any FP code are doing SSE (128-bit). Most of that is not fully utilized.

The world splits into lightly-FP and FP-centric. In the light-FP world (probably 90% of the apps, including things like Excel, games, etc.), 128-bit FP is fine for them. They convert their code from SSE to AVX-128 for compatibility's sake. They rarely fill up all 128 bits on a cycle, so changing code to be able to do 256-bit means no performance gain, no real benefit, but more work and more risk (product schedules, testing, etc.).

In the heavy FP world (HPC, rendering, technical apps, etc.) they will take advantage of 256-bit and they will be all over it. But they represent the minority of the market.

I have no numbers to back that up, but HPC is ~10% of the server workloads, financial and rendering are ~10-15%, so 75-80% are not heavy FP. That is good enough to constitute a majority in my mind.
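The light-FP vs. FP-centric argument comes down to how many lanes of a vector register actually carry work. Below is a minimal lane-counting sketch, assuming 32-bit floats and a deliberately simplified model in which every vector instruction costs the same regardless of width (real cores differ in ports, latency and memory bandwidth). The function name `vector_ops` and the two sample sizes are hypothetical, chosen only for illustration.

```python
import math


def vector_ops(n_elements: int, vector_bits: int, element_bits: int = 32):
    """Count vector instructions and lane utilization for n packed elements.

    Simplified model (an assumption, not a measurement): each vector
    instruction costs the same regardless of width; the tail is padded.
    """
    if n_elements <= 0:
        return 0, 0.0
    lanes = vector_bits // element_bits          # e.g. 128 // 32 = 4 floats
    ops = math.ceil(n_elements / lanes)          # instructions needed
    utilization = n_elements / (ops * lanes)     # fraction of lanes doing work
    return ops, utilization


if __name__ == "__main__":
    # Hypothetical cases: a 3-component vector (light FP) vs. a long dense loop.
    for n in (3, 1000):
        for width in (128, 256):
            ops, util = vector_ops(n, width)
            print(f"n={n:5d} width={width}: {ops:4d} ops, {util:.1%} lanes used")
```

In this model the 3-element case needs one instruction at either width, so going to 256-bit only doubles the idle lanes (75% vs. 37.5% utilization), while the dense 1000-element loop drops from 250 to 125 instructions.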

Intel17 said:
What makes you think it will be delayed?

Wanna take the bet? If not, no problem. October will prove me right or wrong.

Arachnotronic · Jun 13, 2011

Sure, I'll take the bet!

Sandy Bridge EP is going to be out in Q4 2011. Intel can't really afford to compete with Interlagos with the current Westmere Xeons.

Nemesis 1 · Jun 13, 2011

bronxzv said:
ICC with the -ssp compilation flag has been doing the slicing you are referring to for several years
http://www.ncsa.illinois.edu/UserInfo/Resources/Software/Intel/Compilers/9.0/C_ReleaseNotes.htm

VEX encoding wasn't involved and it has (obviously) a completely different purpose

I won't even go there. You haven't a clue what you're talking about. I have shown the proof you wanted, yet you persist with links that are of no value to yourself. As in your link to the Wikipedia VEX prefix article: nowhere did it tie the Vexprefix to XOP or AMD.

DISINFORMATION and confusion IS ALL you're adding. The example I gave is enough proof and is fully documented in this thread.

Nemesis 1 · Jun 13, 2011

Intel17 said:
Sure, I'll take the bet!

Sandy Bridge EP is going to be out in Q4 2011. Intel can't really afford to compete with Interlagos with the current Westmere Xeons.

Ya, I was going to take him up on that, but he still hasn't said if AMD can use the Vexprefix. Of course, if he says AMD may use it, I would let Intel know about it. Not that JFAMD is speaking for AMD. Our Intel rep must be laughing his butt off at this debate, which is relevant to the topic title.

bronxzv · Jun 13, 2011

Nemesis 1 said:
In the prefix Vex the rex prefix is inside the code of the Vexprefix = P slice

I really hope that you know that this statement makes no sense

Nemesis 1 · Jun 13, 2011

JFAMD said:
Most of the apps today that utilize any FP code are doing SSE (128-bit). Most of that is not fully utilized.

The world splits into lightly-FP and FP-centric. In the light-FP world (probably 90% of the apps, including things like Excel, games, etc.), 128-bit FP is fine for them. They convert their code from SSE to AVX-128 for compatibility's sake. They rarely fill up all 128 bits on a cycle, so changing code to be able to do 256-bit means no performance gain, no real benefit, but more work and more risk (product schedules, testing, etc.).

In the heavy FP world (HPC, rendering, technical apps, etc.) they will take advantage of 256-bit and they will be all over it. But they represent the minority of the market.

I have no numbers to back that up, but HPC is ~10% of the server workloads, financial and rendering are ~10-15%, so 75-80% are not heavy FP. That is good enough to constitute a majority in my mind.

Or better yet, give an example of the coding. If you would like, I will get one using the Vexprefix and then you can show the XOP version.

Wanna take the bet? If not, no problem. October will prove me right or wrong.

I am already one up on you on the BD release, so two up suits me fine. Let's hear your conditions.

Come on, you're talking AMD tech here, not Intel. The prefix of VEX can add operands to a 1-operand bit. That doesn't improve performance?

Define for us how XOP for AVX in SSE2 uses the REX prefix. We already know how Intel's Vexprefix does this. Give us an example of how AMD does this with the AVX instruction set using XOP code. Please talk in terms everyone here can understand.

bronxzv · Jun 13, 2011

Nemesis 1 said:
I won't even go there. You haven't a clue what you're talking about.

Or you? Or maybe we aren't speaking about the same Mitosis project; the one I'm thinking of is indeed related to precomputation slices, available with the speculative precomputation flag.

Nemesis 1 said:
I have shown the proof you wanted

Nope, unfortunately you haven't. Just post a link to a real document establishing a link between Mitosis and VEX and I'll admit that I was really clueless about it. Hint: copy-pasting existing unrelated random documents and adding your own (easy to spot) sentences doesn't count.

Nemesis 1 said:
As in your link to the Wikipedia VEX prefix article: nowhere did it tie the Vexprefix to XOP or AMD.

Look at this link: http://en.wikipedia.org/wiki/VEX_prefix, History section, 6th bullet:

"
2011. The AVX, XOP and FMA4 instruction sets, all using the VEX scheme, will be supported in the AMD Bulldozer processor, according to AMD plans[7].
"

Nemesis 1 said:
I gave is enough proof and is fully documented in this thread.

The goal of a proof is to convince people; I'd say that you have failed so far. I would welcome the opinions of others to see if I'm really that dumb not to see the light after all your "explanations".

Nemesis 1 · Jun 13, 2011

bronxzv said:
2011. The AVX, XOP and FMA4 instruction sets, all using the VEX scheme, will be supported in the AMD Bulldozer processor, according to AMD plans[7].

NO, they are all using the AVX instruction set, which works on vectors. It has nothing to do with prefix of VEX with P-slices, none whatsoever. You can use VEX code, but VEX code and the Vexprefix are not the same thing. Troll, troll, troll your boat down a dam. XOP has NO P-slices, none. AMD does not have Intel compilers and is not entitled to them. God, some of you guys think entitlement is for all things, including government handouts.

bronxzv · Jun 13, 2011

Nemesis 1 said:
NO they are all using the AVX instruction set

AVX, FMA4 and XOP are 3 distinct opcode spaces (with distinct feature flags). For example, Intel supports AVX but not XOP or FMA4, while AMD supports all of them. BTW, FMA4 was defined by Intel, so I'm sure you'll understand it's using VEX.

for more information about XOP, have a look here:
http://en.wikipedia.org/wiki/XOP_instruction_set

don't miss this part:

"
AMD has changed the encoding from the original SSE5 specification in order to improve compatibility with Intel's AVX instruction set and the new VEX coding scheme.
"
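To make the "distinct opcode spaces" point concrete: VEX-encoded instructions (AVX, FMA3, FMA4) start with the escape byte C4 or C5, while AMD's XOP reuses the VEX-style field layout behind a different escape byte, 8F, with map_select values 08h, 09h and 0Ah. Below is a minimal classifier sketch, assuming 64-bit mode and no legacy prefixes in front of the escape byte; the function name `classify_escape` and its return strings are hypothetical.

```python
def classify_escape(code: bytes) -> str:
    """Name the encoding scheme an instruction starts with (64-bit mode assumed).

    Only the escape byte and the 5-bit map_select field are inspected;
    legacy prefixes in front of the escape byte are not handled.
    """
    if not code:
        return "empty"
    if code[0] == 0xC5:
        # Two-byte VEX: the opcode map is implied (0F).
        return "VEX (2-byte, map 0F)"
    if code[0] == 0xC4 and len(code) >= 2:
        # Three-byte VEX: low 5 bits of the next byte select the map
        # (1 = 0F, 2 = 0F38, 3 = 0F3A).
        return f"VEX (3-byte, map {code[1] & 0x1F})"
    if code[0] == 0x8F and len(code) >= 2:
        map_select = code[1] & 0x1F
        # 8F is also the legacy POP r/m opcode (8F /0). XOP is recognized
        # only when map_select >= 8, which forces ModRM.reg to be non-zero.
        if map_select >= 8:
            return f"XOP (map {map_select:02X}h)"
        return "legacy 8F (not XOP)"
    return "legacy or other"
```

For example, `C5 F4 58 C2` (vaddps ymm0, ymm1, ymm2) classifies as two-byte VEX, `8F E8 ...` as XOP map 08h, and `8F C0` as the legacy POP encoding (pop rax), which is how one escape byte can serve both purposes.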

Nemesis 1 said:
has nothing to do with prefix of VEX with P-slices, none whatsoever

huh?

Nemesis 1 · Jun 13, 2011

What's going to be really interesting is when Intel starts using the Vexprefix on the CL instruction set, which they can. They can even use the Vexprefix on AVX code done by others.

Nemesis 1 · Jun 13, 2011

bronxzv said:
AVX, FMA4 and XOP are 3 distinct opcode spaces (with distinct feature flags). For example, Intel supports AVX but not XOP or FMA4, while AMD supports all of them. BTW, FMA4 was defined by Intel, so I'm sure you'll understand it's using VEX.

for more information about XOP, have a look here:
http://en.wikipedia.org/wiki/XOP_instruction_set

don't miss this part:

"
AMD has changed the encoding from the original SSE5 specification in order to improve compatibility with Intel's AVX instruction set and the new VEX coding scheme.
"

huh?

YOU just don't get it. Vex coding scheme is NOT the prefix of Vex coding scheme. This is useless to AMD, as AMD DOES NOT HAVE INTEL COMPILERS THAT ACTUALLY DO THE ENCODING. IF AMD tried to use the VEXprefix, it would cause an up. Those flags you talk about would prevent the OS from executing AVX when UP is present. In the pp coding, AMD = 0/0 and Intel's PP is 0/1; an up would be present in a prefix using 0/0, like XOP code.

JFAMD · Jun 13, 2011

Intel17 said:
Sure, I'll take the bet!

Sandy Bridge EP is going to be out in Q4 2011. Intel can't really afford to compete with Interlagos with the current Westmere Xeons.

SB EP was on the roadmap with a Q3 2011 date. So, technically, if you are calling Q4, it has already slipped, right?

bronxzv · Jun 13, 2011

JFAMD said:
The world splits into lightly-FP and FP-centric. In the light-FP world (probably 90% of the apps, including things like Excel, games, etc.) 128-bit FP is fine for them.

I'd say that scalar SSE/SSE2 code (or even x87 code for 32-bit targets) is fine for them; they will typically not bother to vectorize their code, so they are not really using 128-bit SSE/SSE2 right now.

JFAMD said:
They convert their code from SSE to AVX-128 for compatibility sake.

Why would they do that, since Sandy Bridge has good support for legacy SSE code and Bulldozer will probably have even better support? Why would they bother to maintain one more code path, since they must continue to support legacy CPUs?

JFAMD said:
In the heavy FP world (HPC, rendering, technical apps, etc.) they will take advantage of 256-bit and they will be all over it. But they represent the minority of the market.

Agreed, these few ISVs will target AVX-256, not AVX-128.

bronxzv · Jun 13, 2011

Nemesis 1 said:
Vex coding scheme is NOT the prefix of Vex coding scheme

I have been warned "2. an unconquerable opponent or rival.", you're really good at this game

Nemesis 1 · Jun 13, 2011

Can't see your message, bronxzv. You're the first person to ever get onto my ignore list. You're simply trolling.

Nemesis 1 · Jun 13, 2011

JFAMD said:
SB EP was on the roadmap with a Q3 2011 date. So, technically, if you are calling Q4, it has already slipped, right?

LINK

bronxzv · Jun 13, 2011

Nemesis 1 said:
Can't see your message bronxzv
You're the first person to ever get onto my ignore list.

Hmm, how did you know there was a message to see, then? Anyway, I suppose it's your way of acknowledging that you entirely made up that crazy VEX-Mitosis story and you're short of ideas for the next episode.

Nemesis 1 said:
You're simply trolling

Nope, you're deeply confused about it. Publishing new episodes of the Twilight Zone online doesn't give you the right to insult people who obviously know what they are talking about and who provide only factual, verifiable information instead of pseudoscientific gibberish.

Nemesis 1 · Jun 13, 2011

I pulled this from post 194, as I wanted to be sure all the info used in a debate on Mitosis would be easy for the reader to find, rather than having to look through an 800-page PDF.

Flexible and more compact bit fields are provided in the VEX prefix to retain the full functionality provided by REX prefix. REX.W, REX.X, REX.B functionalities are provided in the three-byte VEX prefix only because only a subset of SIMD instructions need them.

I hope this helps the people with reading comprehension problems. IF not, OH well!
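The manual passage quoted above can be checked against the field layout of the two VEX forms: the two-byte form (C5) carries only R, vvvv, L and pp, while the three-byte form (C4) adds X, B, W and an explicit opcode map. Below is a minimal decoder sketch, assuming 64-bit mode (in 32-bit mode C4/C5 can also be LES/LDS); the names `VexFields`, `decode_vex` and `PP_PREFIX` are hypothetical. R, X, B and vvvv are stored inverted in the encoding, and pp selects an implied legacy SIMD prefix.

```python
from dataclasses import dataclass
from typing import Optional

# Implied legacy SIMD prefix selected by the 2-bit "pp" field.
PP_PREFIX = {0b00: None, 0b01: 0x66, 0b10: 0xF3, 0b11: 0xF2}


@dataclass
class VexFields:
    size: int        # prefix length in bytes: 2 or 3
    r: int           # REX.R equivalent (un-inverted)
    x: int           # REX.X equivalent; always 0 in the 2-byte form
    b: int           # REX.B equivalent; always 0 in the 2-byte form
    w: int           # REX.W equivalent; always 0 in the 2-byte form
    map_select: int  # 1 = 0F, 2 = 0F38, 3 = 0F3A (implied 1 in 2-byte form)
    vvvv: int        # extra source register number (un-inverted)
    l: int           # vector length: 0 = 128-bit, 1 = 256-bit
    pp: int          # index into PP_PREFIX


def decode_vex(code: bytes) -> Optional[VexFields]:
    """Decode a VEX prefix at the start of `code` (64-bit mode assumed)."""
    if len(code) >= 2 and code[0] == 0xC5:
        b1 = code[1]
        return VexFields(size=2,
                         r=((b1 >> 7) & 1) ^ 1,   # stored inverted
                         x=0, b=0, w=0,           # not encodable here
                         map_select=1,            # implied 0F map
                         vvvv=((b1 >> 3) & 0xF) ^ 0xF,
                         l=(b1 >> 2) & 1,
                         pp=b1 & 0b11)
    if len(code) >= 3 and code[0] == 0xC4:
        b1, b2 = code[1], code[2]
        return VexFields(size=3,
                         r=((b1 >> 7) & 1) ^ 1,
                         x=((b1 >> 6) & 1) ^ 1,
                         b=((b1 >> 5) & 1) ^ 1,
                         map_select=b1 & 0x1F,
                         w=(b2 >> 7) & 1,         # W is not inverted
                         vvvv=((b2 >> 3) & 0xF) ^ 0xF,
                         l=(b2 >> 2) & 1,
                         pp=b2 & 0b11)
    return None
```

As a hand-assembled check (worth verifying with an assembler), vaddps ymm0, ymm1, ymm2 encodes as `C5 F4 58 C2` and decodes to vvvv = 1, L = 1, pp = 0, while vfmadd231ps ymm0, ymm1, ymm2 needs the three-byte form, `C4 E2 75 B8 C2`, with map 0F38 and pp selecting the implied 66 prefix.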

giderac · Jun 13, 2011

I anxiously await the conclusion of this debate!

Nemesis 1 · Jun 13, 2011

This debate will never die so long as people compare SB, IB, and Haswell to AMD products using AVX. Anand should look into this more deeply, speak to Intel on this matter, and then do a report on it.

Idontcare · Jun 13, 2011

Nemesis 1 said:
...a debate on Mitosis...

Do you mean Anaphase?

I thought Mitosis was abandoned as a dead-end.

Arzachel · Jun 13, 2011

There was no debate, Nemesis1 went for an argument through verbosity and put bronxzv on ignore, when he called it out. That should be pretty telling.

podspi · Jun 13, 2011

Nemesis 1 said:
This debate will never die so long as people compare SB, IB, and Haswell to AMD products using AVX. Anand should look into this more deeply, speak to Intel on this matter, and then do a report on it.

In the end performance is all that matters. If Haswell really brings a speedup of 2x compared to SB/IB, you will have a believer out of me :awe:

bronxzv · Jun 13, 2011

podspi said:
In the end performance is all that matters. If Haswell really brings a speedup of 2x compared to SB/IB

A 2x speedup compared with Sandy Bridge for computationally dense FP kernels looks indeed possible.

From the brand new Haswell New Instruction blog:

http://software.intel.com/en-us/blogs/2011/06/13/haswell-new-instruction-descriptions-now-available/

"Our floating-point multiply accumulate significantly increases peak flops"

It looks like a hint at a full-fledged two-FMA-per-clock implementation (with one FMA per clock, peak flops would stay the same as in Sandy Bridge, which can already execute one VEX-256 vmulpd/ps plus one VEX-256 vaddpd/ps per clock); in other words, 2x the flops per core in Haswell with FMA3 vs. vanilla AVX. According to the AVX IDF slides (*1), RAM bandwidth will be increased too; more than 2x seems easily reachable with stacked RAM. Moreover, the new gather instructions and, to a lesser extent, true generic permutes will help some codes reach higher effective flops.

*1: BJ11_ARCS005_100_ENG.pdf from IDF Spring 2011, previously available at http://www.intel.com/idf/
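The 2x estimate above is simple per-cycle arithmetic. Below is a minimal sketch of it, assuming each execution unit issues one full-width vector operation per cycle, counting a multiply or add as 1 flop per lane and a fused multiply-add as 2. The two-FMA-units Haswell configuration is the speculation from the post, treated here as a hypothetical input; the model ignores clock speed and memory bandwidth, and `peak_flops_per_cycle` is a hypothetical name.

```python
def peak_flops_per_cycle(vector_bits: int, element_bits: int,
                         mul_units: int = 0, add_units: int = 0,
                         fma_units: int = 0) -> int:
    """Peak floating-point operations per cycle per core (simple model).

    Assumes every unit issues one full-width vector op per cycle:
    a multiply or an add counts 1 flop per lane, a fused multiply-add 2.
    """
    lanes = vector_bits // element_bits
    return lanes * (mul_units + add_units + 2 * fma_units)


if __name__ == "__main__":
    # Single precision, 256-bit vectors (8 lanes).
    sandy_bridge = peak_flops_per_cycle(256, 32, mul_units=1, add_units=1)
    one_fma = peak_flops_per_cycle(256, 32, fma_units=1)
    two_fma = peak_flops_per_cycle(256, 32, fma_units=2)  # hypothetical Haswell
    print(sandy_bridge, one_fma, two_fma)  # 16 16 32
```

With one FMA unit the peak equals Sandy Bridge's separate mul + add (16 single-precision flops per cycle); only the two-FMA configuration doubles it, to 32 single-precision or 16 double-precision flops per cycle.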
