Thoughts on "8 Core" Bulldozer and "4 Core Sandy Bridge"

Gigantopithecus · Jun 15, 2011

Don't feed the troll, John.

jones377 · Jun 15, 2011

JFAMD said:
No, I am not saying that.

What I am saying is that someone on this thread is getting completely wrapped around the axle on one aspect and given the two choices (argue about it or wait until benchmarks are out) I would choose plan B.

Well, that's what this forum is here for

BTW, can you also confirm that you can not only combine 256-bit AVX with Integer SSE ops but also AVX with 128-bit AVX integer ops in the same cycle?

Idontcare · Jun 15, 2011

JFAMD said:
What I am saying is that someone on this thread is getting completely wrapped around the axle on one aspect and given the two choices (argue about it or wait until benchmarks are out) I would choose plan B.

+1

Nemesis 1 · Jun 15, 2011

Gee golly gee. After 2 years of BD hype, someone here had his Cheerios this morning.

I thought you guys might find this interesting. Thought video, then I said no way, this thread.

http://www.pcper.com/news/Editorial/AMD-Fusion-Developer-Summit-2011-Live-Blog

Seeing as I was discussing Haswell maybe leaving the x86 decoders out, I find this interesting. It's another "me too".

That was a pretty interesting demo, though we have seen physics demos like that on GPUs for years. The key here is for developers: all on a single executable that runs on the CPU, GPU, multi-GPUs with almost no modifications. XS has a thread going on this. Pretty neat stuff.

Nemesis 1 · Jun 15, 2011

Seeing as you're frisky today, John, does AMD XOP do anything like this? Why is it you AMD guys like slides so much? The Intel 800-page PDF says it all, and they describe in that PDF how AMD has to zero out YMM. Why don't you explain it to us?

This is the VEX prefix.

1.3.3 VEX Prefix Instruction Encoding Support
Intel AVX introduces a new prefix, referred to as VEX, in the Intel 64 and IA-32
instruction encoding format. Instruction encoding using the VEX prefix provides the
following capabilities:
• Direct encoding of a register operand within VEX. This provides instruction syntax
support for non-destructive source operand.
• Efficient encoding of instruction syntax operating on 128-bit and 256-bit register
sets.
• Compaction of REX prefix functionality: The equivalent functionality of the REX
prefix is encoded within VEX.
• Compaction of SIMD prefix functionality and escape byte encoding: The functionality
of SIMD prefix (66H, F2H, F3H) on opcode is equivalent to an opcode
extension field to introduce new processing primitives. This functionality is
replaced by a more compact representation of opcode extension within the VEX
prefix. Similarly, the functionality of the escape opcode byte (0FH) and two-byte
escape (0F38H, 0F3AH) are also compacted within the VEX prefix encoding.
• Most VEX-encoded SIMD numeric and data processing instruction semantics with
memory operand have relaxed memory alignment requirements than instructions
encoded using SIMD prefixes (see Section 2.5).
VEX prefix encoding applies to SIMD instructions operating on YMM registers, XMM
registers, and in some cases with a general-purpose register as one of the operand.
VEX prefix is not supported for instructions operating on MMX or x87 registers.
Details of VEX prefix and instruction encoding are discussed in Chapter 4.
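
For anyone following along, the fields listed in that section can be pulled out of the raw prefix bytes with a few shifts and masks. Below is a minimal Python sketch, assuming 64-bit mode (where C4H and C5H always introduce a VEX prefix). The names (`decode_vex`, the dictionary keys) are illustrative and do not come from either vendor's manual. It only shows the bit layout and is not a full instruction decoder.

```python
# Sketch: split a VEX prefix (2-byte C5H form or 3-byte C4H form) into fields.
# Assumption: 64-bit mode, so C4H/C5H are never the legacy LES/LDS opcodes.
# Function name and dictionary keys are illustrative only.

# pp field: the SIMD prefix that VEX folds into two bits.
SIMD_PREFIX = {0b00: None, 0b01: 0x66, 0b10: 0xF3, 0b11: 0xF2}
# mmmmm field: the escape bytes that VEX folds into five bits.
OPCODE_MAP = {0b00001: "0F", 0b00010: "0F38", 0b00011: "0F3A"}


def decode_vex(code: bytes) -> dict:
    """Return the VEX fields found at the start of `code`."""
    if len(code) >= 2 and code[0] == 0xC5:
        # 2-byte form: [C5] [R vvvv L pp]; X, B, W are 0 and the 0F map is implied.
        last = code[1]
        r = ((last >> 7) & 1) ^ 1          # R is stored inverted
        x = b = w = 0
        mmmmm = 0b00001
        prefix_len = 2
    elif len(code) >= 3 and code[0] == 0xC4:
        # 3-byte form: [C4] [R X B mmmmm] [W vvvv L pp]
        first, last = code[1], code[2]
        r = ((first >> 7) & 1) ^ 1         # R, X, B are stored inverted
        x = ((first >> 6) & 1) ^ 1
        b = ((first >> 5) & 1) ^ 1
        mmmmm = first & 0x1F
        w = (last >> 7) & 1
        prefix_len = 3
    else:
        raise ValueError("not a complete VEX prefix")

    return {
        "prefix_len": prefix_len,
        "R": r, "X": x, "B": b, "W": w,               # the compacted REX bits
        "opcode_map": OPCODE_MAP.get(mmmmm, "reserved"),
        "vvvv": (~(last >> 3)) & 0xF,                 # extra source register, stored inverted
        "vector_bits": 256 if (last >> 2) & 1 else 128,  # L bit (ignored by scalar ops)
        "simd_prefix": SIMD_PREFIX[last & 0b11],
    }
```

For example, `decode_vex(bytes.fromhex("c5f458c2"))`, the bytes for `vaddps ymm0, ymm1, ymm2`, reports a 2-byte prefix, `vvvv` = 1 (ymm1 as the non-destructive source), a 256-bit vector length, and no implied SIMD prefix.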

Care to tell us, John, how AMD does this?

Nemesis 1 · Jun 15, 2011

Originally Posted by JFAMD
They will do that because Sandy Bridge has an issue with handling mixed SSE and AVX instructions. They need to clear out their pipeline between switching instructions, and this takes clock cycles. They recommended at IDF that companies convert all SSE instructions to AVX-128 to avoid performance penalties.

http://news.softpedia.com/news/Intel...n-187568.shtml

Well, if you say so, but this is more likely the case.

Identifying Live-ins
Identifying the live-ins of a speculative thread requires a topdown
traversal of its control-flow graph starting at the CQIP to
identify register and memory values read before being written by
the speculative thread. Each path is explored until a certain length.
This length represents the time that previous threads take to
compute and commit these values. This is because once the
previous thread commits, the speculative thread need no longer
rely on predicted values, but can read committed values. This time
is estimated as the time it takes to sequentially execute all the
code between the SP and CQIP minus the thread spawn overhead

As you can clearly see, JF implies Intel has to do this because they haven't figured out AMD's AVX instructions, so they don't have the same functionality as AMD's real-deal AVX. This is dishonest, in a manner. Intel invented AVX for Intel CPUs; the very fact that JF says Intel has to do something different than AMD should tell you something. JF is referring to clearing the YMM to all zeros, then he implies this takes more clock cycles. If you read the AVX PDF you will see this is not a fact. It's a necessary action for Mitosis to work.

If you read the AVX PDF, the length is a big deal. One of these 2 processors can do greater than or less than; the other processor can't, and that will cause an up. You tell me which is which, SB or BD.

Nemesis 1 · Jun 15, 2011

From that Fusion link I found this most interesting.

That was a pretty interesting demo though we have seen physics demos like that on GPUs for years. The key here is for developers: all on a single executable that runs on the CPU, GPU, multi-GPUs with almost no modifications

Outrage · Jun 15, 2011

Nemesis 1 said:
Originally Posted by JFAMD
They will do that because Sandy Bridge has an issue with handling mixed SSE and AVX instructions. They need to clear out their pipeline between switching instructions, and this takes clock cycles. They recommended at IDF that companies convert all SSE instructions to AVX-128 to avoid performance penalties.

http://news.softpedia.com/news/Intel...n-187568.shtml

Well, if you say so, but this is more likely the case.

Identifying Live-ins
Identifying the live-ins of a speculative thread requires a topdown
traversal of its control-flow graph starting at the CQIP to
identify register and memory values read before being written by
the speculative thread. Each path is explored until a certain length.
This length represents the time that previous threads take to
snip........

You have already posted this one time, post: 349 and you got the answer in post: 364

Nemesis 1 · Jun 15, 2011

I don't see the answer. Let me check PDF101.

JFAMD · Jun 15, 2011

jones377 said:
Well, that's what this forum is here for. BTW, can you also confirm that you can not only combine 256-bit AVX with Integer SSE ops but also AVX with 128-bit AVX integer ops in the same cycle?

Flex FP can do SSE and AVX128 on the same cycle. Not SSE and AVX256 on the same cycle.

Blue Shift · Jun 15, 2011

Nemesis 1 said:
Seeing as you're frisky today, John, does AMD XOP do anything like this? Why is it you AMD guys like slides so much? The Intel 800-page PDF says it all, and they describe in that PDF how AMD has to zero out YMM. Why don't you explain it to us?

-snip-

Care to tell us, John, how AMD does this?

Just to go out on a limb here, I really don't think John (or anyone else) wants to explain it to you [again].

This "debate" has become pointless and repetitive, and it seems that everyone here would rather just wait for it to come out and see how it performs than argue over whether or not certain instructions use a specific prefix.

Outrage · Jun 15, 2011

Nemesis 1 said:
I don't see the answer. Let me check PDF101.

You don't have to get your wife to fetch you one of those CDs with your offline version of this thread, just press the link I provided for you.

Nemesis 1 · Jun 15, 2011

JFAMD said:
Flex FP can do SSE and AVX128 on the same cycle. Not SSE and AVX256 on the same cycle.

John, there are a lot of Ups with Intel's AVX on Intel processors. What happens on a VEX prefix for code ending in PP or 0/0? If you're willing to answer questions, then tell me: does AMD have the VEX prefix as described in the Intel PDF? I have asked you this 3 times now.

Nemesis 1 · Jun 15, 2011

404 page not found

Nemesis 1 · Jun 15, 2011

Hey John, I hope you didn't wager on this.

http://news.softpedia.com/news/Inte...ge-EP-CPUs-Will-Arrive-in-Autumn-187568.shtml

Because you still haven't shown a link of Intel saying 3rd quarter.

jones377 · Jun 15, 2011

Nemesis 1 said:
404 page not found

That's because you need to look in the Sandy Bridge optimization manual, not the AVX document.

Nemesis 1 · Jun 15, 2011

Outrage said:
Originally Posted by JFAMD
They will do that because Sandy Bridge has an issue with handling mixed SSE and AVX instructions. They need to clear out their pipeline between switching instructions, and this takes clock cycles. They recommended at IDF that companies convert all SSE instructions to AVX-128 to avoid performance penalties.

Interesting, but I had this already posted here. But I actually used Intel's words and not mine.

256-bit VEX-encoded instruction and legacy 128-bit SIMD instructions has internal
state to manage the upper and lower halves of the YMM states. Functionally, VEX-encoded
SIMD instructions can be intermixed with legacy SSE instructions (non-VEX-encoded
SIMD instructions operating on XMM registers). However, there is a performance
impact with intermixing VEX-encoded SIMD instructions (AVX, FMA) and
Legacy SSE instructions that only operate on the XMM register state.
The general programming considerations to realize optimal performance are the
following:
• Minimize transition delays and partial register stalls with YMM registers accesses:
Intermixed 256-bit, 128-bit or scalar SIMD instructions that are encoded with
VEX prefixes have no transition delay due to internal state management.
Sequences of legacy SSE instructions (including SSE2, and subsequent
generations non-VEX-encoded SIMD extensions) that are not intermixed with
VEX-encoded SIMD instructions are not subject to transition delays.
• When an application must employ AVX and/or FMA, along with legacy SSE code,
it should minimize the number of transitions between VEX-encoded instructions
and legacy, non-VEX-encoded SSE code. Section 2.8.1 provides recommendation
for software to minimize the impact of transitions between VEX-encoded code
and legacy SSE code.
In addition to performance considerations, programmers should also be cognizant of
the implications of VEX-encoded AVX instructions with the expectations of system
software components that manage the processor state components enabled by
XCR0. For additional information see Section 4.1.9.1, “Vector Length Transition and
Programming Considerations”.
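
The "minimize the number of transitions" advice in that passage is easy to picture with a toy model. The Python sketch below is only an illustration under stated assumptions: each SIMD instruction is tagged by hand as `"vex"` or `"legacy"`, the function name `count_transitions` is hypothetical, and no real cycle cost is modeled, since the passage above does not give one.

```python
# Toy model: count switches between VEX-encoded and legacy (non-VEX) SSE
# instructions in a stream. Tags and function name are illustrative only;
# this does not model the actual hardware penalty.

def count_transitions(stream):
    """Return how many times the stream flips between 'vex' and 'legacy'."""
    transitions = 0
    previous = None
    for kind in stream:
        if kind not in ("vex", "legacy"):
            raise ValueError(f"unknown instruction kind: {kind!r}")
        if previous is not None and kind != previous:
            transitions += 1
        previous = kind
    return transitions


# Same four instructions, two orderings.
interleaved = ["vex", "legacy", "vex", "legacy"]
grouped = ["vex", "vex", "legacy", "legacy"]

print(count_transitions(interleaved))  # 3
print(count_transitions(grouped))      # 1
```

Grouping the VEX-encoded work together and the legacy SSE work together cuts the count from 3 to 1 in this example, which is the kind of restructuring the quoted guidance asks for. A stream that is all `"vex"` or all `"legacy"` scores 0, matching the two "no transition delay" cases in the bullet list.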

I myself would like to see how AMD uses 256-bit with SSE2 instructions that have a REX prefix. John, could you give an example of code written for this operation: 256-bit on the YMM state and then the 128-bit XMM lower state using SSE2 with a REX prefix? Because it seems to me that creates an up according to the PDF in some cases. I am more interested in the code length, as XOP would have to include the bytes of the REX prefix. Intel has space reserved that is greater than 128 bits in the lower XMM state. If code goes over 128 bits it's a UP, same as the 256-bit YMM upper state; Intel has that space reserved greater than 256 bits. =L, then we have the memory SS and the pp. So I sure would like to see an example of coding. Unless in the XMM state you're using legacy SSE2 instructions, in which case AMD would need Intel to stall to keep up.

Nemesis 1 · Jun 15, 2011

Blue Shift said:
Just to go out on a limb here, I really don't think John (or anyone else) wants to explain it to you [again].

This "debate" has become pointless and repetitive, and it seems that everyone here would rather just wait for it to come out and see how it performs than argue over whether or not certain instructions use a specific prefix.

Explain what? He hasn't answered the only real question: does AMD use the VEX prefix as defined by Intel? So you don't see the importance of the VEX prefix? I'm referring to the code length only.

We just went through 2 years of BD hype, and now you're willing to wait. Strange, that. NOT!

JFAMD · Jun 20, 2011

AMD supports VEX prefix:

http://support.amd.com/us/Processor_TechDocs/26568.pdf

Now, everybody give it a rest. Enough with the FUD.

Nemesis 1 · Jun 20, 2011

Well, I am not going to read an 800-page AMD PDF. You copy and paste where AMD in its XOP prefix has computational bits in its prefix, and I will believe it. But it has to say XOP prefix and not VEX prefix. Speaking of FUD, where is BD?

Here's what I want to see, John, from your PDF. But I don't want to see the VEX prefix used. I want to see the XOP prefix used or it didn't happen.

1.3.3 VEX Prefix Instruction Encoding Support
Intel AVX introduces a new prefix, referred to as VEX, in the Intel 64 and IA-32
instruction encoding format. Instruction encoding using the VEX prefix provides the
following capabilities:
• Direct encoding of a register operand within VEX. This provides instruction syntax
support for non-destructive source operand.
• Efficient encoding of instruction syntax operating on 128-bit and 256-bit register
sets.
• Compaction of REX prefix functionality: The equivalent functionality of the REX
prefix is encoded within VEX.
• Compaction of SIMD prefix functionality and escape byte encoding: The functionality
of SIMD prefix (66H, F2H, F3H) on opcode is equivalent to an opcode
extension field to introduce new processing primitives. This functionality is
replaced by a more compact representation of opcode extension within the VEX
prefix. Similarly, the functionality of the escape opcode byte (0FH) and two-byte
escape (0F38H, 0F3AH) are also compacted within the VEX prefix encoding.
• Most VEX-encoded SIMD numeric and data processing instruction semantics with
memory operand have relaxed memory alignment requirements than instructions
encoded using SIMD prefixes (see Section 2.5).
VEX prefix encoding applies to SIMD instructions operating on YMM registers, XMM
registers, and in some cases with a general-purpose register as one of the operand.
VEX prefix is not supported for instructions operating on MMX or x87 registers.
Details of VEX prefix and instruction encoding are discussed in Chapter 4.

Care to tell us, John, how AMD does this?

garagisti · Jun 20, 2011

Hopefully, as this is simple English, you will understand... if you speak any other language, maybe we could Google Translate this for you....

Check this post...

http://www.xtremesystems.org/forums/...=1#post4883691

This guy owns an SR2 rig and uses it for a desktop. He says BD would beat a 990X. Given he owns an SR2, and he has access to BD, I'd believe him. So let us all wait for benchies... as JF says.

Cheers!

Topweasel · Jun 20, 2011

Last contact with Nemesis, as your copy and paste trolling is getting pretty pointless. Lest you forget, this thread isn't about VEX or Haswell, but about 8C BD versus 4C SB.

Why won't you read an 800-page AMD PDF, when you have insisted everyone read a similar-sized Intel document?

Sorry JF, you should have just let this one die.

waffleironhead · Jun 20, 2011

garagisti said:
Hopefully as this is simple English, you will understand... if you speak any other language, may be we could google translate this for you....

Check this post...

http://www.xtremesystems.org/forums/...=1#post4883691

This guy owns an SR2 rig and uses it for a desktop. He says BD would beat a 990X. Given he owns an SR2, and he has access to BD, I'd believe him. So let us all wait for benchies... as JF says.

Cheers!

Hmm, link's not working.

Nemesis 1 · Jun 20, 2011

I copied and pasted everything you needed to know. I ask John for just 1 copy and paste. This is about BD and SB. Both claim AVX support, which is true. But now John has stepped up a level and claims VEX prefix support. All he has to do is copy and paste that 1 little bit about the XOP prefix, as that's the whole ballgame. He won't post it because it's likely not there. Instead I will get this VEX crap, without the XOP prefix wording as stated above in the VEX prefix section. AMD does not have such a prefix or the compilers to go along with it.

jones377 · Jun 20, 2011

Well I was gonna post the number of times the words VEX and prefix appear in AMD's document but I don't think my old computer can handle such a big number.
