Thoughts on "8 Core" Bulldozer and "4 Core Sandy Bridge"

Page 16 - AnandTech community

jones377

Senior member
May 2, 2004
No, I am not saying that.

What I am saying is that someone on this thread is getting completely wrapped around the axle on one aspect and given the two choices (argue about it or wait until benchmarks are out) I would choose plan B.

Well, that's what this forum is here for :) BTW, can you also confirm that you can not only combine 256-bit AVX with Integer SSE ops but also AVX with 128-bit AVX integer ops in the same cycle?
 

Idontcare

Elite Member
Oct 10, 1999
What I am saying is that someone on this thread is getting completely wrapped around the axle on one aspect and given the two choices (argue about it or wait until benchmarks are out) I would choose plan B.

+1
 

Nemesis 1

Lifer
Dec 30, 2006
Gee golly gee. After 2 years of BD hype, someone here had his Cheerios this morning.

I thought you guys might find this interesting. I thought about posting the video, then I said no way, not in this thread.

http://www.pcper.com/news/Editorial/AMD-Fusion-Developer-Summit-2011-Live-Blog

Being as I was discussing Haswell maybe leaving the x86 decoders out, I find this interesting; it's another me-too.

That was a pretty interesting demo, though we have seen physics demos like that on GPUs for years. The key here is for developers: all in a single executable that runs on the CPU, GPU, or multiple GPUs with almost no modifications. XS has a thread going on this. Pretty neat stuff.
 

Nemesis 1

Lifer
Dec 30, 2006
Being as you're frisky today, John: does AMD's XOP do anything like this? Why is it you AMD guys like slides so much? The 800-page Intel PDF says it all, and they describe in that PDF how AMD has to zero out YMM. Why don't you explain it to us?

This is the VEX prefix:

1.3.3 VEX Prefix Instruction Encoding Support
Intel AVX introduces a new prefix, referred to as VEX, in the Intel 64 and IA-32 instruction encoding format. Instruction encoding using the VEX prefix provides the following capabilities:
• Direct encoding of a register operand within VEX. This provides instruction syntax support for non-destructive source operand.
• Efficient encoding of instruction syntax operating on 128-bit and 256-bit register sets.
• Compaction of REX prefix functionality: The equivalent functionality of the REX prefix is encoded within VEX.
• Compaction of SIMD prefix functionality and escape byte encoding: The functionality of SIMD prefix (66H, F2H, F3H) on opcode is equivalent to an opcode extension field to introduce new processing primitives. This functionality is replaced by a more compact representation of opcode extension within the VEX prefix. Similarly, the functionality of the escape opcode byte (0FH) and two-byte escape (0F38H, 0F3AH) are also compacted within the VEX prefix encoding.
• Most VEX-encoded SIMD numeric and data processing instruction semantics with memory operand have relaxed memory alignment requirements than instructions encoded using SIMD prefixes (see Section 2.5).
VEX prefix encoding applies to SIMD instructions operating on YMM registers, XMM registers, and in some cases with a general-purpose register as one of the operand. VEX prefix is not supported for instructions operating on MMX or x87 registers. Details of VEX prefix and instruction encoding are discussed in Chapter 4.
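For anyone following along, the fields that excerpt lists (REX bits folded in, the extra vvvv source, L for 128 vs 256 bits, pp replacing 66H/F3H/F2H, mmmmm replacing the 0F escapes) can be shown with a small Python sketch. It is written from Intel's published field layout, only handles register-register forms, and the helper names `vex_prefix` and `vaddps` are hypothetical, not from any assembler library.

```python
# Minimal sketch of VEX prefix encoding for register-register instructions,
# written from the field layout in Intel's AVX documentation.
# Helper names are hypothetical, not from any assembler library.

MAP_0F = 0b00001   # mmmmm value that stands in for the 0F escape byte
PP_NONE = 0b00     # pp value: no implied 66H/F3H/F2H prefix


def vex_prefix(reg: int, rm: int, vvvv: int, *, l256: bool = False,
               pp: int = PP_NONE, mmmmm: int = MAP_0F, w: int = 0) -> bytes:
    """Return the 2-byte (C5) or 3-byte (C4) VEX prefix.

    reg, rm, vvvv are register numbers 0-15 (XMM/YMM). Register forms only,
    so there is no SIB index and the X bit is always 0.
    """
    r = (reg >> 3) & 1            # REX.R equivalent: high bit of ModRM.reg
    b = (rm >> 3) & 1             # REX.B equivalent: high bit of ModRM.rm
    x = 0                         # REX.X equivalent: unused without SIB
    l = 1 if l256 else 0          # VEX.L: 0 = 128-bit (XMM), 1 = 256-bit (YMM)
    v = (~vvvv) & 0xF             # extra (non-destructive) source, stored inverted
    if b == 0 and x == 0 and w == 0 and mmmmm == MAP_0F:
        # 2-byte form: C5, then [R' vvvv L pp] with R stored inverted.
        return bytes([0xC5, ((r ^ 1) << 7) | (v << 3) | (l << 2) | pp])
    # 3-byte form: C4, then [R' X' B' mmmmm], then [W vvvv L pp].
    byte1 = ((r ^ 1) << 7) | ((x ^ 1) << 6) | ((b ^ 1) << 5) | mmmmm
    byte2 = (w << 7) | (v << 3) | (l << 2) | pp
    return bytes([0xC4, byte1, byte2])


def vaddps(dst: int, src1: int, src2: int, *, l256: bool = False) -> bytes:
    """Encode VADDPS dst, src1, src2 (opcode 58h in the 0F map, register form)."""
    modrm = 0xC0 | ((dst & 7) << 3) | (src2 & 7)   # mod=11, reg=dst, rm=src2
    return vex_prefix(dst, src2, src1, l256=l256) + bytes([0x58, modrm])


if __name__ == "__main__":
    print(vaddps(0, 0, 1).hex(" "))              # vaddps xmm0, xmm0, xmm1
    print(vaddps(0, 0, 1, l256=True).hex(" "))   # vaddps ymm0, ymm0, ymm1
    print(vaddps(8, 8, 9).hex(" "))              # vaddps xmm8, xmm8, xmm9
```

This should print c5 f8 58 c1, c5 fc 58 c1 and c4 41 38 58 c1: the only difference between the XMM and YMM forms is the L bit, and using xmm8/xmm9 forces the 3-byte form because B no longer fits in the 2-byte prefix. For comparison, legacy SSE `addps xmm8, xmm9` is 45 0F 58 C1, with a separate REX byte (45) plus the 0F escape.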

Care to tell us, John, how AMD does this?
 

Nemesis 1

Lifer
Dec 30, 2006
Originally Posted by JFAMD
They will do that because Sandy Bridge has an issue with handling mixed SSE and AVX instructions. They need to clear out their pipeline between switching instructions, and this takes clock cycles. They recommended at IDF that companies convert all SSE instructions to AVX-128 to avoid performance penalties.





http://news.softpedia.com/news/Intel...n-187568.shtml

Well, if you say so. But this is more likely the case:


Identifying Live-ins
Identifying the live-ins of a speculative thread requires a top-down traversal of its control-flow graph starting at the CQIP to identify register and memory values read before being written by the speculative thread. Each path is explored until a certain length. This length represents the time that previous threads take to compute and commit these values. This is because once the previous thread commits, the speculative thread need no longer rely on predicted values, but can read committed values. This time is estimated as the time it takes to sequentially execute all the code between the SP and CQIP minus the thread spawn overhead
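The traversal that passage describes can be sketched in a few lines of Python. This is a toy illustration, not the compiler's actual implementation: it assumes each instruction costs one time unit, every basic block is non-empty, and the names (`Instr`, `Block`, `path_budget`, `live_ins`, `EXAMPLE_CFG`) are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Instr:
    reads: frozenset[str]    # registers / memory locations read
    writes: frozenset[str]   # registers / memory locations written


@dataclass
class Block:
    instrs: list[Instr]
    succs: list[str] = field(default_factory=list)


def path_budget(seq_time_sp_to_cqip: int, spawn_overhead: int) -> int:
    """Path length to explore: sequential time from SP to CQIP minus spawn overhead."""
    return max(0, seq_time_sp_to_cqip - spawn_overhead)


def live_ins(cfg: dict[str, Block], cqip: str, budget: int) -> set[str]:
    """Values read before being written on some path from cqip, within budget.

    Assumes each instruction costs 1 unit and every block is non-empty, so the
    walk terminates even if the CFG has loops (it may revisit blocks).
    """
    found: set[str] = set()

    def walk(label: str, written: frozenset[str], left: int) -> None:
        for ins in cfg[label].instrs:
            if left == 0:
                return  # past this point the earlier thread has committed
            found.update(ins.reads - written)   # read before written on this path
            written = written | ins.writes
            left -= 1
        for succ in cfg[label].succs:
            walk(succ, written, left)

    walk(cqip, frozenset(), budget)
    return found


def make_instr(reads: set[str], writes: set[str]) -> Instr:
    return Instr(frozenset(reads), frozenset(writes))


# Hypothetical example: block A branches to B or C.
EXAMPLE_CFG = {
    "A": Block([make_instr({"r1"}, {"r2"}), make_instr({"r2", "r3"}, {"r1"})], ["B", "C"]),
    "B": Block([make_instr({"r4"}, set())]),
    "C": Block([make_instr({"mem_x"}, {"r4"})]),
}

if __name__ == "__main__":
    print(sorted(live_ins(EXAMPLE_CFG, "A", path_budget(5, 2))))
```

A shorter budget finds fewer live-ins, which matches the passage's point: values produced beyond that horizon can be read from the committed state instead of being predicted.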

As you can clearly see, JF implies Intel has to do this because they haven't figured out AMD's AVX instructions, so they don't have the same functionality as AMD's real-deal AVX. This is dishonest in a way. Intel invented AVX for Intel CPUs; the very fact that JF of AMD says Intel has to do something different than AMD should tell you something. JF is referring to clearing the YMM registers to all zeros, and then he implies this takes more clock cycles. If you read the AVX PDF, you will see this is not a fact. It's a necessary action for Mitosis to work.

If you read the AVX PDF, the length is a big deal. One of these two processors can do greater than or less than, the other processor can't, and that will cause a UP. You tell me which is which: SB or BD.
 

Nemesis 1

Lifer
Dec 30, 2006
From that Fusion link I found this most interesting.

That was a pretty interesting demo though we have seen physics demos like that on GPUs for years. The key here is for developers: all on a single executable that runs on the CPU, GPU, multi-GPUs with almost no modifications
 

Outrage

Senior member
Oct 9, 1999
Originally Posted by JFAMD
They will do that because Sandy Bridge has an issue with handling mixed SSE and AVX instructions. They need to clear out their pipeline between switching instructions, and this takes clock cycles. They recommended at IDF that companies convert all SSE instructions to AVX-128 to avoid performance penalties.

http://news.softpedia.com/news/Intel...n-187568.shtml

Well, if you say so. But this is more likely the case:

Identifying Live-ins
Identifying the live-ins of a speculative thread requires a top-down traversal of its control-flow graph starting at the CQIP to identify register and memory values read before being written by the speculative thread. Each path is explored until a certain length. This length represents the time that previous threads take to
snip...

You already posted this once (post #349), and you got the answer in post #364. :rolleyes:
 

JFAMD

Senior member
May 16, 2009
Well, that's what this forum is here for :) BTW, can you also confirm that you can not only combine 256-bit AVX with Integer SSE ops but also AVX with 128-bit AVX integer ops in the same cycle?

Flex FP can do SSE and AVX128 in the same cycle, but not SSE and AVX256 in the same cycle.
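That answer reads naturally as a pipe-occupancy rule. Here is a toy Python model of it, under a loudly stated assumption: the shared Flex FP unit is treated as two 128-bit pipes per cycle, SSE and AVX-128 ops take one pipe each, and an AVX-256 op takes both. This ignores the other FP pipes and AMD's real scheduling logic; the names are hypothetical.

```python
# Toy model only (an assumption, not AMD's actual scheduler): two 128-bit
# FP pipes per cycle; a 256-bit op needs both pipes in the same cycle.

PIPES_PER_CYCLE = 2
PIPE_WIDTH_BITS = 128
OP_WIDTH_BITS = {"SSE": 128, "AVX128": 128, "AVX256": 256}


def pipes_needed(op: str) -> int:
    """Number of 128-bit pipes an op occupies (ceiling of width / 128)."""
    return -(-OP_WIDTH_BITS[op] // PIPE_WIDTH_BITS)


def can_issue_same_cycle(*ops: str) -> bool:
    """True if the ops fit into the pipes available in one cycle."""
    return sum(pipes_needed(op) for op in ops) <= PIPES_PER_CYCLE


if __name__ == "__main__":
    for pair in [("SSE", "AVX128"), ("SSE", "AVX256")]:
        print(pair, can_issue_same_cycle(*pair))
```

Under this model SSE plus AVX-128 fits (1 + 1 pipes) while SSE plus AVX-256 does not (1 + 2), which is the behaviour described in the answer.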
 

Blue Shift

Senior member
Feb 13, 2010
Being as you're frisky today, John: does AMD's XOP do anything like this? Why is it you AMD guys like slides so much? The 800-page Intel PDF says it all, and they describe in that PDF how AMD has to zero out YMM. Why don't you explain it to us?

-snip-

Care to tell us, John, how AMD does this?

Just to go out on a limb here, I really don't think John (or anyone else) wants to explain it to you [again].

This "debate" has become pointless and repetitive, and it seems that everyone here would rather just wait for it to come out and see how it performs than argue over whether or not certain instructions use a specific prefix.
 

Nemesis 1

Lifer
Dec 30, 2006
Flex FP can do SSE and AVX128 in the same cycle, but not SSE and AVX256 in the same cycle.

John, there are a lot of UPs with Intel's AVX on Intel processors. What happens with a VEX prefix for code ending in pp or 0/0? If you're willing to answer questions, then tell me: does AMD have the VEX prefix as described in the Intel PDF? I have asked you this 3 times now.
 

Nemesis 1

Lifer
Dec 30, 2006
Originally Posted by JFAMD
They will do that because Sandy Bridge has an issue with handling mixed SSE and AVX instructions. They need to clear out their pipeline between switching instructions, and this takes clock cycles. They recommended at IDF that companies convert all SSE instructions to AVX-128 to avoid performance penalties.

Interesting, but I had already posted this here, and I actually used Intel's words, not mine.

256-bit VEX-encoded instruction and legacy 128-bit SIMD instructions has internal state to manage the upper and lower halves of the YMM states. Functionally, VEX-encoded SIMD instructions can be intermixed with legacy SSE instructions (non-VEX-encoded SIMD instructions operating on XMM registers). However, there is a performance impact with intermixing VEX-encoded SIMD instructions (AVX, FMA) and Legacy SSE instructions that only operate on the XMM register state.
The general programming considerations to realize optimal performance are the following:
• Minimize transition delays and partial register stalls with YMM registers accesses: Intermixed 256-bit, 128-bit or scalar SIMD instructions that are encoded with VEX prefixes have no transition delay due to internal state management. Sequences of legacy SSE instructions (including SSE2, and subsequent generations non-VEX-encoded SIMD extensions) that are not intermixed with VEX-encoded SIMD instructions are not subject to transition delays.
• When an application must employ AVX and/or FMA, along with legacy SSE code, it should minimize the number of transitions between VEX-encoded instructions and legacy, non-VEX-encoded SSE code. Section 2.8.1 provides recommendation for software to minimize the impact of transitions between VEX-encoded code and legacy SSE code.
In addition to performance considerations, programmers should also be cognizant of the implications of VEX-encoded AVX instructions with the expectations of system software components that manage the processor state components enabled by XCR0. For additional information see Section 4.1.9.1, “Vector Length Transition and Programming Considerations”.
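The transition cost described in that excerpt can be modelled as a small state machine. This is a simplified Python sketch, assuming the three-state model (clean, dirty, saved upper halves) that Intel's optimization manual gives for Sandy Bridge; it does not describe later Intel designs or AMD parts, and the instruction-class labels ("sse", "vex128", "vex256", "vzeroupper") are hypothetical.

```python
from __future__ import annotations

from enum import Enum


class Upper(Enum):
    CLEAN = "clean"   # upper 128 bits of every YMM register known to be zero
    DIRTY = "dirty"   # some upper half may hold live data
    SAVED = "saved"   # upper halves set aside while legacy SSE runs


def count_transition_penalties(instrs: list[str]) -> int:
    """Count slow state transitions for a sequence of instruction classes.

    Classes (hypothetical labels): "sse" (legacy, non-VEX), "vex128",
    "vex256", and "vzeroupper".
    """
    state = Upper.CLEAN
    penalties = 0
    for ins in instrs:
        if ins == "vzeroupper":
            state = Upper.CLEAN               # upper halves zeroed, no penalty
        elif ins == "sse":
            if state is Upper.DIRTY:
                penalties += 1                # save upper halves
                state = Upper.SAVED
        elif ins in ("vex128", "vex256"):
            if state is Upper.SAVED:
                penalties += 1                # restore upper halves
                state = Upper.DIRTY
            elif ins == "vex256":
                state = Upper.DIRTY           # 256-bit write dirties the upper state
            # vex128 in CLEAN or DIRTY leaves the state unchanged
        else:
            raise ValueError(f"unknown instruction class: {ins!r}")
    return penalties


if __name__ == "__main__":
    print(count_transition_penalties(["vex256", "sse", "sse", "vex256"]))
    print(count_transition_penalties(["vex256", "vzeroupper", "sse", "vex256"]))
```

In this model, mixing 256-bit AVX with legacy SSE costs one penalty going into the SSE code and one coming back out, while inserting `vzeroupper` before the SSE code (one of the mitigations Intel recommends) removes both. Re-encoding the SSE code as VEX-128 also avoids them, since VEX-128 never triggers the save.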

I myself would like to see how AMD uses 256-bit with SSE2 that has a REX prefix. John, could you give a code example written for this operation: 256-bit on the YMM state, and then the 128-bit lower XMM state using SSE2 with a REX prefix? Because it seems to me that creates a UP in some cases, according to the PDF. I am more interested in the code length, as XOP would have to include the bytes of the REX prefix. Intel has space reserved that is greater than 128 bits in the lower XMM state; if code goes over 128 bits, it's a UP. Same with the 256-bit YMM upper state: Intel has space reserved greater than 256 bits. Then we have the L, the memory SS, and the pp. So I sure would like to see an example of the coding. Unless in the XMM state you're using legacy SSE2 instructions, in which case AMD would need Intel to stall to keep up.
 

Nemesis 1

Lifer
Dec 30, 2006
Just to go out on a limb here, I really don't think John (or anyone else) wants to explain it to you [again].

This "debate" has become pointless and repetitive, and it seems that everyone here would rather just wait for it to come out and see how it performs than argue over whether or not certain instructions use a specific prefix.

Explain what? He hasn't answered the only real question: does AMD use the VEX prefix as defined by Intel? So you don't see the importance of the VEX prefix? I'm referring to the code length only.

We just went through 2 years of BD hype, and now you're willing to wait. Strange, that. NOT!
 

Nemesis 1

Lifer
Dec 30, 2006
Well, I am not going to read an 800-page AMD PDF. YOU copy and paste where AMD's XOP prefix has computational bits in it, and I will believe it. But it has to say XOP prefix and not VEX prefix. Speaking of FUD, where is BD?

Here's what I want to see from your PDF, John, but I don't want to see the VEX prefix used. I want to see the XOP prefix used, or it didn't happen.
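On the narrower question of what goes into the XOP prefix: a hedged Python sketch of its layout as AMD documents it, an 8Fh escape byte followed by two bytes that mirror the 3-byte VEX form (inverted R, X, B plus a 5-bit map_select, then W, inverted vvvv, L and pp). The helper `xop_prefix` is hypothetical, it builds only the prefix rather than a full instruction, and pp is fixed at 00.

```python
XOP_ESCAPE = 0x8F


def xop_prefix(reg: int, rm: int, vvvv: int, *, map_select: int,
               l256: bool = False, w: int = 0) -> bytes:
    """Return the 3-byte XOP prefix for a register-register instruction.

    Layout: 8Fh, then [R' X' B' map_select], then [W vvvv L pp];
    R/X/B and vvvv are stored inverted, and pp is 00.
    """
    if not 8 <= map_select <= 31:
        # map_select < 8 would make the bytes look like POP r/m (8F /0).
        raise ValueError("XOP map_select must be in 8..31")
    r = (reg >> 3) & 1    # REX.R equivalent: high bit of ModRM.reg
    b = (rm >> 3) & 1     # REX.B equivalent: high bit of ModRM.rm
    x = 0                 # REX.X equivalent (no SIB index in register form)
    l = 1 if l256 else 0  # 0 = 128-bit, 1 = 256-bit
    v = (~vvvv) & 0xF     # extra source register, stored inverted
    byte1 = ((r ^ 1) << 7) | ((x ^ 1) << 6) | ((b ^ 1) << 5) | map_select
    byte2 = (w << 7) | (v << 3) | (l << 2) | 0b00
    return bytes([XOP_ESCAPE, byte1, byte2])


if __name__ == "__main__":
    print(xop_prefix(0, 1, 0, map_select=9).hex(" "))
    print(xop_prefix(8, 9, 8, map_select=8).hex(" "))
```

In this layout the prefix is always 3 bytes and the REX-equivalent bits live inside it, so no separate REX byte is added, the same arrangement as the 3-byte VEX form.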


1.3.3 VEX Prefix Instruction Encoding Support
Intel AVX introduces a new prefix, referred to as VEX, in the Intel 64 and IA-32 instruction encoding format. Instruction encoding using the VEX prefix provides the following capabilities:
• Direct encoding of a register operand within VEX. This provides instruction syntax support for non-destructive source operand.
• Efficient encoding of instruction syntax operating on 128-bit and 256-bit register sets.
• Compaction of REX prefix functionality: The equivalent functionality of the REX prefix is encoded within VEX.
• Compaction of SIMD prefix functionality and escape byte encoding: The functionality of SIMD prefix (66H, F2H, F3H) on opcode is equivalent to an opcode extension field to introduce new processing primitives. This functionality is replaced by a more compact representation of opcode extension within the VEX prefix. Similarly, the functionality of the escape opcode byte (0FH) and two-byte escape (0F38H, 0F3AH) are also compacted within the VEX prefix encoding.
• Most VEX-encoded SIMD numeric and data processing instruction semantics with memory operand have relaxed memory alignment requirements than instructions encoded using SIMD prefixes (see Section 2.5).
VEX prefix encoding applies to SIMD instructions operating on YMM registers, XMM registers, and in some cases with a general-purpose register as one of the operand. VEX prefix is not supported for instructions operating on MMX or x87 registers. Details of VEX prefix and instruction encoding are discussed in Chapter 4.

Care to tell us, John, how AMD does this?
 

garagisti

Senior member
Aug 7, 2007
Hopefully, as this is simple English, you will understand... If you speak any other language, maybe we could Google Translate this for you....

Check this post...

http://www.xtremesystems.org/forums/...=1#post4883691

This guy owns an SR2 rig and uses it as a desktop. He says BD would beat a 990X. Given he owns an SR2 and he has access to BD, I'd believe him. So let us all wait for benchies... as JF says.

Cheers!
 

Topweasel

Diamond Member
Oct 19, 2000
Last contact with Nemesis, as your copy-and-paste trolling is getting pretty pointless. Lest you forget, this thread isn't about VEX or Haswell but about 8C BD vs. 4C SB.

Why won't you read an 800-page AMD PDF when you have insisted everyone read a similar-sized Intel document?

Sorry JF, you should have just let this one die.
 

waffleironhead

Diamond Member
Aug 10, 2005
Hopefully, as this is simple English, you will understand... If you speak any other language, maybe we could Google Translate this for you....

Check this post...

http://www.xtremesystems.org/forums/...=1#post4883691

This guy owns an SR2 rig and uses it as a desktop. He says BD would beat a 990X. Given he owns an SR2 and he has access to BD, I'd believe him. So let us all wait for benchies... as JF says.

Cheers!

Hmm, the link's not working.
 

Nemesis 1

Lifer
Dec 30, 2006
I copied and pasted everything YOU needed to KNOW. I ask JOHN for just one copy and paste. This is about BD and SB. Both claim AVX support, which is true. But now John has stepped up a level and claims VEX prefix support. All he has to do is copy and paste that one little bit about the XOP prefix, as that's the whole ballgame. He won't post it because it's likely not there. Instead I will get this VEX crap, without the XOP prefix wording as stated above for the VEX prefix. AMD does not have such a VEX prefix or the compilers to go along with it.
 

jones377

Senior member
May 2, 2004
Well, I was gonna post the number of times the words VEX and prefix appear in AMD's document, but I don't think my old computer can handle such a big number.