What is EM64T enhanced mode?

Fox5 · Jul 14, 2014

I saw this listed on the Wikipedia page for Skylake:
http://en.wikipedia.org/wiki/Skylake_(microarchitecture)#Architecture

Integer register increased from 16 in x64 standard to 32 in EM64T enhanced mode (r16-r31 for enhanced mode only, while it can be used for memory segment in normal x64 ISA)

I can't find any direct quotes from Intel about this, so it may just be a BS rumor, but it's made a spec list that's circulating the Internet.
Changing the ISA in this way seems like a big deal, and it seems unlikely Intel would just double register count and not make a big deal out of it. If so, this could be a good performance boost for applications compiled to take advantage of it, but would be completely incompatible with older processors. Also, bringing memory segments back just seems weird.

Are there any additional details about this?

NTMBK · Jul 14, 2014

[citation needed]

NTMBK · Jul 14, 2014

On a more serious note, they might be getting this confused with the increased vector register count. AVX-512 increases the number of vector registers from 16 to 32 (as well as increasing the width to 512 bits), and I believe that AVX and AVX2 operations can still operate on the lower 128/256 bits of these registers. So AVX/2 code recompiled for Skylake could potentially get the benefit of YMM16-31. Though I could be wrong about that. Info at https://software.intel.com/en-us/blogs/2013/avx-512-instructions

SAAA · Jul 14, 2014

On an even more serious note, who's writing those specs, and why is he wasting his time doing so with no citations? Even if they are false, they are technical and not something like "hey, performance will double!" I could easily go there and write some things up too if that's the case, but I have better things to do than troll/speculate on a wiki. The caches are the most suspect part, considering that I found mentions of a doubling for Haswell too, and that didn't happen.

ShintaiDK · Jul 14, 2014

Never take the wiki as any form of fact.

NTMBK · Jul 14, 2014

SAAA said:
On an even more serious note, who's writing those specs, and why is he wasting his time doing so with no citations? Even if they are false, they are technical and not something like "hey, performance will double!" I could easily go there and write some things up too if that's the case, but I have better things to do than troll/speculate on a wiki. The caches are the most suspect part, considering that I found mentions of a doubling for Haswell too, and that didn't happen.

Not only does it have no citations, it is from an anonymous user: http://en.wikipedia.org/w/index.php?title=Skylake_(microarchitecture)&diff=611182621&oldid=610263446

Homeles · Jul 14, 2014

I think they've mixed up EM64T and Enhanced Speed Step.

SAAA · Jul 14, 2014

The plot thickens... meanwhile I'm waiting for more "serious" leaks, say from VRzone, etc. xD

Sable · Jul 14, 2014

Exophase · Jul 14, 2014

The spec list is full of nonsense. At every layer of the cache hierarchy they claim significantly increased size and associativity yet significantly decreased latency (especially for L3 - 12MB at only 12 cycles!) It's highly dubious to say the least.

NostaSeronx · Jul 14, 2014

There is work by Intel, AMD, VIA and some others on reworking x86-64 into something competitive with ARM and OpenPower.

x87, MMX, SSE, SSE2-SSE4.2 are all not supported by this new x86-64, while all these instructions have equivalents in XOP to XOP2 and AVX to AVX3.2.

Other than that, the only other change is that 2-operand/3-operand destructive instructions in the generic cores are gone, replaced by 3-operand and 4-operand non-destructive ones, with plans for >5-operand non-destructive improvements.

The codename for this initiative is "Standard 64".

Exophase · Jul 14, 2014

NostaSeronx said:
x87, MMX, SSE, SSE2-SSE4.2 are all not supported by this new x86-64, while all these instructions have equivalents in XOP to XOP2 and AVX to AVX3.2.

Not 80-bit FP..

Homeles · Jul 14, 2014

Exophase said:
Not 80-bit FP..

Which needs to go the way of the dodo, no?

Exophase · Jul 14, 2014

Homeles said:
Which needs to go the way of the dodo, no?

I don't really know what the alternative should be, if there are applications where it really does matter. I have to assume so because I can't think of another reason why x87 is available in Xeon Phi.

Maybe some smattering of 128-bit instructions like fadd/fsub/fmul that are invariably microcoded.

At any rate, what Seronx said is wrong: there are no 80-bit arithmetic equivalents in anything AVX (yet).

NTMBK · Jul 14, 2014

Seronx, stop making stuff up please.

NostaSeronx · Jul 14, 2014

Exophase said:
Not 80-bit FP..

Most 80-bit is rounded to 64-bit nowadays.

===
http://www.anandtech.com/show/1766/7

But keep in mind, this project is in its very early stages of research and as promising as this looks, it may take 5 - 10 years for the research to make its way into the real world.

http://www.xbitlabs.com/news/cpu/di...le_Threaded_Software_on_Multi_Core_Chips.html

===
Nothing is cancelled, merely delayed for another day.

NTMBK · Jul 14, 2014

NostaSeronx said:
Most 80-bit is rounded to 64-bit nowadays.

===
http://www.anandtech.com/show/1766/7

It is always truncated when it goes out of registers and into memory. This is partly why x87 sucks: you change an unrelated line of code, recompile, and the change in register usage means your effective precision changes.

And what on earth does Mitosis have to do with anything?

Exophase · Jul 14, 2014

NTMBK said:
It is always truncated when it goes out of registers and into memory. This is partly why x87 sucks: you change an unrelated line of code, recompile, and the change in register usage means your effective precision changes.

But it can also write 80-bit values to memory, and if you specify a datatype that uses this (like long double for typical C compilers) it won't truncate.
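A minimal sketch of that point, assuming a C99 compiler. The two helper names are made up for illustration, and whether long double is really the 80-bit x87 format depends on the compiler and ABI (it typically is with GCC or Clang on x86 Linux, while some compilers map long double to plain 64-bit double). The value 1 + 2^-60 needs 61 significand bits, so it fits in the 64-bit significand of the 80-bit format but not in the 53 bits of a double.

```c
#include <float.h>

/* Hypothetical helper: store 1 + 2^-60 to memory as a double and
 * report whether the small term survived. The volatile qualifier
 * forces a real store, so the value is rounded to 53 bits. */
int survives_as_double(void) {
    volatile double x = 1.0 + 0x1p-60;
    return x != 1.0;
}

/* Hypothetical helper: same round trip, but stored as a long double.
 * If long double has at least 61 significand bits (LDBL_MANT_DIG),
 * the small term is kept in memory as well. */
int survives_as_long_double(void) {
    volatile long double x = 1.0L + 0x1p-60L;
    return x != 1.0L;
}
```

Calling both from main and printing LDBL_MANT_DIG alongside the results shows which case a given toolchain falls into: the double version loses the small term, and the long double version keeps it only when LDBL_MANT_DIG is at least 61.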

dmens · Jul 16, 2014

New programmer visible registers? Epic if true. First I heard of it.

New registers for microcode? That's old news.

TuxDave · Jul 16, 2014

Exophase said:
I don't really know what the alternative should be, if there are applications where it really does matter. I have to assume so because I can't think of another reason why x87 is available in Xeon Phi.

Maybe some smattering of 128-bit instructions like fadd/fsub/fmul that are invariably microcoded.

At any rate, what Seronx said is wrong: there are no 80-bit arithmetic equivalents in anything AVX (yet).

I ask that question probably once a year. The most plausible hand-wavy answer that I ever got is that there are some legacy trig functions that still use x87 fp functions.

That doesn't really answer your question of "why is it on Xeon Phi", but setting the technical argument aside: from a project management standpoint you have to ask, "How much area/power do you save vs. the effort to remove the hardware and write a horribly long microcode sequence to do the whole thing? Is it worth doing this over something else you could be doing?"

Exophase · Jul 16, 2014

TuxDave said:
I ask that question probably once a year. The most plausible hand-wavy answer that I ever got is that there are some legacy trig functions that still use x87 fp functions.

That doesn't really answer your question of "why is it on Xeon Phi", but setting the technical argument aside: from a project management standpoint you have to ask, "How much area/power do you save vs. the effort to remove the hardware and write a horribly long microcode sequence to do the whole thing? Is it worth doing this over something else you could be doing?"

I can't really speak for legacy requirements, but there's definitely no technical merit in those x87 trig functions, they're way too slow. I bet you could even do 80-bit precise results faster in software routines, assuming these functions are even that accurate to begin with. Especially if you're doing it over vectors.

But there could be someone out there who intrinsically benefits from 80-bit fadd/fsub/fmul, and not just because they don't want to change their software. I mean, tons of people need 64-bit over 32-bit; would it necessarily turn out that stopping at that nice power of two was good enough for everyone, always? Someone also thought it was worth at least defining a 128-bit format too, even if no hardware really implements it.

With Xeon Phi, it isn't really a matter of what effort they would have needed to emulate it in microcode, because if no one benefited from it they could have just removed it entirely. I'd be surprised, shocked even if there was someone who legitimately wanted x87 instructions on Xeon Phi purely for compatibility reasons, and was also okay with a total lack of SSE (and MMX), as if people are porting 20 year old programs written in assembly to it. I also have a feeling they changed enough of the uarch from P5 or whatever that removing the hardware blocks would have been more of a benefit than a pain.

TuxDave · Jul 16, 2014

Exophase said:
I can't really speak for legacy requirements, but there's definitely no technical merit in those x87 trig functions, they're way too slow. I bet you could even do 80-bit precise results faster in software routines, assuming these functions are even that accurate to begin with. Especially if you're doing it over vectors.

No, I completely agree with you. I wish I could vaporize all x87 hardware too. I'm not a software guy in charge of performance tuning so I'm not sure how often these stupid functions come up or whether or not they even know they instantiated a math library that uses it and no one is ever going to update them.

I'd be surprised, shocked even if there was someone who legitimately wanted x87 instructions on Xeon Phi purely for compatibility reasons

How about "ain't nobody got time for that!" for a reason.

jhu · Jul 16, 2014

Exophase said:
I can't really speak for legacy requirements, but there's definitely no technical merit in those x87 trig functions, they're way too slow. I bet you could even do 80-bit precise results faster in software routines, assuming these functions are even that accurate to begin with. Especially if you're doing it over vectors.

Not sure who uses 80-bit FP nowadays, because back when x87 was released every other architecture was using 64-bit at most (although I did find some mention 4 years ago that NVIDIA's PhysX libraries use x87 and not SSE2; unsure of its current status).

Exophase said:
With Xeon Phi, it isn't really a matter of what effort they would have needed to emulate it in microcode, because if no one benefited from it they could have just removed it entirely. I'd be surprised, shocked even if there was someone who legitimately wanted x87 instructions on Xeon Phi purely for compatibility reasons, and was also okay with a total lack of SSE (and MMX), as if people are porting 20 year old programs written in assembly to it. I also have a feeling they changed enough of the uarch from P5 or whatever that removing the hardware blocks would have been more of a benefit than a pain.

Perhaps the opposite is the truth: just tack on AVX-512 and move on to the next iteration (Knights Landing) instead of spending more time on the current one. Then there are the people who don't want to use icc, but gcc doesn't support AVX-512, so at least those people can compile workable binaries for Xeon Phi using x87.

Homeles · Jul 16, 2014

TuxDave said:
I ask that question probably once a year. The most plausible hand-wavy answer that I ever got is that there are some legacy trig functions that still use x87 fp functions.

That doesn't really answer your question of "why is it on Xeon Phi", but setting the technical argument aside: from a project management standpoint you have to ask, "How much area/power do you save vs. the effort to remove the hardware and write a horribly long microcode sequence to do the whole thing? Is it worth doing this over something else you could be doing?"

Wouldn't it be worth removing it? It may not be for a single generation, but when you consider just how many generations have had it... even if it were something silly like a penny shaved off of every chip... that's a ton of pennies.

Let VIA or AMD take the legacy x87 dinosaurs...

Exophase · Jul 17, 2014

jhu said:
Perhaps the opposite is the truth: just tack on AVX-512 and move on to the next iteration (Knights Landing) instead of spending more time on the current one. Then there are the people who don't want to use icc, but gcc doesn't support AVX-512, so at least those people can compile workable binaries for Xeon Phi using x87.

People who purchase Xeon Phi but don't care about using the vector instruction set because they don't want to use ICC? Why would anyone bother? I don't think this scenario is credible.

It's not like Knights Landing is getting all the SSE instruction sets that Silvermont cores have, so they don't have a problem removing stuff there...
