What is EM64T enhanced mode?

Fox5

I saw this listed on the Wikipedia page for Skylake:
http://en.wikipedia.org/wiki/Skylake_(microarchitecture)#Architecture

Integer register increased from 16 in x64 standard to 32 in EM64T enhanced mode (r16–r31 for enhanced mode only, while it can be used for memory segment in normal x64 ISA)

I can't find any direct quotes from Intel about this, so it may just be a BS rumor, but it's made it onto a spec list that's circulating the Internet.
Changing the ISA in this way seems like a big deal, and it seems unlikely Intel would just double register count and not make a big deal out of it. If so, this could be a good performance boost for applications compiled to take advantage of it, but would be completely incompatible with older processors. Also, bringing memory segments back just seems weird.

Are there any additional details about this?
 

NTMBK

On a more serious note, they might be getting this confused with the increased vector register count. AVX-512 increases the number of vector registers from 16 to 32 (as well as increasing the width to 512 bits), and I believe that AVX and AVX2 operations can still operate on the lower 128/256 bits of these registers. So AVX/2 code recompiled for Skylake could potentially get the benefit of YMM16-31. Though I could be wrong about that. Info at https://software.intel.com/en-us/blogs/2013/avx-512-instructions
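
For a sense of scale, the register counts and widths mentioned above work out as follows. This is only a back-of-the-envelope sketch: the helper name `regfile_bytes` is made up for illustration, and the 16 x 256-bit baseline is the AVX/AVX2 register file in 64-bit mode that the 32 x 512-bit figure is being compared against.

```python
def regfile_bytes(count, width_bits):
    """Architectural vector register state in bytes (hypothetical helper)."""
    return count * width_bits // 8

avx2_state = regfile_bytes(16, 256)    # YMM0-YMM15, 256 bits each
avx512_state = regfile_bytes(32, 512)  # ZMM0-ZMM31, 512 bits each

# Twice the registers at twice the width is four times the architectural state.
print(avx2_state, avx512_state, avx512_state // avx2_state)
```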
 

SAAA

On an even more serious note, who's writing those specs, and why is he wasting his time doing so with no citations? Even if they are false, they are technical, not something like "hey, performance will double!" I could easily go there and write some things up too if that's the case, but I have better things to do than troll/speculate on a wiki. The caches are the most suspect part, considering that I found mentions of a doubling for Haswell as well, but that didn't happen.
 

NTMBK

SAAA said:
> On an even more serious note, who's writing those specs, and why is he wasting his time doing so with no citations? Even if they are false, they are technical, not something like "hey, performance will double!" I could easily go there and write some things up too if that's the case, but I have better things to do than troll/speculate on a wiki. The caches are the most suspect part, considering that I found mentions of a doubling for Haswell as well, but that didn't happen.

Not only does it have no citations, it is from an anonymous user: http://en.wikipedia.org/w/index.php?title=Skylake_(microarchitecture)&diff=611182621&oldid=610263446
 

SAAA

The plot thickens... meanwhile I'm waiting for more "serious" leaks, say from VRzone, etc. xD
 

Sable

UsrGdWindws6-7.gif
 

Exophase

The spec list is full of nonsense. At every layer of the cache hierarchy they claim significantly increased size and associativity yet significantly decreased latency (especially for L3: 12MB at only 12 cycles!). It's highly dubious, to say the least.
 

NostaSeronx

There is work by Intel, AMD, VIA and some others into reworking x86-64 into something competitive to ARM and OpenPower.

x87, MMX, SSE, and SSE2-SSE4.2 are all unsupported by this new x86-64, while all of these instructions have equivalents in XOP to XOP2 and AVX to AVX3.2.

Other than that, the only other change is that the 2-operand/3-operand destructive instructions in the generic cores are gone. They are replaced by 3-operand and 4-operand non-destructive instructions, with plans for >5-operand non-destructive improvements.

The codename for this initiative is "Standard 64".
 

Exophase

Which needs to go the way of the dodo, no?

I don't really know what the alternative should be, if there are applications where it really does matter. I have to assume so because I can't think of another reason why x87 is available in Xeon Phi.

Maybe some smattering of 128-bit instructions like fadd/fsub/fmul that are invariably microcoded.

At any rate, what Seronx said is wrong: there are no 80-bit arithmetic equivalents in anything AVX (yet).
 

NostaSeronx


Exophase

It is always truncated when it goes out of registers and into memory. This is partly why x87 sucks: you change an unrelated line of code, recompile, and the change in register usage means your effective precision changes.

But it can also write 80-bit values to memory, and if you specify a datatype that uses this (like long double for typical C compilers) it won't truncate.
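
To make the register-versus-memory effect concrete, here is a rough Python sketch. It does not touch x87 hardware; it simulates the two cases with exact rationals, assuming round-to-nearest-even and ignoring exponent range. The names `round_to_bits` and `evaluate` are hypothetical helpers, and the inputs are chosen only to make the difference visible.

```python
from fractions import Fraction

def round_to_bits(x, bits):
    """Round x to the nearest value with a `bits`-bit significand (ties to even)."""
    if x == 0:
        return Fraction(0)
    sign = -1 if x < 0 else 1
    x = abs(x)
    # Find e so that 2**(bits-1) <= x / 2**e < 2**bits.
    e = 0
    while x / Fraction(2) ** e >= 2 ** bits:
        e += 1
    while x / Fraction(2) ** e < 2 ** (bits - 1):
        e -= 1
    mantissa = round(x / Fraction(2) ** e)  # round() on a Fraction rounds half to even
    return sign * mantissa * Fraction(2) ** e

def evaluate(intermediate_bits):
    """Compute (a + b) - a, rounding every step to `intermediate_bits`,
    then store the final result as a double (53-bit significand)."""
    a = Fraction(2) ** 53
    b = Fraction(1)
    total = round_to_bits(a + b, intermediate_bits)
    diff = round_to_bits(total - a, intermediate_bits)
    return round_to_bits(diff, 53)

kept_in_register = evaluate(64)   # intermediates keep the 64-bit x87 significand
spilled_to_double = evaluate(53)  # intermediates rounded to double, as on a spill

print(kept_in_register, spilled_to_double)
```

The same expression gives two different answers depending only on whether the intermediate sum kept its extra significand bits, which is the behavior a compiler's register allocation can change from one build to the next.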
 

dmens

New programmer visible registers? Epic if true. First I heard of it.

New registers for microcode? That's old news.
 

TuxDave

Exophase said:
> I don't really know what the alternative should be, if there are applications where it really does matter. I have to assume so because I can't think of another reason why x87 is available in Xeon Phi.
>
> Maybe some smattering of 128-bit instructions like fadd/fsub/fmul that are invariably microcoded.
>
> At any rate, what Seronx said is wrong: there are no 80-bit arithmetic equivalents in anything AVX (yet).

I ask that question probably once a year. The most plausible hand-wavy answer that I ever got is that there are some legacy trig functions that still use x87 fp functions.

That doesn't really answer your question of "why is it on Xeon Phi," but setting the technical argument aside: from a project management standpoint you have to ask, "How much area/power do you save versus the effort to remove the hardware and write a horribly long microcode sequence to do the whole thing? Is it worth doing this over something else you could be doing?"
 

Exophase

TuxDave said:
> I ask that question probably once a year. The most plausible hand-wavy answer that I ever got is that there are some legacy trig functions that still use x87 fp functions.
>
> That doesn't really answer your question of "why is it on Xeon Phi," but setting the technical argument aside: from a project management standpoint you have to ask, "How much area/power do you save versus the effort to remove the hardware and write a horribly long microcode sequence to do the whole thing? Is it worth doing this over something else you could be doing?"

I can't really speak for legacy requirements, but there's definitely no technical merit in those x87 trig functions, they're way too slow. I bet you could even do 80-bit precise results faster in software routines, assuming these functions are even that accurate to begin with. Especially if you're doing it over vectors.

But there could be someone out there who intrinsically benefits from 80-bit fadd/fsub/fmul, and not just because they don't want to change their software. I mean, tons of people need 64-bit over 32-bit; is it necessarily true that stopping at that nice power of two is good enough for everyone, always? Someone also thought it was worth at least defining a 128-bit format too, even if no hardware really implements it.
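
For reference, a small Python sketch of what each step up in width buys in precision. The significand sizes are the standard ones for IEEE 754 binary32/binary64/binary128 and the x87 extended format; the helper names `machine_epsilon` and `decimal_digits` are made up for illustration.

```python
import math

# Significand precision in bits, counting the leading bit.
SIGNIFICAND_BITS = {
    "binary32 (float)": 24,
    "binary64 (double)": 53,
    "x87 extended (80-bit)": 64,
    "binary128 (quad)": 113,
}

def machine_epsilon(p):
    """Gap between 1.0 and the next representable value for a p-bit significand."""
    return 2.0 ** -(p - 1)

def decimal_digits(p):
    """Decimal digits that always survive a round trip through the format."""
    return math.floor((p - 1) * math.log10(2))

for name, p in SIGNIFICAND_BITS.items():
    print(f"{name:24s} p={p:3d}  eps={machine_epsilon(p):.3e}  digits={decimal_digits(p)}")
```

By this measure the 80-bit format adds roughly three decimal digits over double, while the 128-bit format roughly doubles double's digit count.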

With Xeon Phi, it isn't really a matter of what effort they would have needed to emulate it in microcode, because if no one benefited from it they could have just removed it entirely. I'd be surprised, shocked even, if there was someone who legitimately wanted x87 instructions on Xeon Phi purely for compatibility reasons, and was also okay with a total lack of SSE (and MMX), as if people are porting 20-year-old programs written in assembly to it. I also have a feeling they changed enough of the uarch from P5 or whatever that removing the hardware blocks would have been more of a benefit than a pain.
 

TuxDave

Exophase said:
> I can't really speak for legacy requirements, but there's definitely no technical merit in those x87 trig functions, they're way too slow. I bet you could even do 80-bit precise results faster in software routines, assuming these functions are even that accurate to begin with. Especially if you're doing it over vectors.

No, I completely agree with you. I wish I could vaporize all x87 hardware too. I'm not a software guy in charge of performance tuning so I'm not sure how often these stupid functions come up or whether or not they even know they instantiated a math library that uses it and no one is ever going to update them. :)

Exophase said:
> I'd be surprised, shocked even, if there was someone who legitimately wanted x87 instructions on Xeon Phi purely for compatibility reasons

How about "ain't nobody got time for that!" for a reason. :p
 

jhu

Exophase said:
> I can't really speak for legacy requirements, but there's definitely no technical merit in those x87 trig functions, they're way too slow. I bet you could even do 80-bit precise results faster in software routines, assuming these functions are even that accurate to begin with. Especially if you're doing it over vectors.

Not sure who uses 80-bit FP nowadays because back when x87 was released, every other architecture was using 64-bit at most (although I did find some mention 4 years ago that NVidia's Phys-X libraries use x87 and not SSE2; unsure of its current status).

Exophase said:
> With Xeon Phi, it isn't really a matter of what effort they would have needed to emulate it in microcode, because if no one benefited from it they could have just removed it entirely. I'd be surprised, shocked even, if there was someone who legitimately wanted x87 instructions on Xeon Phi purely for compatibility reasons, and was also okay with a total lack of SSE (and MMX), as if people are porting 20-year-old programs written in assembly to it. I also have a feeling they changed enough of the uarch from P5 or whatever that removing the hardware blocks would have been more of a benefit than a pain.

Perhaps the opposite is the truth: just tack on AVX-512 and move on to the next iteration (Knights Landing) instead of spending more time on the current one. Then there are the people who don't want to use icc, but gcc doesn't support AVX-512, so at least those people can compile workable binaries for Xeon Phi using x87.
 

Homeles

TuxDave said:
> I ask that question probably once a year. The most plausible hand-wavy answer that I ever got is that there are some legacy trig functions that still use x87 fp functions.
>
> That doesn't really answer your question of "why is it on Xeon Phi," but setting the technical argument aside: from a project management standpoint you have to ask, "How much area/power do you save versus the effort to remove the hardware and write a horribly long microcode sequence to do the whole thing? Is it worth doing this over something else you could be doing?"

Wouldn't it be worth removing it? It may not be for a single generation, but when you consider just how many generations have had it... even if it were something silly like a penny shaved off of every chip... that's a ton of pennies.

Let VIA or AMD take the legacy x87 dinosaurs...
 

Exophase

jhu said:
> Perhaps the opposite is the truth: just tack on AVX-512 and move on to the next iteration (Knights Landing) instead of spending more time on the current one. Then there are the people who don't want to use icc, but gcc doesn't support AVX-512, so at least those people can compile workable binaries for Xeon Phi using x87.

People who purchase Xeon Phi but don't care about using the vector instruction set because they don't want to use ICC? Why would anyone bother? I don't think this scenario is credible.

It's not like Knights Landing is getting all the SSE instruction sets Silvermont cores have, so they don't have a problem removing stuff there...