Speculating: nVIDIA CPU and x86 emulated Windows

dark zero · Jul 11, 2018

Taking the example of Vatilla series of speculations, I started to tought, nVIDIA will return this time for the emulated Windows 10, in order to compete against Qualcomm?

This article indicated that nVIDIA initially declined:
https://liliputing.com/2017/12/will-windows-10-run-nvidia-tegra-samsung-exynos-mediatek-chips.html

It was by late 2017, just months before Intel delives the big slam and announces their own GPU development.
https://www.techradar.com/news/intel-will-release-its-own-graphics-card-in-2020

And considering that AMD has already their own CPU and GPU departments which are competent that means very bad news for them since Intel might ditch nVIDIA GPU eventually and AMD won't use them for now.

Also considering that Qualcomm is still behind with their processors to compete and nVIDIA has already the Xavier processor available on the USD 1299 Jetson Xavier board:
https://www.youtube.com/watch?v=xaUA_sq4ZJo

It makes me to wonder if nVIDIA will seriously change their mind and approach Microsoft in order to develop their own laptops with FAR better GPU and decent CPU in order to make an space on there. There are already some decent games on ARM and are now increasing more and more.

What do you think about that? nVIDIA will change their mind before is too late?

NostaSeronx · Jul 11, 2018

Nvidia doesn't emulate. Nvidia translates and optimizes.

For all intents and purposes Nvidia needs the license. They will be getting that license soon™.

Translate => Simple Decoding (ARM license or x86 license)
Optimize => Soft OoO, re-arrange instructions to better suit the hardware.

NTMBK · Jul 11, 2018

I believe that NVidia are explicitly barred from emulating x86. It was part of their settlement with Intel. https://www.anandtech.com/show/4122/intel-settles-with-nvidia-more-money-fewer-problems-no-x86

Vattila · Jul 11, 2018

NTMBK said:
I believe that NVidia are explicitly barred from emulating x86. It was part of their settlement with Intel. https://www.anandtech.com/show/4122/intel-settles-with-nvidia-more-money-fewer-problems-no-x86

Here is the relevant passage from that article:

"NVIDIA also does not get an x86 license. x86 is among an umbrella group of what’s being called “Intel proprietary products” which NVIDIA is not getting access to. Intel’s flash memory holdings and other chipset holdings are also a part of this. Interestingly the agreement also classifies an “Intel Architecture Emulator” as being a proprietary product. At first glance this would seem to disallow NVIDIA from making an x86 emulator for any of their products, be it their GPU holdings or the newly announced Project Denver ARM CPU. Being officially prohibited from emulating x86 could be a huge deal for Denver down the road depending on where NVIDIA goes with it."

However, the agreement has now come to an end, and the article does not specify whether the x86 emulator restrictions were valid only for the duration of the agreement.

In any case, Nvidia focuses on high-margin markets and segments. So I doubt they will spearhead an attack in the x86 CPU market. They will leave that to the low-margin ARM players. If ARM can establish itself in the current x86 dominated markets (PC and server), then maybe Nvidia will enter the market with a high-margin CPU product.

I think the same strategy is at play at AMD. Lisa Su put their ARM efforts on ice, until there is a market there for a high-margin product.

Thala · Jul 11, 2018

dark zero said:
It makes me to wonder if nVIDIA will seriously change their mind and approach Microsoft in order to develop their own laptops with FAR better GPU and decent CPU in order to make an space on there. There are already some decent games on ARM and are now increasing more and more.

What do you think about that? nVIDIA will change their mind before is too late?

Would be interesting to compare Pascal vs. Adreno regarding efficiency. In the discrete GPU market Nvidia Pascal/Volta is the undisputed efficiency leader while Adreno is in the mobile market. I wonder how a scaled up Adreno compares to Pascal/Volta.

Thala · Jul 11, 2018

And maybe a small correction regarding "emulated Windows".
Windows in itself is never emulated. Windows on ARM is a full 64 bit native Windows for ARM, which supports 3 execution environments via 3 different sets of DLLs. 1) Native ARM64 2) ARM32 via WoW layer and 3) x86 via WoW and emulation layer. The last one uses emulation at application and user mode library layer but kernel and drivers are always native ARM64.

Accordingly you have the 3 folders on C: "program files", "program files(arm)" and "program files(x86)" as well as the 3 library folders: "C:\Windows\system32" (ARM64), "C:\Windows\SysARM32" (ARM32) and "C:\Windows\SysWOW64" (x86)

Shivansps · Jul 11, 2018

Personally, Microsoft needs to step up and actually "emulate" x64... with x86 they are just porting their already existing WoW x86 subsystem, but no one whats to use x86 anymore, so is not really that big deal to me anyway.

Belive me, i have a Windows BT tablet with a x86 uefi and so forced to run x86 Windows and you do feel the limitation. Im not sure i would buy a system without x64 support.

whm1974 · Jul 11, 2018

Shivansps said:
Personally, Microsoft needs to step up and actually "emulate" x64... with x86 they are just porting their already existing WoW x86 subsystem, but no one whats to use x86 anymore, so is not really that big deal to me anyway.

Belive me, i have a Windows BT tablet with a x86 uefi and so forced to run x86 Windows and you do feel the limitation. Im not sure i would buy a system without x64 support.

And why would you?

NTMBK · Jul 12, 2018

whm1974 said:
And why would you?

Windows on ARM doesn't emulate x64.

moinmoin · Jul 12, 2018

Is Nvidia interested in x86/x64 at all? There were rumors of them buying shares of Via at the beginning of this decade, but those turned out to be false and Nvidia continued to be content developing their Arm-based SoCs.

NTMBK · Jul 12, 2018

moinmoin said:
Is Nvidia interested in x86/x64 at all? There were rumors of them buying shares of Via at the beginning of this decade, but those turned out to be false and Nvidia continued to be content developing their Arm-based SoCs.

Apparently Denver was started when NVidia picked up a bunch of Stexar engineers, who previously worked on an x86 compatible chip: https://www.theinquirer.net/inquirer/news/1017162/nvidia-stexar-gun-turrets-amd-intel https://www.theinquirer.net/inquirer/news/1031103/nvidia-balls-circumvent-x86-licences Remember that Denver translates from ARM instructions into a totally different internal instruction set... supposedly it was originally meant to morph from x86 too, not just ARM.

moinmoin · Jul 12, 2018

NTMBK said:
Apparently Denver was started when NVidia picked up a bunch of Stexar engineers, who previously worked on an x86 compatible chip: https://www.theinquirer.net/inquirer/news/1017162/nvidia-stexar-gun-turrets-amd-intel https://www.theinquirer.net/inquirer/news/1031103/nvidia-balls-circumvent-x86-licences Remember that Denver translates from ARM instructions into a totally different internal instruction set... supposedly it was originally meant to morph from x86 too, not just ARM.

Ah yeah, good point. And that translation/recompilation all happens in software according to https://techreport.com/news/26906/nvidia-claims-haswell-class-performance-for-denver-cpu-core
Having much of the actual functionality done in software (at an essentially driver level) is a very Nvidia thing to do. I guess they indeed thought they could circumvent the necessity for x86 licenses and patents that way.

Nothingness · Jul 12, 2018

moinmoin said:
Ah yeah, good point. And that translation/recompilation all happens in software according to https://techreport.com/news/26906/nvidia-claims-haswell-class-performance-for-denver-cpu-core
Having much of the actual functionality done in software (at an essentially driver level) is a very Nvidia thing to do. I guess they indeed thought they could circumvent the necessity for x86 licenses and patents that way.

I might remember wrong but doesn't Denver also have ARM hardware decoders to help before recompilation is done?

NostaSeronx · Jul 12, 2018

Translation is hardware, even in Intel's version it is hardware.
Optimization is software, even in Intel's version it is software.

Optimization operates like LLC for the Re-order Buffer, Retire Queue:
Skylake => 224-entry
Zen => 192-entry
Denver => >1000-entry + Chained op windows

ARM to Internal ISA => 2 ARM mode ops
ARM to Internal ISA + Optimizer = 2 ARM mode ops + 7 native ops
Native ops go down the complex decoder.
ARM ops go down the simple ARM decoders.

Denver quickly moves from ARM mode(near-exact ARM<->Native interpretation) to Native mode(Loose ARM<->Native interpretation) which can utilize all registers 64 GPR and 64 FPU.
Optimizer does:
- Unrolls Loops
- Renames registers
- Reorders Loads and Stores
- Improves control flow
- Removes unused computation
- Hoists redundant computation
- Sinks uncommonly executed computation
- Improves scheduling

The major issue with Denver's design is that it isn't wide enough. Denver however is virtually Out-of-Order, which is a main benefit. Soft OoO is much safer than Hard OoO.

Intel's design the complex decoder is in the Global Front-end and the simple x86 decoders are localized in the cores. Complex decoder is called a horizontal decoder in SoftHV(Pre-VISC) and width is undetermined amount of ops. Simple decoders are called vertical decoders in SoftHV(pre-visc) and width is determined to be 1 op. A single logical thread can span several cores with different specifications: 2 Icelake class + 3 Tremont class cores. If you do the math, that particular design has 17 integer ALUs and 4 FMACs + 3 MACs(3MUL&3ADD). This design follows three paths: Direct VISC w/ x86 style encoding, Indirect VISC via recompiled x86 code, and Direct x86. There is also "Hot" and "Cold" path recompiliation; Hot goes Icelake always and Cold goes Tremont always.

As far as I know... Nvidia is the most likely to be selected for a ARM custom-esque partnership. Intel makes the standard VISC CPU and GPU cores, Nvidia makes custom CPU and GPU cores for VISC.

dark zero · Jul 15, 2018

And how about Xavier chip Nosta?
nVIDIA needs another iteration to make it wider and consume 15 watts?

I see this chip:

Hexa Core nVIDIA at 3.5 Ghz
12nm
20 watts
Using Volta GPU.

NostaSeronx · Jul 15, 2018

dark zero said:
And how about Xavier chip Nosta?
nVIDIA needs another iteration to make it wider and consume 15 watts?

I see this chip:

Hexa Core nVIDIA at 3.5 Ghz
12nm
20 watts
Using Volta GPU.

It will have to depend on the HotChips slides and if they detail what is in Carmel.
-> 10-wide, but that can also mean ARM mode decoders went from 2 to 4, and the complex decoder now decodes batches of 10-wide ops.
-> What exact units were increased in 10-wide. For example Nvidia could have pushed the branch unit into one of the ALUs. That would give 4-ALUs, 3-AGUs, 3-FPUs.
-> On the decode side, if they retain the 7-wide ops. There can be two complex decoders: double the native and ARM bandwidth from Denver.

Xavier as is 30W TDP and 20W SDP. Nvidia can simply port down to TSMC N7(+ specifically) and get ~200 mm squared and 12 watt TDP and 8 watt SDP.
Which retains the 8-core and 512-core CPU/GPU and accelerators(maybe?) design.

dark zero · Jul 15, 2018

NostaSeronx said:
It will have to depend on the HotChips slides and if they detail what is in Carmel.
-> 10-wide, but that can also mean ARM mode decoders went from 2 to 4, and the complex decoder now decodes batches of 10-wide ops.
-> What exact units were increased in 10-wide. For example Nvidia could have pushed the branch unit into one of the ALUs. That would give 4-ALUs, 3-AGUs, 3-FPUs.
-> On the decode side, if they retain the 7-wide ops. There can be two complex decoders: double the native and ARM bandwidth from Denver.

Xavier as is 30W TDP and 20W SDP. Nvidia can simply port down to TSMC N7(+ specifically) and get ~200 mm squared and 12 watt TDP and 8 watt SDP.
Which retains the 8-core and 512-core CPU/GPU and accelerators(maybe?) design.

Indeed, it might be possible, but I wanted to know if nVIDIA might be able to enter to the ARM race with their cores since they are capable to deliver a decent performance with them.

beginner99 · Jul 16, 2018

NV won't emulate x86/x64. In fact their strategy is pretty clear. They have 0 interest in tablets, phones, IoT etc...eg. all low margin markets. NVs strategy clearly is AI / Machine learning and providing hardware for training and predicting. Their SOCs are no mostly targeted at the automotive market where obviously AI is becoming more and more important due to driving assistants and ultimately self-driving cars. The later will be a huge market and if you are the first to get a foot in in, $$$$ because these will be luxury cars with high margins and later huge volume.

There is 0 reason for NV to invest in x86 or mature markets. That would simply need too much effort /money which can be used for better opportunities.

NTMBK · Jul 16, 2018

beginner99 said:
NV won't emulate x86/x64. In fact their strategy is pretty clear. They have 0 interest in tablets, phones, IoT etc...eg. all low margin markets. NVs strategy clearly is AI / Machine learning and providing hardware for training and predicting. Their SOCs are no mostly targeted at the automotive market where obviously AI is becoming more and more important due to driving assistants and ultimately self-driving cars. The later will be a huge market and if you are the first to get a foot in in, $$$$ because these will be luxury cars with high margins and later huge volume.

There is 0 reason for NV to invest in x86 or mature markets. That would simply need too much effort /money which can be used for better opportunities.

Their AI/Machine learning systems still need a host CPU. Every time they build one of their DGX-2 compute boxes, they need to pay Intel $12000 for the Xeons to go in it. Theoretically NVidia could make a host CPU which included NVLink, and integrated much more tightly with the GPU cluster.

Dribble · Jul 16, 2018

NTMBK said:
Their AI/Machine learning systems still need a host CPU. Every time they build one of their DGX-2 compute boxes, they need to pay Intel $12000 for the Xeons to go in it. Theoretically NVidia could make a host CPU which included NVLink, and integrated much more tightly with the GPU cluster.

Super computers are x86 + gpu, but smaller AI stuff (Nvidia Drive) uses ARM.

Nothingness · Jul 16, 2018

NTMBK said:
Their AI/Machine learning systems still need a host CPU. Every time they build one of their DGX-2 compute boxes, they need to pay Intel $12000 for the Xeons to go in it. Theoretically NVidia could make a host CPU which included NVLink, and integrated much more tightly with the GPU cluster.

Couldn't they use POWER? It has NVLink.

https://www.nextplatform.com/2017/12/15/nvlink-shines-power9-ai-hpc-tests/

NTMBK · Jul 16, 2018

Nothingness said:
Couldn't they use POWER? It has NVLink.

https://www.nextplatform.com/2017/12/15/nvlink-shines-power9-ai-hpc-tests/

They could, but a bunch of software won't run on POWER. And they'll still be giving a big chunk o' change to IBM.

beginner99 · Jul 17, 2018

NTMBK said:
Their AI/Machine learning systems still need a host CPU. Every time they build one of their DGX-2 compute boxes, they need to pay Intel $12000 for the Xeons to go in it. Theoretically NVidia could make a host CPU which included NVLink, and integrated much more tightly with the GPU cluster.

Well it's the customer that ultimately pays for the CPU. And in that price range whether you pay 12k or 6k for the CPU doesn't matter much.

The main point is there is no need to use a x86 CPU here. The could use as someone said a power9 CPU, an ARM CPU (Cavium, Centriq, their own). I never said they don't need a CPU but they don't need an x86 CPU.

CatMerc · Jul 17, 2018

Thala said:
Would be interesting to compare Pascal vs. Adreno regarding efficiency. In the discrete GPU market Nvidia Pascal/Volta is the undisputed efficiency leader while Adreno is in the mobile market. I wonder how a scaled up Adreno compares to Pascal/Volta.

I imagine Adreno would win. It's simply not built to handle the wide swath of compute tasks that Pascal does, and that probably costs majorly in both power and area.

The area efficiency on Adreno is a sight to behold to be honest. The gap is just incredible. AMD shouldn't have kicked out that talent in 2012/13.

NTMBK · Jul 17, 2018

CatMerc said:
The area efficiency on Adreno is a sight to behold to be honest. The gap is just incredible. AMD shouldn't have kicked out that talent in 2012/13.

I don't think AMD had a choice, they needed the money.

Speculating: nVIDIA CPU and x86 emulated Windows

Platinum Member

Diamond Member

Lifer

Senior member

Golden Member

Golden Member

Diamond Member

Diamond Member

Lifer

Diamond Member

Lifer

Diamond Member

Diamond Member

Diamond Member

Platinum Member

Diamond Member

Platinum Member

Diamond Member

Lifer

Platinum Member

Diamond Member

Lifer

Diamond Member

Golden Member

Lifer