[TT]NVIDIA Tegra 4 processor details leaked


Tuna-Fish

Golden Member
Mar 4, 2011
*Rubs forehead in homicidal frustration*

And are all shaders equivalent to one another? Are the shaders on a Kepler-based GPU the same as the ones on Fermi?

Nope. A different kind of GPU computing unit can get work done at a different rate. However, once you have a design that you like, you can effectively just tile as many as you think you can fit in your silicon/power budget. Based on the data, I thought that they were just replicating the old GPU cluster a few times. If it truly is something Kepler-derived, then I was wrong.

Is a GTX 680 three times more powerful than a GTX 580?
It's important to note that part of the shader count doubling from Fermi to Kepler came from getting rid of the fast clock and replacing it with more units. In both Fermi and Kepler, in a single base GPU clock, a single instruction submitted to an SM(X) completes 32 ALU ops. In Fermi, it does so by driving 16 units at 2x the base clock; in Kepler, they just have 32 units running at the base clock. So that's a huge part of the perceived performance disparity between Kepler and Fermi and their shader unit counts.
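
A quick back-of-the-envelope sketch of that point, using only the unit counts from the post above (an illustration of the hot-clock trade-off, not a full model of either architecture):

```python
# Per base GPU clock, one issued instruction retires 32 ALU ops on both a Fermi
# SM and a Kepler SMX; they just get there differently.

def alu_ops_per_base_clock(units: int, clock_multiplier: int) -> int:
    """ALU ops completed per base clock by one SM(X) for a single instruction."""
    return units * clock_multiplier

fermi_sm = alu_ops_per_base_clock(units=16, clock_multiplier=2)    # 16 units on the 2x hot clock
kepler_smx = alu_ops_per_base_clock(units=32, clock_multiplier=1)  # 32 units on the base clock

print(fermi_sm, kepler_smx)  # 32 32 -- same work per base clock, twice the unit count on paper
```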
 

ShintaiDK

Lifer
Apr 22, 2012
*Rubs forehead in homicidal frustration*

And are all shaders equivalent to one another? Are the shaders on a Kepler-based GPU the same as the ones on Fermi? Is a GTX 680 three times more powerful than a GTX 580?

They are different. Shaders on the GTX 580 are also clock-doubled, btw.
 

f1sherman

Platinum Member
Apr 5, 2011
Fewer thingies oscillating at higher frequencies are always more prone to power leakage than more of the same thingies oscillating at lower frequencies.

The hot clock allowed for die area savings, but it was merciless to TDP.
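
A minimal sketch of that trade-off. It uses the usual dynamic-power relation (units x C x V^2 x f) rather than leakage, but the direction is the same; every constant below is a made-up placeholder, not a real Fermi or Kepler figure:

```python
# A higher clock generally needs a higher voltage to close timing, so fewer
# units on a hot clock tend to cost more power than twice the units at base
# clock. All numbers are illustrative placeholders.

def dynamic_power(units: int, cap_per_unit: float, voltage: float, freq_ghz: float) -> float:
    return units * cap_per_unit * voltage ** 2 * freq_ghz

hot_clock = dynamic_power(units=16, cap_per_unit=1.0, voltage=1.1, freq_ghz=2.0)  # Fermi-style
wide = dynamic_power(units=32, cap_per_unit=1.0, voltage=0.9, freq_ghz=1.0)       # Kepler-style

print(f"hot clock: {hot_clock:.1f}  wide: {wide:.1f}")  # the hot-clock layout comes out higher
```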
 

BenSkywalker

Diamond Member
Oct 9, 1999
If they were Kepler cores, I'd assume we'd get >6x the performance. But yes, perhaps nVidia is being super conservative. Source?

It was widely discussed for months, but I guess people don't pay attention much-

http://www.extremetech.com/computin...a-could-revolutionize-smartphone-capabilities

Random link from the likely thousands I could dig up. That doesn't spell it out in a concrete manner, so let's approach this from an entirely different angle, as if I couldn't find a single link.

We now have Windows RT products shipping, and nVidia is going to want to leverage their edge in Windows device experience; say what you will about JHH, stupid isn't included. Their current SoCs lack CUDA support, which means they are pre-G80 cores. To improve the graphics cores for Tegra, look at the options nV has at their disposal. G80: not a bad part, but a bit of a power hog compared to the G70 derivatives on the same build process. The GT2x series were even worse here, with very large portions of the die dedicated to GPGPU and quite a bit of die space given to DP functions; certainly not an ideal solution for a 1-watt device. Fermi, yeah, we don't need to go there. That leaves Kepler as the fully developed part that they could use.

There are a lot of comments from nVidia telling us that Wayne was going to use Kepler, but in all honesty they wouldn't be necessary; if you apply the slightest thought to the situation, you are going to come up with the only reasonable solution. One of the things I think we can all agree upon is that nVidia is exceptionally good at leveraging their R&D to benefit multiple target segments. Kepler has already proven to be a solid architecture for consumer graphics, pro graphics and GPGPU, up to and including the fastest supercomputer in the world. The biggest difference between it and Fermi isn't its flexibility, however; it is the greatly increased performance/watt that it offers. Saying nVidia is going to use Kepler cores for Tegra 4 is simply acknowledging that they aren't complete idiots :)

I'll believe it when I see the actual results. Until then it's NV SoC hypeeeeee as usual.

Which part didn't live up to the hype? Outside of GLBenchmark, Tegra 2 and Tegra 3 have pretty much wiped the floor with the competition upon release. Qualcomm, and to an even greater degree Samsung, have done a good job of countering, but their parts tend to come quarters later. I've got all of the flavors of the latest SoCs here outside of the S4 Pro with the Adreno 320, and the Tegra 3 isn't looking bad against any of them, including some that came close to a year later. No, they haven't wiped out the SoC industry yet the way they did most of the graphics industry, but it took them six or seven rounds to get to the GeForce on the graphics side too.
 

BenSkywalker

Diamond Member
Oct 9, 1999
Yeah, I'm double posting; there's another point I wanted to make that actually makes the six-fold increase in units only being reported as a linear gain make perfect sense.

The 7800 GTX had 32 shader units (split 24/8); the GTX *650* has 384. They are in *no* way comparable, and we don't know how the split-out is handled, but talking about the core having six times the cores really doesn't tell us much of anything in and of itself.
 

Arzachel

Senior member
Apr 7, 2011
Which part didn't live up to the hype? Outside of GLBenchmark, Tegra 2 and Tegra 3 have pretty much wiped the floor with the competition upon release. Qualcomm, and to an even greater degree Samsung, have done a good job of countering, but their parts tend to come quarters later. I've got all of the flavors of the latest SoCs here outside of the S4 Pro with the Adreno 320, and the Tegra 3 isn't looking bad against any of them, including some that came close to a year later. No, they haven't wiped out the SoC industry yet the way they did most of the graphics industry, but it took them six or seven rounds to get to the GeForce on the graphics side too.

I'm pretty slow today, but what exactly do you mean by "Outside of GLBenchmark"? That Tegra SoCs perform poorly at graphics workloads, or is that a GLBenchmark-specific issue?

Anyways, I really can't recall Tegra 2 being all that great; low clocks, lack of NEON and poor memory bandwidth didn't really help it. Tegra 3 was better, but I'm still not too keen on quad cores in smartphone SoCs. GPU performance seemed underwhelming for both, but then again GLBenchmark is my only data point.

That said, it's nice that Nvidia will be beefing up the memory bandwidth, and more GPU performance is always appreciated.
 

BenSkywalker

Diamond Member
Oct 9, 1999
I'm pretty slow today, but what exactly do you mean by "Outside of GLBenchmark"? That Tegra SoCs perform poorly at graphics workloads, or is that a GLBenchmark-specific issue?

GLBench looks very good performance-wise on PowerVR parts, and only PowerVR parts, in relative terms. If you look at the quality of the graphics being displayed versus the performance, it is *terrible*. The visual quality is *far* below what we have been seeing in mobile games for some time now, and it is *much* slower. Given both of these things, it is very easy to make both ends of this true by messing with sort orders (or not utilizing them properly). GLBench is just a trash piece of software overall.

Anyways, I really can't recall Tegra 2 being all that great; low clocks, lack of NEON and poor memory bandwidth didn't really help it.

http://www.anandtech.com/show/4165/the-motorola-atrix-4g-preview/5

T2 was clocked as fast as the single-core processors that were out when it launched, and it was the first dual-core SoC to hit the market in the UP consumer space. Performance-wise it smashed almost everything else, and that was using single-core benches (the difference obviously grew once we got multi-threaded benches). T3 blew almost everything else away when it launched, in pretty much every bench. Lack of NEON was an issue for Tegra 2; bandwidth I never saw any actual issues with. I still have T2 devices kicking around, if you can recall what types of problems this may cause?
 

Saylick

Diamond Member
Sep 10, 2012
It was widely discussed for months, but I guess people don't pay attention much-

http://www.extremetech.com/computin...a-could-revolutionize-smartphone-capabilities

Random link from the likely thousands I could dig up. That doesn't spell it out in a concrete manner, so let's approach this from an entirely different angle, as if I couldn't find a single link.

We now have Windows RT products shipping, and nVidia is going to want to leverage their edge in Windows device experience; say what you will about JHH, stupid isn't included. Their current SoCs lack CUDA support, which means they are pre-G80 cores. To improve the graphics cores for Tegra, look at the options nV has at their disposal. G80: not a bad part, but a bit of a power hog compared to the G70 derivatives on the same build process. The GT2x series were even worse here, with very large portions of the die dedicated to GPGPU and quite a bit of die space given to DP functions; certainly not an ideal solution for a 1-watt device. Fermi, yeah, we don't need to go there. That leaves Kepler as the fully developed part that they could use.

There are a lot of comments from nVidia telling us that Wayne was going to use Kepler, but in all honesty they wouldn't be necessary; if you apply the slightest thought to the situation, you are going to come up with the only reasonable solution. One of the things I think we can all agree upon is that nVidia is exceptionally good at leveraging their R&D to benefit multiple target segments. Kepler has already proven to be a solid architecture for consumer graphics, pro graphics and GPGPU, up to and including the fastest supercomputer in the world. The biggest difference between it and Fermi isn't its flexibility, however; it is the greatly increased performance/watt that it offers. Saying nVidia is going to use Kepler cores for Tegra 4 is simply acknowledging that they aren't complete idiots :)

Sorry, I don't do a lot of digging in the mobile space but that makes sense.
 

Red Hawk

Diamond Member
Jan 1, 2011
Actually, outside of memory bandwidth they do; they do precisely that. A graphics core is a set combination of ROPs and shader units (a ROP is a raster op pipeline that handles texture/raster ops). The interesting part about Wayne is that it is Kepler cores versus the G70-based derivatives in the older Tegra parts; it is possible nV is being rather conservative with their 6x estimate (although real-world performance could well be in line with that even if theoreticals put it closer to 10x).

You're not using "graphics core" in the same sense as Nvidia is in the slide. They are calling each shader unit a "core", not the whole chip (because MOAR CORES). A shader unit does not include a ROP, rasterizer, texture unit, etc. Simply increasing shader units, even if it's the same architecture, does not guarantee a linear increase in effective processing power.

It's entirely possible that Nvidia did scale up the other components of the graphics chip, or that the architecture change helped secure a 6x increase in effective processing power. But simply scaling up the shader units does not prove this.
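
A toy model of that argument, with every number invented purely for illustration: effective throughput is capped by whichever resource saturates first, so scaling the shader units alone need not scale frame rates with them.

```python
# Toy bottleneck model: achievable frame rate is bounded by whichever of the
# shader, fillrate (ROP) or memory-bandwidth limits is hit first.

def bounded_fps(shader_fps: float, fillrate_fps: float, bandwidth_fps: float) -> float:
    return min(shader_fps, fillrate_fps, bandwidth_fps)

old_chip = bounded_fps(shader_fps=10, fillrate_fps=30, bandwidth_fps=25)
# 6x the shader units, ROPs and memory bandwidth left alone:
new_chip = bounded_fps(shader_fps=60, fillrate_fps=30, bandwidth_fps=25)

print(f"speedup: {new_chip / old_chip:.1f}x")  # 2.5x, not 6x
```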
 

Jstn7477

Member
Jun 1, 2012
Bought a Nexus 7 last week and am very happy with Tegra 3's performance, so Tegra 4 can only be better I suppose. So glad I upgraded from one of those no-name Allwinner A10 based tablets that took practically 2 minutes to do anything.
 

BenSkywalker

Diamond Member
Oct 9, 1999
You're not using "graphics core" in the same sense as Nvidia is in the slide.

Do you work for nVidia? Curious as to where you got that piece of information from. I already stated in a follow up post-

talking about the core having six times the cores really doesn't tell us much of anything in and of itself.

nVidia uses the 'core' designation on their chips normally to refer to a CUDA core, a block of some combination of ROPs, shader units and cache.

They are calling each shader unit a "core", not the whole chip (because MOAR CORES).

Tegra parts to date have had each shader core tied to a ROP- it was always a 1/1 correlation so using different designations wasn't needed. That isn't necessarily staying the same.
 

Acanthus

Lifer
Aug 28, 2001
Do we have an apples to apples comparison of the IPC increase between A9 and A15? I'd be interested in seeing how they perform at the same clock with similar memory.
 

Red Hawk

Diamond Member
Jan 1, 2011
Do you work for nVidia? Curious as to where you got that piece of information from. I already stated in a follow up post-



nVidia uses the 'core' designation on their chips normally to refer to a CUDA core, a block of some combination of ROPs, shader units and cache.



Tegra parts to date have had each shader core tied to a ROP- it was always a 1/1 correlation so using different designations wasn't needed. That isn't necessarily staying the same.

I don't work for Nvidia; do you? I was simply using common sense. If "core" included ROPs that would leave us with a ridiculous amount of ROPs. The most powerful desktop flagship GPUs only have 32 ROPs.

The GeForce GTX 680 and 670 have different numbers of shader units, yet the same number of ROPs. I'm not seeing how an increase in shader units necessitates an increase in ROPs. And this is still leaving out elements like rasterizers, texture units, geometry engines, etc.
 

BenSkywalker

Diamond Member
Oct 9, 1999
I don't work for Nvidia; do you?

Nope, no clue what layout they are going to use.

If "core" included ROPs that would leave us with a ridiculous amount of ROPs.

Ridiculous amount of ROPs, heh, so the most popular resolution for PCs by an overwhelming majority is 1920x1080, a 2MP display. In the tablet space, where nVidia tends to be strongest in the SoC market, that is *below* the most popular devices by quite a bit. The PC market in general has been extremely resistant to moving to higher resolution displays, and as such ROP demands haven't gone up nearly as quickly as shader requirements. The Ultra Portable market is in a *very* different situation. Right now the top three 10" tablets, two models of iPad and the Nexus 10, have 3MP displays or higher, 50% or more over the PC standard. By comparison, I'd be shocked if they had 10% of the demands in terms of shader performance.

Is that to say I expect an SoC to have anywhere near the fillrate of a PC GPU? Of course not, but I wouldn't be in the least bit surprised if the *ratio* tips rather heavily in the direction of ROPs versus what we see in desktops.
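
For reference, the pixel-count arithmetic behind that argument (the resolutions are the well-known panel specs for those devices):

```python
# High-end tablets of the time carry noticeably more pixels than the 1080p
# desktop standard, which is where the fillrate argument comes from.

panels = {
    "1080p desktop": (1920, 1080),
    "iPad 3 / 4": (2048, 1536),
    "Nexus 10": (2560, 1600),
}

baseline_mp = 1920 * 1080 / 1e6  # ~2.1 MP
for name, (w, h) in panels.items():
    mp = w * h / 1e6
    print(f"{name}: {mp:.1f} MP ({mp / baseline_mp:.0%} of a 1080p panel)")
# iPad 3/4 ~3.1 MP (~152%), Nexus 10 ~4.1 MP (~198%) -- the "50% or more over
# the PC standard" figure above.
```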
 

Red Hawk

Diamond Member
Jan 1, 2011
Nope, no clue what layout they are going to use.



Ridiculous amount of ROPs, heh, so the most popular resolution for PCs by an overwhelming majority is 1920x1080, a 2MP display. In the tablet space, where nVidia tends to be strongest in the SoC market, that is *below* the most popular devices by quite a bit. The PC market in general has been extremely resistant to moving to higher resolution displays, and as such ROP demands haven't gone up nearly as quickly as shader requirements. The Ultra Portable market is in a *very* different situation. Right now the top three 10" tablets, two models of iPad and the Nexus 10, have 3MP displays or higher, 50% or more over the PC standard. By comparison, I'd be shocked if they had 10% of the demands in terms of shader performance.

Is that to say I expect an SoC to have anywhere near the fillrate of a PC GPU? Of course not, but I wouldn't be in the least bit surprised if the *ratio* tips rather heavily in the direction of ROPs versus what we see in desktops.

Perhaps you're right about the usage of ROPs in mobile chips, and if I had compared Tegra 4 to GK107 then perhaps I would concede the point. But I'm not comparing it to GK107. If each "core" included a ROP, that wouldn't line up with Nvidia's low-end, mid-range, or even high-end cards; it would be over twice the number of ROPs in the high-end cards.

Anyways, further evidence that "core" does not refer to a package including ROPs and shader units: the official pages for the GTX 680 and the GTX 670 list different numbers of "CUDA Cores" for each (1344 for the 670, 1536 for the 680), yet they have the same number of ROPs. Regardless of the typical ratio of ROPs to shader units used in Tegra, simply increasing the amount of "cores" 6 times does not prove an identical increase in practical performance.
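
Lining those published numbers up makes the point concrete (core and ROP counts as cited above):

```python
# Both cards expose 32 ROPs, so the shader-to-ROP ratio moves with the CUDA
# core count instead of tracking it 1:1.

cards = {"GTX 670": 1344, "GTX 680": 1536}
rops = 32

for name, cuda_cores in cards.items():
    print(f"{name}: {cuda_cores} CUDA cores, {rops} ROPs -> {cuda_cores // rops}:1")
# GTX 670 -> 42:1, GTX 680 -> 48:1
```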
 

tviceman

Diamond Member
Mar 25, 2008
Red Hawk said:
simply increasing the amount of "cores" 6 times does not prove an identical increase in practical performance.

The core count is up by a factor of six vs. Tegra 3, and a factor of nine vs. Tegra 2. The slide says it is 20 times faster than Tegra 2. You could make the exact same argument for a Tegra 2 vs. Tegra 4 comparison: that simply increasing the cores by a factor of nine does not prove a 20-fold increase in performance. Indeed, there is other stuff going on to account for the performance increase over Tegra 2 and 3, like GPU clocks, the IPC of each core, memory bandwidth, the number of ROPs, etc.

The fact that Nvidia is comparing their new chip to their own old chips (as opposed to comparing their stuff to a competitor's chips) makes the claims hold more water. Is Tegra 4 going to be 6x faster in graphics across the board vs. Tegra 3? Probably not. But it will likely hit that scenario (or come close to it) in situations that fully stress the GPU.
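
A rough decomposition of those slide numbers; the core counts are the ones implied by the factors quoted in this thread, and the per-core figure is just the residual the 20x claim implies, not a measurement:

```python
# If the slide's 20x-vs-Tegra-2 figure is taken at face value, the 9x core-count
# increase only covers part of it; the rest has to come from clocks, per-core
# throughput, memory bandwidth, ROPs and so on.

tegra2_cores, tegra3_cores, tegra4_cores = 8, 12, 72
claimed_vs_tegra2 = 20

print(f"core scaling vs Tegra 3: {tegra4_cores / tegra3_cores:.0f}x")  # 6x, as on the slide
core_scaling = tegra4_cores / tegra2_cores                             # 9x vs Tegra 2
implied_per_core = claimed_vs_tegra2 / core_scaling                    # ~2.2x
print(f"{core_scaling:.0f}x from core count vs Tegra 2, ~{implied_per_core:.1f}x "
      "implied from clocks, IPC, bandwidth, ROPs, etc.")
```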
 

BenSkywalker

Diamond Member
Oct 9, 1999
If each "core" included a ROP, that wouldn't line up with Nvidia's low-end, mid-range, or even high-end cards; it would be over twice the number of ROPs in the high-end cards.

That

Is

How

Every

Tegra

Chip

Ever

Made

Has

Been

You talk as if what I'm saying is insane- there are tens of millions of Tegra devices already on the market and *EVERY SINGLE ONE OF THEM* has a shader per ROP.

You are stating as a point of fact that they are going to change it; I asked if you worked at nVidia because, as I have already noted, they may well be changing it, but nothing we have seen to date says that they are. The fact that UP SoCs need far more fillrate than shader power, particularly when compared to the hyper-low pixel density displays used for PCs, if anything would indicate that they may be better off with one shader for every two ROPs than the other way around.

Is it realistic that nVidia would make the *ratio* of ROPs to pixel shaders much different on their UP SoCs than on their desktop parts? Considering they have *ALWAYS* done *EXACTLY* that, yes, I would say it is reasonable. I have not said it *WILL* happen, but you are talking like it is absurd when we have tens of millions of devices already made that indicate nVidia thinks you are wrong. Maybe they will change, but they would need to for your assertions to be correct.
 

ShintaiDK

Lifer
Apr 22, 2012
1 ROP doesn't equal 1 ROP, does it.

People seem to think too simplistically. One shader doesn't equal one shader either, and 1 FLOP isn't equal to 1 FLOP either (GPU vs. CPU in mind).
 

thilanliyan

Lifer
Jun 21, 2005
Ridiculous amount of ROPs, heh, so the most popular resolution for PCs by an overwhelming majority is 1920x1080, a 2MP display. In the tablet space, where nVidia tends to be strongest in the SoC market, that is *below* the most popular devices by quite a bit.

The majority of tablets don't have a resolution above 1920x1080. Only recently, with the iPad 3 (or 4, can't remember) and the Nexus 10, has that changed, but those devices are not the majority of tablets out there.
 

Red Hawk

Diamond Member
Jan 1, 2011
That

Is

How

Every

Tegra

Chip

Ever

Made

Has

Been

You talk as if what I'm saying is insane- there are tens of millions of Tegra devices already on the market and *EVERY SINGLE ONE OF THEM* has a shader per ROP.

You are stating as a point of fact that they are going to change it; I asked if you worked at nVidia because, as I have already noted, they may well be changing it, but nothing we have seen to date says that they are. The fact that UP SoCs need far more fillrate than shader power, particularly when compared to the hyper-low pixel density displays used for PCs, if anything would indicate that they may be better off with one shader for every two ROPs than the other way around.

Is it realistic that nVidia would make the *ratio* of ROPs to pixel shaders much different on their UP SoCs than on their desktop parts? Considering they have *ALWAYS* done *EXACTLY* that, yes, I would say it is reasonable. I have not said it *WILL* happen, but you are talking like it is absurd when we have tens of millions of devices already made that indicate nVidia thinks you are wrong. Maybe they will change, but they would need to for your assertions to be correct.

If you're so certain about this, could you provide a source describing the Tegra architecture this way?
 

Saylick

Diamond Member
Sep 10, 2012
According to that article, they quote 25-30%, and that is with a 300MHz clock speed advantage.

That is still a decent increase in IPC for a single generation, though.

Maybe I will have to phone shop next year after all.

I was basing it off the 3.5 DMIPS/MHz rumor for the A15 vs. 2.5 DMIPS/MHz for the A9, which is a 40% increase.
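
The arithmetic behind those two figures, using the rumored per-clock numbers quoted above:

```python
# Per-clock (DMIPS/MHz) comparison; the ~25-30% figure in the article was
# measured with a 300 MHz clock advantage folded in, so it isn't a pure IPC number.

a9_dmips_per_mhz = 2.5
a15_dmips_per_mhz = 3.5  # rumored figure

ipc_gain = a15_dmips_per_mhz / a9_dmips_per_mhz - 1
print(f"per-clock gain: {ipc_gain:.0%}")  # 40%, before any clock-speed advantage
```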